Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-28 Thread Ben Hutchings
On Sat, 2013-04-27 at 10:13 +0200, Laszlo Kajan wrote:
 Dear Ben!
 
 On 27/04/13 00:46, Ben Hutchings wrote:
[...]
  However, I would expect the vast majority of installations to be on
  amd64, so if you always generate a 64-bit little-endian database
  and avoid duplicating when installing on such a machine then it
  would be better for most users (not so nice for others).
  
  (Incidentally, arch:all packages generating arch-specific data have
  interesting interactions with multi-arch.  I doubt many people with
  multi-arch systems would want this package to generate multiple
  versions of the database, but you never know...)
 
 I see. According to [1], Arch:all with Multi-Arch:same is an error.
 [1] https://wiki.ubuntu.com/MultiarchSpec

 So at this point I see one way forward:
 
 1: Move the postinst script into a new Arch:any package that depends on 
 'metastudent-data'. This Arch:any package would build the native
 database in postinst (with no multiarch support for now).
 
 What do you think?

This might work, but be careful.  Consider a multiarch system with armhf
as primary and armel as additional architecture.  Both architectures are
little-endian, 32-bit.  If you install metastudent-data-native/armel and
metastudent-data-native/armhf, they should only generate one native copy
of the data (right?).  But if you remove one and leave the other, the
native copy should stay.
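
One way to get that behaviour would be to refcount the generated copy across
the co-installed instances.  A minimal sketch (the paths and the
build-native-db helper are hypothetical; dpkg exports DPKG_MAINTSCRIPT_ARCH
to maintainer scripts):

#!/bin/sh
# metastudent-data-native.postinst -- illustrative sketch, not the real script
set -e
refs=/var/lib/metastudent-data/native-refs   # hypothetical stamp directory
db=/var/lib/metastudent-data/db.native       # hypothetical native database
mkdir -p "$refs"
touch "$refs/$DPKG_MAINTSCRIPT_ARCH"         # record this architecture's instance
# Generate the native copy only once, whichever instance gets here first.
[ -e "$db" ] || build-native-db --output "$db"

#!/bin/sh
# metastudent-data-native.postrm -- illustrative sketch
set -e
refs=/var/lib/metastudent-data/native-refs
db=/var/lib/metastudent-data/db.native
if [ "$1" = remove ]; then
    rm -f "$refs/$DPKG_MAINTSCRIPT_ARCH"
    # rmdir succeeds only once no instance remains; only then drop the copy.
    rmdir "$refs" 2>/dev/null && rm -f "$db" || true
fi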

Ben.

-- 
Ben Hutchings
Knowledge is power.  France is bacon.




Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-27 Thread Laszlo Kajan
Dear Ben!

On 27/04/13 00:46, Ben Hutchings wrote:
 On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
 Dear FTP Masters!

 On 23/04/13 15:13, Benjamin Drung wrote:
 [...]
 You can use xz for the source and binary package to reduce the size. The
 default compression level for xz reduces the size of the source tarball
 from 415 MB to 272 MB:

 $ ls -1s --si metastudent-data_1.0.0.tar*
 823M metastudent-data_1.0.0.tar
 381M metastudent-data_1.0.0.tar.bz2
 415M metastudent-data_1.0.0.tar.gz
 272M metastudent-data_1.0.0.tar.xz
 $ ls -1sh metastudent-data_1.0.0.tar*
 784M metastudent-data_1.0.0.tar
 363M metastudent-data_1.0.0.tar.bz2
 396M metastudent-data_1.0.0.tar.gz
 259M metastudent-data_1.0.0.tar.xz

 Following Benjamin's suggestion and the data.debian.org document [1], we 
 have prepared a 'metastudent-data' arch:all package that is ~130MB (xz
 compressed).
 The package builds the required architecture-dependent databases in the
 postinst script, in order to save the archive space that each
 architecture-dependent version would otherwise take up.
 [...]
 
 Does this mean that installing the package results in having two
 uncompressed copies of the data on disk?  If so, wouldn't it be
 better to do:

Indeed, the original arch:all version, and the native one. The arch:all version 
is not needed any more after conversion, and could be removed.
Thanks for drawing my attention to this.

 1. Compress the database (with xz).
 2. Build the package without compression (contents are already
compressed so re-compressing would be a waste of time).
 3. In postinst, decompress and convert the database to native.
 
 However, I would expect the vast majority of installations to be on
 amd64, so if you always generate a 64-bit little-endian database
 and avoid duplicating when installing on such a machine then it
 would be better for most users (not so nice for others).
 
 (Incidentally, arch:all packages generating arch-specific data have
 interesting interactions with multi-arch.  I doubt many people with
 multi-arch systems would want this package to generate multiple
 versions of the database, but you never know...)

I see. According to [1], Arch:all with Multi-Arch:same is an error.
[1] https://wiki.ubuntu.com/MultiarchSpec

So at this point I see one way forward:

1: Move the postinst script into a new Arch:any package that depends on 
'metastudent-data'. This Arch:any package would build the native
database in postinst (with no multiarch support for now).

What do you think?

Best regards,
Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-27 Thread Adam Borowski
On Fri, Apr 26, 2013 at 11:46:44PM +0100, Ben Hutchings wrote:
 On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
  On 23/04/13 15:13, Benjamin Drung wrote:
  [...]
   You can use xz for the source and binary package to reduce the size. The
   default compression level for xz reduces the size of the source tarball
   from 415 MB to 272 MB:
  
  Following Benjamin's suggestion and the data.debian.org document [1], we 
  have prepared a 'metastudent-data' arch:all package that is ~130MB (xz
  compressed).
  The package builds the required architecture-dependent databases in the
  postinst script, in order to save the archive space that each
  architecture-dependent version would otherwise take up.
 [...]
 
 Does this mean that installing the package results in having two
 uncompressed copies of the data on disk?  If so, wouldn't it be
 better to do:
 
 1. Compress the database (with xz).
 2. Build the package without compression (contents are already
compressed so re-compressing would be a waste of time).
 3. In postinst, decompress and convert the database to native.

If it's never going to be recompressed, you really want to compress it up
the wazoo:
xz  |      size | compression (amd64) | compression (armhf) | decompression (armhf)
-0  | 407076744 |             1:49.77 |             6:14.47 |               1:23.31
-6  | 271088012 |            14:56.38 |            47:40.23 |               1:02.37
-9e | 195223672 |            19:38.15 |             1:06:50 |                 48.01

Far less space taken, _and_ it decompresses faster.
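
For concreteness, combining this with Ben's steps 1 and 2 might look like the
following (file names illustrative; -Znone tells dpkg-deb not to recompress
the already-xz'd payload):

$ xz -9e db.fasta                # ship db.fasta.xz inside the package
$ dpkg-deb -Znone --build debian/metastudent-data ..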

 However, I would expect the vast majority of installations to be on
 amd64, so if you always generate a 64-bit little-endian database
 and avoid duplicating when installing on such a machine then it
 would be better for most users (not so nice for others).

Looks like we're getting armhf machines with 539375329859372 cores on one
blade, but you have a point.  I find it quite strange that the on-disk format
would ever care about word width, though: if the data fits in 32 bits, there's
lots of waste for no gain -- mmap or not.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-26 Thread Laszlo Kajan
Dear FTP Masters!

On 23/04/13 15:13, Benjamin Drung wrote:
[...]
 You can use xz for the source and binary package to reduce the size. The
 default compression level for xz reduces the size of the source tarball
 from 415 MB to 272 MB:
 
 $ ls -1s --si metastudent-data_1.0.0.tar*
 823M metastudent-data_1.0.0.tar
 381M metastudent-data_1.0.0.tar.bz2
 415M metastudent-data_1.0.0.tar.gz
 272M metastudent-data_1.0.0.tar.xz
 $ ls -1sh metastudent-data_1.0.0.tar*
 784M metastudent-data_1.0.0.tar
 363M metastudent-data_1.0.0.tar.bz2
 396M metastudent-data_1.0.0.tar.gz
 259M metastudent-data_1.0.0.tar.xz

Following Benjamin's suggestion and the data.debian.org document [1], we have 
prepared a 'metastudent-data' arch:all package that is ~130MB (xz
compressed).
The package builds the required architecture-dependent databases in the
postinst script, in order to save the archive space that each
architecture-dependent version would otherwise take up.
The arch:all package is almost identical to the source package.

* Please comment on this solution. If you like it, we will upload it (targeting 
the 'main' area), and have 'metastudent' (also in main) depend
on it.

[1] http://ftp-master.debian.org/wiki/projects/data/

Thank you for commenting.

Best regards,
Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-26 Thread Ben Hutchings
On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
 Dear FTP Masters!
 
 On 23/04/13 15:13, Benjamin Drung wrote:
 [...]
  You can use xz for the source and binary package to reduce the size. The
  default compression level for xz reduces the size of the source tarball
  from 415 MB to 272 MB:
  
  $ ls -1s --si metastudent-data_1.0.0.tar*
  823M metastudent-data_1.0.0.tar
  381M metastudent-data_1.0.0.tar.bz2
  415M metastudent-data_1.0.0.tar.gz
  272M metastudent-data_1.0.0.tar.xz
  $ ls -1sh metastudent-data_1.0.0.tar*
  784M metastudent-data_1.0.0.tar
  363M metastudent-data_1.0.0.tar.bz2
  396M metastudent-data_1.0.0.tar.gz
  259M metastudent-data_1.0.0.tar.xz
 
 Following Benjamin's suggestion and the data.debian.org document [1], we have 
 prepared a 'metastudent-data' arch:all package that is ~130MB (xz
 compressed).
 The package builds the required architecture-dependent databases in the
 postinst script, in order to save the archive space that each
 architecture-dependent version would otherwise take up.
[...]

Does this mean that installing the package results in having two
uncompressed copies of the data on disk?  If so, wouldn't it be
better to do:

1. Compress the database (with xz).
2. Build the package without compression (contents are already
   compressed so re-compressing would be a waste of time).
3. In postinst, decompress and convert the database to native.
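
A rough sketch of what step 3 could look like (paths are hypothetical, and
makeblastdb from BLAST+ stands in for whatever converter the package
actually uses):

#!/bin/sh
# metastudent-data.postinst -- illustrative sketch only
set -e
src=/usr/share/metastudent-data/db.fasta.xz   # hypothetical shipped file
dst=/var/lib/metastudent-data                 # hypothetical writable location
mkdir -p "$dst"
xz -dc "$src" > "$dst/db.fasta"               # decompress once, on the target machine
# Convert to the native BLAST database format.
makeblastdb -in "$dst/db.fasta" -dbtype prot -out "$dst/db"
rm -f "$dst/db.fasta"                         # the plain FASTA is no longer needed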

However, I would expect the vast majority of installations to be on
amd64, so if you always generate a 64-bit little-endian database
and avoid duplicating when installing on such a machine then it
would be better for most users (not so nice for others).

(Incidentally, arch:all packages generating arch-specific data have
interesting interactions with multi-arch.  I doubt many people with
multi-arch systems would want this package to generate multiple
versions of the database, but you never know...)

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
  - Albert Camus





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Olivier Sallou

On 04/23/2013 11:48 AM, Laszlo Kajan wrote:
 Dear Russ, Debian Med Team, Charles!

 (Please keep Tobias Hamp in replies.)

 @Russ: Please allow me to include you in a discussion about a few 
 bioinformatics packages that depend on big, but free data [2]. I have cited
 your opinion [3] in this discussion before. You are on the technical 
 committee and on the policy team, so you, together with Charles, can help
 substantially here.

 [2] 
 http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/thread.html
 [3] https://lists.debian.org/debian-vote/2013/03/msg00279.html

 This email is to continue the discussion about free packages that depend on 
 big (e.g. 400MB) free data outside 'main'. These packages
 apparently violate policy 2.2.1 [0] for inclusion in 'main' because they 
 require software outside the 'main' area to function. They do not
 violate point #1 of the social contract [1], which requires non-dependency on 
 non-free components. For these big data packages, policy seems to
 be overly restrictive compared to the social contract, leading to seemingly 
 unfounded rejection from 'main'.
Indeed, many bioinformatics programs rely on external data. But I am
afraid that if we start to add some data packages, we will open an
endless open door: bioinformatics datasets are large, and they are
becoming huge and numerous.
This size will be an issue for Debian mirrors (mainly if some indexed
data are system-dependent), but it will also be a pain for the user if,
when installing a program (to have a look), it downloads GBs of dependent
packaged data. It may be really slow and fill the user's disk (and I am
not even talking about package updates).

Should those data dependencies not be clearly stated somewhere with the
software package, along with a script to get them?

Olivier

 [0] http://www.debian.org/doc/debian-policy/ch-archive.html
 [1] http://www.debian.org/social_contract

 * In case the social contract indeed allows such packages to be in 'main' 
 (and policy is overly restrictive), how could it be ensured that the
 packages are accepted?

 * What is the procedure within Debian to elicit a decision about the handling 
 of such packages in terms of archive area? Discussion on d-devel,
 followed by policy change? Asking the policy team to clarify policy for such 
 packages? Technical committee?

  + Charles suggested such packages could go into 'main' [4], with a clear 
 indication of the large data dependency of the package in the long
 description.
When possible, the scripts for generating the large data should be provided as well.

  [4] 
 http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019292.html

 My goal as a Debian Developer and a packager is to get packages into Debian 
 (so 'main') that are allowed in there, in reasonably short time. I
 would like to resolve this issue properly, because I believe it may pop up 
 more often in bioinformatics software. For example, imagine a protein
 folding tool that would require a very large database to search for 
 homologues for contact prediction, and using the contacts it would predict
 protein three-dimensional structure. This has been done before [5], and such 
 a tool would be (is) immensely useful for bioinformatics. This tool
 would depend on gigabytes of data we would not package. Yet, by all means, I 
 would want the tool to be part of the distribution.

 [5] http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0028766

 Thank you for your opinion and advice.

 Best regards,
 Laszlo


-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Ole Streicher
Olivier Sallou <olivier.sal...@irisa.fr> writes:
 Indeed, many bioinformatics programs rely on external data. But I am afraid
 that if we start to add some data packages, we will open an endless open
 door: bioinformatics datasets are large, and they are becoming huge and
 numerous.
 This size will be an issue for Debian mirrors (mainly if some indexed data are
 system-dependent), but it will also be a pain for the user if, when installing
 a program (to have a look), it downloads GBs of dependent packaged data. It
 may be really slow and fill the user's disk (and I am not even talking about
 package updates).

Without having a solution to offer, I'd like to mention that this is not a
med-specific problem. I am working on a number of astrophysics packages
that require/suggest up to several hundred megabytes of calibration data
(http://www.eso.org/sci/software/pipelines/). Given that these packages
are of quite specific interest, it makes IMO no sense to pollute all
Debian mirrors with these files.

Best regards

Ole





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Laszlo Kajan
Hi Olivier!

On 24/04/13 08:20, Olivier Sallou wrote:
 
 On 04/23/2013 11:48 AM, Laszlo Kajan wrote:
 Dear Russ, Debian Med Team, Charles!

 (Please keep Tobias Hamp in replies.)

 @Russ: Please allow me to include you in a discussion about a few 
 bioinformatics packages that depend on big, but free data [2]. I have cited
 your opinion [3] in this discussion before. You are on the technical 
 committee and on the policy team, so you, together with Charles, can help
 substantially here.

 [2] 
 http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/thread.html
 [3] https://lists.debian.org/debian-vote/2013/03/msg00279.html

 This email is to continue the discussion about free packages that depend on 
 big (e.g. 400MB) free data outside 'main'. These packages
 apparently violate policy 2.2.1 [0] for inclusion in 'main' because they 
 require software outside the 'main' area to function. They do not
 violate point #1 of the social contract [1], which requires non-dependency 
 on non-free components. For these big data packages, policy seems to
 be overly restrictive compared to the social contract, leading to seemingly 
 unfounded rejection from 'main'.
 Indeed, many bioinformatics programs rely on external data. But I am
 afraid that if we start to add some data packages, we will open an
 endless open door: bioinformatics datasets are large, and they are
 becoming huge and numerous.
 This size will be an issue for Debian mirrors (mainly if some indexed
 data are system-dependent), but it will also be a pain for the user if,
 when installing a program (to have a look), it downloads GBs of dependent
 packaged data. It may be really slow and fill the user's disk (and I am
 not even talking about package updates).
 
 Should those data dependencies not be clearly stated somewhere with the
 software package, along with a script to get them?

Yes, the former (clearly stating the large external data dependency in the long
package description) is exactly what Charles Plessy recommended.
And your idea of a script to get the data is exactly what we implemented for
this 'metastudent' package. So we clearly think along the same lines... Now we
just have to discuss it with the FTP master team as well, to see whether this
is acceptable to them (or whether they prefer to have the data in the
archive).
Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Laszlo Kajan
Hello Didier!

On 24/04/13 09:32, Didier 'OdyX' Raboud wrote:
 On Tuesday, 23 April 2013 12:23:23, Andreas Tille wrote:
 I would even go so far as to say that it might make sense to package these
 data and upload them, to demonstrate that we should *really* create a
 solution for such cases, given that data packages will increase in number
 and size.
 
 Isn't that what data.debian.org is supposed to be(come)?
 
   * http://ftp-master.debian.org/wiki/projects/data/
   * http://lists.debian.org/debian-devel/2010/09/msg00692.html

Thanks for pointing this out, I didn't know about this. This would work very
well for the 'metastudent' data (and other data of the same kind). A policy
change (point 'We need to change policy.' of [1]) could be initiated, as
Russ Allbery noted before [2]. But data.debian.org does not (yet) exist,
does it?

[1] http://ftp-master.debian.org/wiki/projects/data/
[2] 
http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019320.html

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Olivier Sallou

On 04/24/2013 04:02 PM, Laszlo Kajan wrote:
 Hello Didier!

 On 24/04/13 09:32, Didier 'OdyX' Raboud wrote:
 On Tuesday, 23 April 2013 12:23:23, Andreas Tille wrote:
 I would even go so far as to say that it might make sense to package these
 data and upload them, to demonstrate that we should *really* create a
 solution for such cases, given that data packages will increase in number
 and size.
 Isn't that what data.debian.org is supposed to be(come)?

  * http://ftp-master.debian.org/wiki/projects/data/
  * http://lists.debian.org/debian-devel/2010/09/msg00692.html
 Thanks for pointing this out, I didn't know about this. This would work very
 well for the 'metastudent' data (and other data of the same kind). A policy
 change (point 'We need to change policy.' of [1]) could be initiated, as
 Russ Allbery noted before [2]. But data.debian.org does not (yet) exist,
 does it?
Sounds like the idea is quite old (2009) but did not progress. This could be
the opportunity to relaunch it.
 [1] http://ftp-master.debian.org/wiki/projects/data/
 [2] 
 http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019320.html

 Laszlo


-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Laszlo Kajan
Hello Andreas!

On 23/04/13 12:23, Andreas Tille wrote:
 On Tue, Apr 23, 2013 at 11:48:05AM +0200, Laszlo Kajan wrote:

 This email is to continue the discussion about free packages that depend on 
 big (e.g. 400MB) free data outside 'main'.
 
 In your practical case, is this data, say, 500MB?  Are we talking about
 compressed or uncompressed data (i.e. 400MB on the user's hard disk, or on
 all Debian mirrors world-wide)?

It is around 404MB, gzip compressed [1]. I think it is not arch independent. I 
think BLAST databases (the main bulk in the tar.gz) are sensitive
to the size of int, and endian-ness.

[1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz

 We do actually have examples of 500MB binary packages:
 
 udd@ullmann:/srv/mirrors/debian$ find . -type f -size +500M -name "*.deb"
 ./pool/main/f/freefoam/freefoam-dev-doc_0.1.0+dfsg-1_all.deb
 ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-3_amd64.deb
 ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-3_kfreebsd-amd64.deb
 ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-2_amd64.deb
 ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-2_kfreebsd-amd64.deb
 ./pool/main/n/ns3/ns3-doc_3.16+dfsg1-1_all.deb
 ./pool/main/n/ns3/ns3-doc_3.15+dfsg-1_all.deb
 ./pool/main/w/webkitgtk/libwebkit2gtk-3.0-0-dbg_1.11.91-1_amd64.deb
 ./pool/non-free/r/redeclipse-data/redeclipse-data_1.4-1_all.deb
 
 Even if the topic should be clarified in general, because we will certainly
 have larger data sets than this in the future, I could imagine that
 packaging this very data in your case should not be the main problem under
 the current circumstances, as long as there is no better solution found.
 
 I would even go so far as to say that it might make sense to package these
 data and upload them, to demonstrate that we should *really* create a
 solution for such cases, given that data packages will increase in number
 and size.

All right, we will package and upload the big data in case no one thinks of
a better solution and the discussion dies down in, say, a week.

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Benjamin Drung
On Tuesday, 2013-04-23 at 13:51 +0200, Laszlo Kajan wrote:
 Hello Andreas!
 
 On 23/04/13 12:23, Andreas Tille wrote:
  On Tue, Apr 23, 2013 at 11:48:05AM +0200, Laszlo Kajan wrote:
 
  This email is to continue the discussion about free packages that
  depend on big (e.g. 400MB) free data outside 'main'.
  
  In your practical case, is this data, say, 500MB?  Are we talking about
  compressed or uncompressed data (i.e. 400MB on the user's hard disk, or on
  all Debian mirrors world-wide)?
 
 It is around 404MB, gzip compressed [1]. I think it is not arch
 independent. I think BLAST databases (the main bulk in the tar.gz) are
 sensitive
 to the size of int, and endian-ness.
 
 [1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz

You can use xz for the source and binary package to reduce the size. The
default compression level for xz reduces the size of the source tarball
from 415 MB to 272 MB:

$ ls -1s --si metastudent-data_1.0.0.tar*
823M metastudent-data_1.0.0.tar
381M metastudent-data_1.0.0.tar.bz2
415M metastudent-data_1.0.0.tar.gz
272M metastudent-data_1.0.0.tar.xz
$ ls -1sh metastudent-data_1.0.0.tar*
784M metastudent-data_1.0.0.tar
363M metastudent-data_1.0.0.tar.bz2
396M metastudent-data_1.0.0.tar.gz
259M metastudent-data_1.0.0.tar.xz
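
For reference, assuming a debhelper-based package, the source compression can
be requested in debian/source/options and the binary one passed through to
dpkg-deb (a sketch of the usual knobs, not taken from the actual package):

# debian/source/options
compression = "xz"

# debian/rules
override_dh_builddeb:
	dh_builddeb -- -Zxz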

-- 
Benjamin Drung
Debian & Ubuntu Developer





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Laszlo Kajan
Hello Benjamin!

On 23/04/13 15:13, Benjamin Drung wrote:
 On Tuesday, 2013-04-23 at 13:51 +0200, Laszlo Kajan wrote:
 Hello Andreas!

 On 23/04/13 12:23, Andreas Tille wrote:
 On Tue, Apr 23, 2013 at 11:48:05AM +0200, Laszlo Kajan wrote:

 This email is to continue the discussion about free packages that
 depend on big (e.g. 400MB) free data outside 'main'.

 In your practical case, is this data, say, 500MB?  Are we talking about
 compressed or uncompressed data (i.e. 400MB on the user's hard disk, or on
 all Debian mirrors world-wide)?

 It is around 404MB, gzip compressed [1]. I think it is not arch
 independent. I think BLAST databases (the main bulk in the tar.gz) are
 sensitive
 to the size of int, and endian-ness.

 [1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz
 
 You can use xz for the source and binary package to reduce the size. The
 default compression level for xz reduces the size of the source tarball
 from 415 MB to 272 MB:
 
 $ ls -1s --si metastudent-data_1.0.0.tar*
 823M metastudent-data_1.0.0.tar
 381M metastudent-data_1.0.0.tar.bz2
 415M metastudent-data_1.0.0.tar.gz
 272M metastudent-data_1.0.0.tar.xz
 $ ls -1sh metastudent-data_1.0.0.tar*
 784M metastudent-data_1.0.0.tar
 363M metastudent-data_1.0.0.tar.bz2
 396M metastudent-data_1.0.0.tar.gz
 259M metastudent-data_1.0.0.tar.xz

Ah great! Thanks for checking this. A lesson for the future. We will switch
to xz.

Best regards,

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Laszlo Kajan
Dear Russ!

Thank you for getting back to me.

On 23/04/13 18:48, Russ Allbery wrote:
Laszlo Kajan <lka...@debian.org> writes:
 
 This email is to continue the discussion about free packages that depend
 on big (e.g. 400MB) free data outside 'main'. These packages apparently
 violate policy 2.2.1 [0] for inclusion in 'main' because they require
 software outside the 'main' area to function. They do not violate point
 #1 of the social contract [1], which requires non-dependency on non-free
 components. For these big data packages, policy seems to be overly
 restrictive compared to the social contract, leading to seemingly
 unfounded rejection from 'main'.
 
 * In case the social contract indeed allows such packages to be in
 'main' (and policy is overly restrictive), how could it be ensured that
 the packages are accepted?
 
 Yes, I agree.  Although we should probably talk with ftp-master about
 whether they would like the data to just be packaged and uploaded as a
 regular package.

Ftp-master was included in the initial thread [1], but they have not (yet)
responded, and I started to feel that it may be impolite to flood their
inbox with an issue like this, since perhaps they alone cannot decide it.
So yes, ftp-master is included in this mail once again.

[1] 
http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019282.html

 * What is the procedure within Debian to elicit a decision about the
 handling of such packages in terms of archive area? Discussion on
 d-devel, followed by policy change? Asking the policy team to clarify
 policy for such packages? Technical committee?
 
 Discussing it on debian-devel seems right, but I would also draw it to
 ftp-master's attention, since they're the people who have to worry about
 archive size.  We can easily move on to modifying Policy if there's a
 consensus to let packages like that pull the data down from some external
 source.

How to gauge that consensus?

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Russ Allbery
Laszlo Kajan <lka...@debian.org> writes:
 On 23/04/13 18:48, Russ Allbery wrote:

 Discussing it on debian-devel seems right, but I would also draw it to
 ftp-master's attention, since they're the people who have to worry
  about archive size.  We can easily move on to modifying Policy if
 there's a consensus to let packages like that pull the data down from
 some external source.

 How to gauge that consensus?

Generally the way it works is that if no one objects to the idea, we bring
it up on the Policy list, where we have a somewhat more formal process
that involves seconds or objections.

I think the ideal from a usability standpoint would be to just upload the
data directly to the Debian archive, though.  It's just a question of how
big of packages we want to handle through the mirror network, or whether
it's worth the effort to create a separate archive of huge data packages.

-- 
Russ Allbery (r...@debian.org)   http://www.eyrie.org/~eagle/

