Re: Description for lefse tools (Was: Origin of data files in MetaPhLan2)

2016-08-03 Thread Nicola Segata
Great, thanks Andreas. We provide the "*.bt2" files so that the user can
run BowTie2 internally to MetaPhlAn directly without first building the
indexes (it will take quite a bit of time). Also, the indexes are smaller
in size than the sequence file...

cheers
Nicola

On Wed, Aug 3, 2016 at 6:08 PM Andreas Tille  wrote:

> Hi Tin,
>
> On Wed, Aug 03, 2016 at 02:01:01PM +, Duy Tin Truong wrote:
> > > - Tin can also provide more info about the binary data in db_v20. The
> files
> > > ending with "bt2" are created using a script in the Bowtie2 package
> > > (bowtie2-build) using a sequence file Tin can provide (it can also be
> > > recovered from the bt2 files with bowtie2-inspect if I remember well).
> > As Nicola said, those files in db_v20 are created with bowtie2-build
> > using a sequence file and you can recover the sequence file by:
> >
> > bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
> >
> > If you want to rebuild them, the command is:
> >
> > bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
>
> I can confirm that I can reproduce the files byte identical from
> markers.fasta.  Is there any reason to ship the binary form instead of
> the fasta text file?  Moreover, what is the source of the markers.fasta?
> Is there any related publication or so?
>
> > > For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> > > object (or he can provide a couple of lines of code to uncompress it?)
> > It is python dictionary and can be read as:
> >
> > import cPickle as pickle
> > import bz2
> > db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))
> >
> > You can have more information about them at:
> >
> https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
>
> OK, that page clarifies the method.  Just a personal remark from the
> point of view of an outsider of bioinformatics:  I'd regard the creation
> process of the mpa_v20_m200.pkl file a bit cumbersome.  I'd personally
> prefer dropping some text record somewhere and calling a script that
> processes this record rather than writing a script of my own.
>
> > In addition, some files were renamed:
> >- metaphlan2_strainer.py -> strainphlan.py
> >- strainer_src -> strainphlan_src
> >- strainer_tutorial -> strainphlan_tutorial
> >
> > Some source files were updated as well.
> > Please let me know if you need other information.
>
> Just drop me a note once you release a new version containing these
> changes.  I think I'll try to release the current version as is since at
> least the origin of the files is clarified now.  I'm not yet sure whether
> the size of the data is acceptable or might exceed some limit.  Regarding
> this, I'm wondering whether I should create a source tarball that ships
> markers.fasta instead and create the bt2 files in the build process.
>
> Kind regards
>
>Andreas.
>
> --
> http://fam-tille.de
>


[ti...@debian.org: Bug#833388: ITP: metaphlan2 -- Metagenomic Phylogenetic Analysis]

2016-08-03 Thread Andreas Tille
Please join the discussion on debian-devel (see below).  Andreas.

- Forwarded message from Andreas Tille  -

Date: Wed, 03 Aug 2016 21:00:05 +0200
From: Andreas Tille 
To: Debian Bug Tracking System 
Subject: Bug#833388: ITP: metaphlan2 -- Metagenomic Phylogenetic Analysis
X-Debian-PR-Message: report 833388
X-Debian-PR-Package: wnpp
X-Debian-PR-Keywords: 

Package: wnpp
Severity: wishlist
Owner: Andreas Tille 

* Package name: metaphlan2
  Version : 2.5.0
  Upstream Author : Nicola Segata 
* URL : https://bitbucket.org/nsegata/metaphlan2/wiki/Home
* License : MIT
  Programming Lang: Python
  Description : Metagenomic Phylogenetic Analysis
 MetaPhlAn is a computational tool for profiling the composition of
 microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from
 metagenomic shotgun sequencing data with species level resolution. From
 version 2.0, MetaPhlAn is also able to identify specific strains (in the
 not-so-frequent cases in which the sample contains a previously
 sequenced strain) and to track strains across samples for all species.
 .
 MetaPhlAn 2.0 relies on ~1M unique clade-specific marker genes (the
 marker information file can be found at src/utils/markers_info.txt.bz2)
 identified from ~17,000 reference genomes (~13,500 bacterial
 and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
 .
  * unambiguous taxonomic assignments;
  * accurate estimation of organismal relative abundance;
  * species-level resolution for bacteria, archaea, eukaryotes and
    viruses;
  * strain identification and tracking;
  * orders of magnitude speedups compared to existing methods;
  * metagenomic strain-level population genomics.


Remark: The package is a target for Debian Med in itself and will be
used by metaBIT.  It will be maintained by the Debian Med team and the
packaging is currently available at
   svn://anonscm.debian.org/debian-med/trunk/packages/metaphlan2/trunk/


*** I'd like to discuss the following issue on debian-devel list ***

While Debian Med is injecting several low popularity contest packages
this one has an extraordinarily large set of data and thus I want to
discuss the following options:

  1) The original orig.tar.gz has 1GB and contains 1.2GB of uncompressed
     binary data.  License-wise it should not be a problem since
     there is a recipe given how to translate these into text form
     back and forth[1].

     We would have: source package 1GB + binary package 1GB

  2) When unpacking the orig.tar.gz, translating the binary data to
     text format and recompressing with xz, the tarball is "only" 265MB.
     The transformation process takes about 30min on my laptop - not
     longer than any larger project might need to build - but the
     resulting binary package would again be close to 1GB.

     This enables the options:

     2a) Source tarball 256MB + binary package 1GB

     2b) Do the conversion of the format in postinst at the expense
         of users' time, which is acceptable since the package is usually
         installed on high performance machines and there are not so many
         installations, which means bandwidth and disk space on Debian
         mirrors should be saved here rather than on users' machines

         Source tarball 256MB + binary package ~250MB (estimated)

  3) Strip all data from the source package and download the data in
     postinst from the upstream Git repository.  This makes the package
     of uncritical size from a Debian point of view but might be
     problematic in some user setups which might have problems with
     larger data downloads (possibly upstream can be convinced
     to provide a *.bz2 tarball for maximum compression).

     3a) Use postinst

     3b) Inform the user to call a download script manually so as not
         to block apt for a longer time dealing with potential download
         problems.

Which strategy do you think should be chosen to be kind to Debian
(and mirror) resources?

Kind regards

Andreas.

[1] 
https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
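The size trade-off between options 1, 2a and 2b can be tallied with a few lines of arithmetic. The following is only a sketch: it takes the figures quoted in the message at face value (1GB counted as 1000MB, the 256MB tarball figure from the option lines rather than the 265MB mentioned in the prose, and the ~250MB binary package, which is an estimate), and it assumes a single binary package per source package. Option 3 is left out because the message gives no sizes for it. The names OPTIONS, mirror_footprint and report are made up for this illustration.

```python
# Sizes in MB as quoted in the message (1GB counted as 1000MB).
# The 250MB binary size for option 2b is the estimate given in the message.
OPTIONS = {
    "1": {"source": 1000, "binary": 1000},
    "2a": {"source": 256, "binary": 1000},
    "2b": {"source": 256, "binary": 250},
}


def mirror_footprint(option):
    """Return source + binary package size in MB for one option.

    Assumes exactly one binary package per source package.
    """
    sizes = OPTIONS[option]
    return sizes["source"] + sizes["binary"]


def report():
    """Return a dict mapping each option name to its total footprint in MB."""
    return {name: mirror_footprint(name) for name in OPTIONS}


if __name__ == "__main__":
    for name, total in report().items():
        print("option %s: about %d MB per mirror" % (name, total))
```

Under these assumptions option 2b needs roughly a quarter of the mirror space of option 1, at the cost of the conversion time on each installation.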



- End forwarded message -

-- 
http://fam-tille.de



Re: [ncbi-blast+] 01/01: upload upstream release 2.3.0

2016-08-03 Thread olivier sallou
Thank you!

On Wed, 3 Aug 2016 at 21:37, Aaron M. Ucko wrote:

> olivier sallou  writes:
>
> > Did you not push your updates with the new upstream release?
>
> I asked about that earlier; he said that he had been concerned that he
> might have incorrectly refreshed the existing patches.
>
> At any rate, I'm on it, and should be able to upload 2.4.0 tonight.
>
> For the record, the immediate problem turned out to be fallout from missing
> https://www.ncbi.nlm.nih.gov/viewvc/v1?view=revision=72513, which
> evidently didn't make it into the 2.4.0 release branch.
>
> --
> Aaron M. Ucko, KB1CJC (amu at alum.mit.edu, ucko at debian.org)
> http://www.mit.edu/~amu/ |
> http://stuff.mit.edu/cgi/finger/?a...@monk.mit.edu
>


Re: Status of seqan2

2016-08-03 Thread Andreas Tille
Hi Sascha,

On Wed, Aug 03, 2016 at 05:15:15PM +0100, Sascha Steinbiss wrote:
> > is it correct to assume that packaging seqan2 version 2.2 instead of 2.1
> > is the right way to go and should I help doing so?
> 
> Well, if I could make a wish, then I would say 2.2 for sure! I have a
> half-ready package for the free fast BLAST replacement Lambda [1] and
> upstream states that current releases will 'work with the last official
> SeqAn release' so I guess that would be 2.2. It would be great to get to
> finish this ;)

OK, I tried and the Git repository now contains SeqAn2 2.2 - but the
build process does not start properly.  Anybody with cmake knowledge
might hopefully be able to help ...
 
Kind regards

   Andreas.

> [1] https://github.com/seqan/lambda
> 
> > On Tue, Aug 02, 2016 at 09:10:27AM +0200, Andreas Tille wrote:
> >> On Thu, Jul 21, 2016 at 12:52:54PM +1000, Kevin Murray wrote:
> >>>
> >>> There were many complex merge conflicts between master and upstream. It 
> >>> was
> >>> actually a lot easier to resolve than I expected. It's now ready for 
> >>> review.
> >>> However, it would be great if someone could take a close look at the 
> >>> package,
> >>> particularly to ensure that the source is exactly what upstream provides 
> >>> (I've
> >>> tried to check this with git, and I think I got it right, but more 
> >>> experienced
> >>> eyes may differ).
> >>
> >> When trying to compare Upstream with the Git archive I stumbled upon the
> >> first question:  Any reason to stick to version 2.1.0 if 2.2.0 is out?
> >> May be the question is naive, but if we have trouble managing a single
> >> seqan version (we failed to fix bugs for a long time) and now agreed
> >> upon the need for two versions - old 1.4.2 (see my other mail) and 2.x
> >> series, does the status of the Git repository mean you intend to package
> >> 2.1 and 2.2 separately?
> >>  
> > Shall we start with a "simple" libseqan2-dev package with the latest 
> > upstream
> > version (2.2.0)? I'll see if I can build on Michael's work in the seqan2
> > package.
> 
>  Yes, please keep it as simple as possible (but not simpler :-P ).
> >>>
> >>> Working on this now. There are already a couple of errors, so we'll see 
> >>> how I
> >>> go. I'll try to push early and often, so don't assume that the repo is in 
> >>> a
> >>> working state :).
> >>
> >> No visible commit since
> >>
> >> commit 003f498e234ecc31229f6ba624c9d1afc6618d0d
> >> Author: Kevin Murray 
> >> Date:   Thu Jul 21 17:54:39 2016 +1000
> >>
> >> Did you push regularly?  Please git pull - I've fixed Vcs fields.
> >>
> >> Kind regards
> >>
> >>   Andreas.
> >>
> >> -- 
> >> http://fam-tille.de
> >>
> >>
> > 
> 
> 
> -- 
>  The Wellcome Trust Sanger Institute is operated by Genome Research 
>  Limited, a charity registered in England with number 1021457 and a 
>  company registered in England with number 2742969, whose registered 
>  office is 215 Euston Road, London, NW1 2BE. 
> 
> 

-- 
http://fam-tille.de



Re: [ncbi-blast+] 01/01: upload upstream release 2.3.0

2016-08-03 Thread Aaron M. Ucko
olivier sallou  writes:

> Did you not push your updates with the new upstream release?

I asked about that earlier; he said that he had been concerned that he
might have incorrectly refreshed the existing patches.

At any rate, I'm on it, and should be able to upload 2.4.0 tonight.

For the record, the immediate problem turned out to be fallout from missing
https://www.ncbi.nlm.nih.gov/viewvc/v1?view=revision=72513, which
evidently didn't make it into the 2.4.0 release branch.

-- 
Aaron M. Ucko, KB1CJC (amu at alum.mit.edu, ucko at debian.org)
http://www.mit.edu/~amu/ | http://stuff.mit.edu/cgi/finger/?a...@monk.mit.edu



Re: [ncbi-blast+] 01/01: upload upstream release 2.3.0

2016-08-03 Thread Andreas Tille
On Wed, Aug 03, 2016 at 06:35:53PM +, olivier sallou wrote:
> 
> Oh sorry, I just saw an unpackaged release in the changelog and thought it was
> the expected one.
> 
> Did you not push your updates with the new upstream release?

Yes, I did not push, as I tried to explain to Aaron, since I'm not fully
sure whether my adaptation of the patches was done well.

It's just a gbp import-orig + quilt push/refresh with some -f and
adapting the patches.  I cannot push my changes before tomorrow
morning but I would prefer a double check.

Kind regards

  Andreas.

-- 
http://fam-tille.de



Re: [ncbi-blast+] 01/01: upload upstream release 2.3.0

2016-08-03 Thread olivier sallou
On Wed, 3 Aug 2016 at 20:06, Andreas Tille wrote:

> Olivier, your @d.o address was bouncing - so I'll try the list.
>
> Your upload was of the old version instead of the new version 2.4.0
> which should be packaged.
>

Oh sorry, I just saw an unpackaged release in the changelog and thought it was
the expected one.

Did you not push your updates with the new upstream release?

Olivier

>
> Kind regards
>
> Andreas.
>
> - Forwarded message from Andreas Tille  -
>
> Date: Wed, 3 Aug 2016 18:10:40 +0200
> From: Andreas Tille 
> To: Olivier Sallou 
> Subject: Re: [ncbi-blast+] 01/01: upload upstream release 2.3.0
>
> Hi Olivier,
>
> the issue I was discussing on the list was about the *new* version
> 2.4.0.  :-)
>
> Could you have a look into this one?
>
> Thanks
>
>Andreas.
>
> On Wed, Aug 03, 2016 at 10:49:49AM +, Olivier Sallou wrote:
> > This is an automated email from the git hooks/post-receive script.
> >
> > osallou pushed a commit to branch master
> > in repository ncbi-blast+.
> >
> > commit 81497a8867902c852248367b37d88c2ec1f0b614
> > Author: Olivier Sallou 
> > Date:   Wed Aug 3 11:39:55 2016 +0200
> >
> > upload upstream release 2.3.0
> > ---
> >  debian/changelog | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/debian/changelog b/debian/changelog
> > index c3be456..05d4746 100644
> > --- a/debian/changelog
> > +++ b/debian/changelog
> > @@ -1,4 +1,4 @@
> > -ncbi-blast+ (2.3.0-2) UNRELEASED; urgency=medium
> > +ncbi-blast+ (2.3.0-2) unstable; urgency=medium
> >
> >* Team upload.
> >* Autopkgtest added
> >
> > --
> > Alioth's /usr/local/bin/git-commit-notice on /srv/
> git.debian.org/git/debian-med/ncbi-blast+.git
> >
> > ___
> > debian-med-commit mailing list
> > debian-med-com...@lists.alioth.debian.org
> >
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-commit
> >
>
> --
> http://fam-tille.de
>
> - End forwarded message -
>
> --
> http://fam-tille.de
>


Re: [ncbi-blast+] 01/01: upload upstream release 2.3.0

2016-08-03 Thread Andreas Tille
Olivier, your @d.o address was bouncing - so I'll try the list.

Your upload was of the old version instead of the new version 2.4.0
which should be packaged.

Kind regards

Andreas.

- Forwarded message from Andreas Tille  -

Date: Wed, 3 Aug 2016 18:10:40 +0200
From: Andreas Tille 
To: Olivier Sallou 
Subject: Re: [ncbi-blast+] 01/01: upload upstream release 2.3.0

Hi Olivier,

the issue I was discussing on the list was about the *new* version
2.4.0.  :-)

Could you have a look into this one?

Thanks

   Andreas.

On Wed, Aug 03, 2016 at 10:49:49AM +, Olivier Sallou wrote:
> This is an automated email from the git hooks/post-receive script.
> 
> osallou pushed a commit to branch master
> in repository ncbi-blast+.
> 
> commit 81497a8867902c852248367b37d88c2ec1f0b614
> Author: Olivier Sallou 
> Date:   Wed Aug 3 11:39:55 2016 +0200
> 
> upload upstream release 2.3.0
> ---
>  debian/changelog | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/debian/changelog b/debian/changelog
> index c3be456..05d4746 100644
> --- a/debian/changelog
> +++ b/debian/changelog
> @@ -1,4 +1,4 @@
> -ncbi-blast+ (2.3.0-2) UNRELEASED; urgency=medium
> +ncbi-blast+ (2.3.0-2) unstable; urgency=medium
>  
>* Team upload.
>* Autopkgtest added 
> 
> -- 
> Alioth's /usr/local/bin/git-commit-notice on 
> /srv/git.debian.org/git/debian-med/ncbi-blast+.git
> 
> ___
> debian-med-commit mailing list
> debian-med-com...@lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-commit
> 

-- 
http://fam-tille.de

- End forwarded message -

-- 
http://fam-tille.de



Re: Status of seqan

2016-08-03 Thread Sascha Steinbiss
Hi Andreas,

> is it correct to assume that packaging seqan2 version 2.2 instead of 2.1
> is the right way to go and should I help doing so?

Well, if I could make a wish, then I would say 2.2 for sure! I have a
half-ready package for the free fast BLAST replacement Lambda [1] and
upstream states that current releases will 'work with the last official
SeqAn release' so I guess that would be 2.2. It would be great to get to
finish this ;)

Cheers
Sascha

[1] https://github.com/seqan/lambda

> On Tue, Aug 02, 2016 at 09:10:27AM +0200, Andreas Tille wrote:
>> On Thu, Jul 21, 2016 at 12:52:54PM +1000, Kevin Murray wrote:
>>>
>>> There were many complex merge conflicts between master and upstream. It was
>>> actually a lot easier to resolve than I expected. It's now ready for review.
>>> However, it would be great if someone could take a close look at the 
>>> package,
>>> particularly to ensure that the source is exactly what upstream provides 
>>> (I've
>>> tried to check this with git, and I think I got it right, but more 
>>> experienced
>>> eyes may differ).
>>
>> When trying to compare Upstream with the Git archive I stumbled upon the
>> first question:  Any reason to stick to version 2.1.0 if 2.2.0 is out?
>> May be the question is naive, but if we have trouble managing a single
>> seqan version (we failed to fix bugs for a long time) and now agreed
>> upon the need for two versions - old 1.4.2 (see my other mail) and 2.x
>> series, does the status of the Git repository mean you intend to package
>> 2.1 and 2.2 separately?
>>  
> Shall we start with a "simple" libseqan2-dev package with the latest 
> upstream
> version (2.2.0)? I'll see if I can build on Michael's work in the seqan2
> package.

 Yes, please keep it as simple as possible (but not simpler :-P ).
>>>
>>> Working on this now. There are already a couple of errors, so we'll see how 
>>> I
>>> go. I'll try to push early and often, so don't assume that the repo is in a
>>> working state :).
>>
>> No visible commit since
>>
>> commit 003f498e234ecc31229f6ba624c9d1afc6618d0d
>> Author: Kevin Murray 
>> Date:   Thu Jul 21 17:54:39 2016 +1000
>>
>> Did you push regularly?  Please git pull - I've fixed Vcs fields.
>>
>> Kind regards
>>
>>   Andreas.
>>
>> -- 
>> http://fam-tille.de
>>
>>
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 



Re: Description for lefse tools (Was: Origin of data files in MetaPhLan2)

2016-08-03 Thread Andreas Tille
Hi Tin,

On Wed, Aug 03, 2016 at 02:01:01PM +, Duy Tin Truong wrote:
> > - Tin can also provide more info about the binary data in db_v20. The files
> > ending with "bt2" are created using a script in the Bowtie2 package
> > (bowtie2-build) using a sequence file Tin can provide (it can also be
> > recovered from the bt2 files with bowtie2-inspect if I remember well).
> As Nicola said, those files in db_v20 are created with bowtie2-build
> using a sequence file and you can recover the sequence file by:
> 
> bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
> 
> If you want to rebuild them, the command is:
> 
> bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200

I can confirm that I can reproduce the files byte identical from
markers.fasta.  Is there any reason to ship the binary form instead of
the fasta text file?  Moreover, what is the source of the markers.fasta?
Is there any related publication or so?
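The round trip described above (extract markers.fasta with bowtie2-inspect, rebuild with bowtie2-build, compare the results) could be scripted roughly as follows. This is a sketch, not the procedure actually used in the thread: the function names and the work directory are made up for illustration, bowtie2 is assumed to be on PATH, and whether a rebuild is byte identical may depend on the bowtie2 version and options in use. The index is rebuilt under the same base name in a separate directory so that files can be compared by name.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(original_dir, rebuilt_dir, pattern="*.bt2"):
    """Return sorted names of index files that are missing from
    rebuilt_dir or whose content differs from the original."""
    mismatches = []
    for original in sorted(Path(original_dir).glob(pattern)):
        rebuilt = Path(rebuilt_dir) / original.name
        if not rebuilt.is_file() or sha256_of(original) != sha256_of(rebuilt):
            mismatches.append(original.name)
    return mismatches


def rebuild_index(index_base, work_dir):
    """Dump the FASTA from an existing index and rebuild it in work_dir.

    Runs the two commands quoted in the thread; needs bowtie2 installed.
    """
    index_base = Path(index_base)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    fasta_path = work_dir / "markers.fasta"
    with open(fasta_path, "wb") as fasta:
        # bowtie2-inspect writes the reference sequences as FASTA to stdout.
        subprocess.run(["bowtie2-inspect", str(index_base)],
                       stdout=fasta, check=True)
    # Same base name as the original so the file names line up.
    subprocess.run(["bowtie2-build", str(fasta_path),
                    str(work_dir / index_base.name)], check=True)
    return work_dir
```

A possible call would be rebuild_index("metaphlan2/db_v20/mpa_v20_m200", "rebuilt") followed by differing_files("metaphlan2/db_v20", "rebuilt"); an empty list would mean that every *.bt2 file was reproduced byte for byte.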

> > For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> > object (or he can provide a couple of lines of code to uncompress it?)
> It is python dictionary and can be read as:
> 
> import cPickle as pickle
> import bz2
> db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))
>
> You can have more information about them at:
> https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database

OK, that page clarifies the method.  Just a personal remark from the
point of view of an outsider of bioinformatics:  I'd regard the creation
process of the mpa_v20_m200.pkl file a bit cumbersome.  I'd personally
prefer dropping some text record somewhere and calling a script that
processes this record rather than writing a script of my own.
 
> In addition, some files were renamed:
>- metaphlan2_strainer.py -> strainphlan.py
>- strainer_src -> strainphlan_src
>- strainer_tutorial -> strainphlan_tutorial
> 
> Some source files were updated as well.
> Please let me know if you need other information.

Just drop me a note once you release a new version containing these
changes.  I think I'll try to release the current version as is since at
least the origin of the files is clarified now.  I'm not yet sure whether
the size of the data is acceptable or might exceed some limit.  Regarding
this, I'm wondering whether I should create a source tarball that ships
markers.fasta instead and create the bt2 files in the build process.

Kind regards

   Andreas. 

-- 
http://fam-tille.de



Re: Status of seqan

2016-08-03 Thread Andreas Tille
Hi again,

is it correct to assume that packaging seqan2 version 2.2 instead of 2.1
is the right way to go and should I help doing so?

Kind regards

  Andreas.

On Tue, Aug 02, 2016 at 09:10:27AM +0200, Andreas Tille wrote:
> On Thu, Jul 21, 2016 at 12:52:54PM +1000, Kevin Murray wrote:
> > 
> > There were many complex merge conflicts between master and upstream. It was
> > actually a lot easier to resolve than I expected. It's now ready for review.
> > However, it would be great if someone could take a close look at the 
> > package,
> > particularly to ensure that the source is exactly what upstream provides 
> > (I've
> > tried to check this with git, and I think I got it right, but more 
> > experienced
> > eyes may differ).
> 
> When trying to compare Upstream with the Git archive I stumbled upon the
> first question:  Any reason to stick to version 2.1.0 if 2.2.0 is out?
> May be the question is naive, but if we have trouble managing a single
> seqan version (we failed to fix bugs for a long time) and now agreed
> upon the need for two versions - old 1.4.2 (see my other mail) and 2.x
> series, does the status of the Git repository mean you intend to package
> 2.1 and 2.2 separately?
>  
> > > > Shall we start with a "simple" libseqan2-dev package with the latest 
> > > > upstream
> > > > version (2.2.0)? I'll see if I can build on Michael's work in the seqan2
> > > > package.
> > > 
> > > Yes, please keep it as simple as possible (but not simpler :-P ).
> > 
> > Working on this now. There are already a couple of errors, so we'll see how 
> > I
> > go. I'll try to push early and often, so don't assume that the repo is in a
> > working state :).
> 
> No visible commit since
> 
> commit 003f498e234ecc31229f6ba624c9d1afc6618d0d
> Author: Kevin Murray 
> Date:   Thu Jul 21 17:54:39 2016 +1000
> 
> Did you push regularly?  Please git pull - I've fixed Vcs fields.
> 
> Kind regards
> 
>   Andreas.
> 
> -- 
> http://fam-tille.de
> 
> 

-- 
http://fam-tille.de



Re: Description for lefse tools (Was: Origin of data files in MetaPhLan2)

2016-08-03 Thread Duy Tin Truong
Hi Andreas,

Thanks for your work. I answer your questions below:

- some small fixes:
https://anonscm.debian.org/viewvc/debian-med/trunk/packages/metaphlan2/trunk/debian/patches/fix_sequence.patch?view=markup

-> fixed
- some spelling issues:
https://anonscm.debian.org/viewvc/debian-med/trunk/packages/metaphlan2/trunk/debian/patches/spelling.patch?view=markup

- Tin can also provide more info about the binary data in db_v20. The files
ending with "bt2" are created using a script in the Bowtie2 package
(bowtie2-build) using a sequence file Tin can provide (it can also be
recovered from the bt2 files with bowtie2-inspect if I remember well).
  As Nicola said, those files in db_v20 are created with bowtie2-build
using a sequence file and you can recover the sequence file by:

bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta

If you want to rebuild them, the command is:

bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200


- For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
object (or he can provide a couple of lines of code to uncompress it?)
   It is python dictionary and can be read as:

import cPickle as pickle
import bz2
db = pickle.load(bz2.BZ2File('db_v20/mpa_v20_m200.pkl', 'r'))


You can have more information about them at:
https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
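
The snippet above is written for Python 2 (cPickle). For readers on Python 3, a roughly equivalent way to open the file might look like the sketch below. The function names are made up for illustration, and the encoding="latin1" argument is an assumption that the file was pickled by Python 2; nothing here is taken from the MetaPhlAn sources. As with any pickle, the file should only be loaded if it comes from a trusted source.

```python
import bz2
import pickle


def load_marker_db(path):
    """Load a bz2-compressed pickle and return the object it contains."""
    with bz2.open(path, "rb") as handle:
        # encoding="latin1" lets Python 3 decode str objects that were
        # pickled under Python 2; it is harmless for Python 3 pickles.
        return pickle.load(handle, encoding="latin1")


def summarize(db):
    """Return a dict mapping each top-level key to the size of its value.

    Assumes db is a dictionary whose values are sized containers.
    """
    return {key: len(value) for key, value in db.items()}
```

For example, summarize(load_marker_db('db_v20/mpa_v20_m200.pkl')) would list the top-level keys of the dictionary with their entry counts, which is a quick way to see what the file contains without printing it in full.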

In addition, some files were renamed:
   - metaphlan2_strainer.py -> strainphlan.py
   - strainer_src -> strainphlan_src
   - strainer_tutorial -> strainphlan_tutorial

Some source files were updated as well.
Please let me know if you need other information.

Thanks,
Tin

On Wed, Aug 3, 2016 at 3:38 PM Andreas Tille  wrote:

> Hi Nicola,
>
> thanks for your answer.
>
> On Tue, Aug 02, 2016 at 04:32:31PM +, Nicola Segata wrote:
> > Hi Andreas,
> >  sorry for the delay in replying. I did get your last two emails but it
> > seems the first one (On Mon, Jul 25, 2016 at 09:45:57PM) never arrived.
>
> Hmmm, sad that there seems to be some mail loss.
>
> > Tin can also provide more info about the binary data in db_v20. The files
> > ending with "bt2" are created using a script in the Bowtie2 package
> > (bowtie2-build) using a sequence file Tin can provide (it can also be
> > recovered from the bt2 files with bowtie2-inspect if I remember well).
> >
> > For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> > object (or he can provide a couple of lines of code to uncompress it?)
>
> Anything that qualifies as source would be really welcome.  If
> generating the binary from this source is not a big effort (in the
> sense of "takes way longer than 1 hour on a decent build machine"),
> generating the binaries would be really preferred.
>
> > For the LEfSe package I just added the license in the bitbucket
> repository.
> > For the description, I think you can use the following page:
> > https://bitbucket.org/biobakery/biobakery/wiki/lefse
> > Does it sound like an appropriate description for the package?
>
> I found this after I've sent my mails - thanks for confirming that this
> is the correct description.  I've just uploaded the package to the
> Debian new queue.
>
> > Let me know if you have other questions or if I missed answering to other
> > emails.
>
> If Tin will answer the binary data issue above I have no further
> questions and do not remember any unanswered e-mails.
>
> > thanks so much for your work!
>
> You are welcome
>
>   Andreas.
>
> --
> http://fam-tille.de
>


Re: Please provide source code for MALT

2016-08-03 Thread Andreas Tille
Hi Jihyeok,

thanks a lot for this effort.  Just to let the list know:  I've
created a Wiki page to keep track of this kind of effort, since
otherwise I might forget to nag again.  Feel free to add other
liberation attempts here:

   https://wiki.debian.org/DebianMed/SoftwareLiberation

Kind regards and good luck for MALT

   Andreas.

On Wed, Aug 03, 2016 at 07:06:40PM +0900, Jihyeok Seo wrote:
> Dear Prof. Dr. Daniel Huson,
> 
> I’m writing to you on behalf of the Debian Med team which is a subgroup 
> inside Debian with the objective to package free software in the field of 
> biology and medicine. You can see the packages we worked on in the field of 
> bioinformatics on our so-called tasks list[1].
> 
> While packaging MEGAN Community Edition[2], we hit a roadblock because we 
> were unable to find the source code for MEGAN alignment tool[3]. Please 
> provide source code for MALT on a public place, so we may continue packaging 
> MEGAN.
> 
> Kind regards and thank you for choosing a free license,
> Jihyeok
> 
> [1]: https://blends.debian.org/med/tasks/bio
> [2]: https://github.com/danielhuson/megan-ce/
> [3]: http://ab.inf.uni-tuebingen.de/data/software/malt/download/welcome.html
> 

-- 
http://fam-tille.de



Re: Description for lefse tools (Was: Origin of data files in MetaPhLan2)

2016-08-03 Thread Andreas Tille
Hi Nicola,

thanks for your answer.

On Tue, Aug 02, 2016 at 04:32:31PM +, Nicola Segata wrote:
> Hi Andreas,
>  sorry for the delay in replying. I did get your last two emails but it
> seems the first one (On Mon, Jul 25, 2016 at 09:45:57PM) never arrived.

Hmmm, sad that there seems to be some mail loss.
 
> Tin can also provide more info about the binary data in db_v20. The files
> ending with "bt2" are created using a script in the Bowtie2 package
> (bowtie2-build) using a sequence file Tin can provide (it can also be
> recovered from the bt2 files with bowtie2-inspect if I remember well).
> 
> For the mpa_v20_m200.pkl Tin can also provide the uncompressed python
> object (or he can provide a couple of lines of code to uncompress it?)

Anything that qualifies as source would be really welcome.  If
generating the binary from this source is not a big effort (in the
sense of "takes way longer than 1 hour on a decent build machine"),
generating the binaries would be really preferred.
 
> For the LEfSe package I just added the license in the bitbucket repository.
> For the description, I think you can use the following page:
> https://bitbucket.org/biobakery/biobakery/wiki/lefse
> Does it sound like an appropriate description for the package?

I found this after I've sent my mails - thanks for confirming that this
is the correct description.  I've just uploaded the package to the
Debian new queue.
 
> Let me know if you have other questions or if I missed answering to other
> emails.

If Tin will answer the binary data issue above I have no further
questions and do not remember any unanswered e-mails.
 
> thanks so much for your work!

You are welcome

  Andreas.

-- 
http://fam-tille.de



[GSoC] ncbi-blast+ and htslib uploaded

2016-08-03 Thread Andreas Tille
Hi Canberk,

On Wed, Aug 03, 2016 at 11:40:58AM +0300, Canberk Koç wrote:
> I prepare htslib and commit.

I uploaded htslib and Olivier cared for ncbi-blast+
 
Suggested todo list

ncbi-tools6
njplot (according to manual there is an option without X11,
-> https://anonscm.debian.org/git/debian-med/njplot.git
amap-align
-> https://anonscm.debian.org/git/debian-med/amap-align.git
emboss
picard-tools

Thanks for your work on the tests

 Andreas.

-- 
http://fam-tille.de



Please provide source code for MALT

2016-08-03 Thread Jihyeok Seo
Dear Prof. Dr. Daniel Huson,

I’m writing to you on behalf of the Debian Med team which is a subgroup inside 
Debian with the objective to package free software in the field of biology and 
medicine. You can see the packages we worked on in the field of bioinformatics 
on our so-called tasks list[1].

While packaging MEGAN Community Edition[2], we hit a roadblock because we were 
unable to find the source code for MEGAN alignment tool[3]. Please provide 
source code for MALT on a public place, so we may continue packaging MEGAN.

Kind regards and thank you for choosing a free license,
Jihyeok

[1]: https://blends.debian.org/med/tasks/bio
[2]: https://github.com/danielhuson/megan-ce/
[3]: http://ab.inf.uni-tuebingen.de/data/software/malt/download/welcome.html


Re: New version of ncbi-blast+ available

2016-08-03 Thread Olivier Sallou


- Original message -
> From: "Andreas Tille" 
> To: "Olivier Sallou" 
> Cc: debian-med@lists.debian.org
> Sent: Wednesday, 3 August 2016 11:25:57
> Subject: Re: New version of ncbi-blast+ available
> 
> On Wed, Aug 03, 2016 at 11:09:17AM +0200, Olivier Sallou wrote:
> > > Great.  Maybe I simply messed up the patches.  Could you push (or even
> > > upload - maybe after running the autopkgtest)?
> > 
> > autopkgtest is not automatically run during build?
> 
> No, its no build time test (may be its a good idea to create such one as
> well).
> 
> > Else, how do I run it?
> 
> If you are to lazy to install debci install the package and run
> 
> sh debian/tests/run-unit-test
> 
> This is what I'm usually doing (knowing that this is a bit less than
> debci does but to my opinion sufficient to upload the package).

It works.

I'm going to rebuild and upload.

> 
> Kind regards
> 
> Andreas.
> 
> PS: I've not seen any commits ...
> 
> --
> http://fam-tille.de
> 
> 



Re: New version of ncbi-blast+ available

2016-08-03 Thread Andreas Tille
On Wed, Aug 03, 2016 at 11:09:17AM +0200, Olivier Sallou wrote:
> > Great.  Maybe I simply messed up the patches.  Could you push (or even
> > upload - maybe after running the autopkgtest)?
> 
> Is autopkgtest not automatically run during the build?

No, it's not a build-time test (maybe it's a good idea to create one as
well).

> Else, how do I run it?

If you are too lazy to install debci, install the package and run

sh debian/tests/run-unit-test

This is what I usually do (knowing that this is a bit less than
debci does, but in my opinion sufficient to upload the package).

Kind regards

Andreas.

PS: I've not seen any commits ...

-- 
http://fam-tille.de



Re: New version of ncbi-blast+ available

2016-08-03 Thread Olivier Sallou


- Original message -
> From: "Andreas Tille" 
> To: debian-med@lists.debian.org
> Sent: Wednesday 3 August 2016 11:00:51
> Subject: Re: New version of ncbi-blast+ available
> 
> On Wed, Aug 03, 2016 at 10:42:14AM +0200, Olivier Sallou wrote:
> > > I just imported the new tarball and tried to refresh the patches.  Some
> > > small adaptions were needed but I'd prefer you do these with a deeper
> > > understanding.  I became unsure whether I might have mixed up with the
> > > wrong patching.  (Currently I do not have access to the box where I did
> > > this.)
> > 
> > I made a quick build test and it worked fine for me.
> 
> Great.  Maybe I simply messed up the patches.  Could you push (or even
> upload - maybe after running the autopkgtest)?

Is autopkgtest not automatically run during the build? Else, how do I run it?
> 
> Kind regards
> 
>   Andreas.
> 
> --
> http://fam-tille.de
> 
> 



Re: New version of ncbi-blast+ available

2016-08-03 Thread Andreas Tille
On Wed, Aug 03, 2016 at 10:42:14AM +0200, Olivier Sallou wrote:
> > I just imported the new tarball and tried to refresh the patches.  Some
> > small adaptions were needed but I'd prefer you do these with a deeper
> > understanding.  I became unsure whether I might have mixed up with the
> > wrong patching.  (Currently I do not have access to the box where I did
> > this.)
> 
> I made a quick build test and it worked fine for me.

Great.  Maybe I simply messed up the patches.  Could you push (or even
upload - maybe after running the autopkgtest)?

Kind regards

  Andreas. 

-- 
http://fam-tille.de



Re: New version of ncbi-blast+ available

2016-08-03 Thread Olivier Sallou


- Original message -
> From: "Andreas Tille" 
> To: debian-med@lists.debian.org
> Cc: "Aaron M. Ucko" 
> Sent: Tuesday 2 August 2016 22:54:16
> Subject: Re: New version of ncbi-blast+ available
> 
> Hi Aaron,
> 
> On Tue, Aug 02, 2016 at 04:03:48PM -0400, Aaron M. Ucko wrote:
> > > I'm going to have a look, but this is really a build modification; I may
> > > need Aaron's help, as their build system is really complex.
> > 
> > I'll be happy to take a look.
> > 
> > Andreas, can I pull your preliminary changes from anywhere, or should I
> > recreate them locally?
> 
> I just imported the new tarball and tried to refresh the patches.  Some
> small adaptions were needed but I'd prefer you do these with a deeper
> understanding.  I became unsure whether I might have mixed up with the
> wrong patching.  (Currently I do not have access to the box where I did
> this.)

I made a quick build test and it worked fine for me.
>  
> Kind regards
> 
>   Andreas.
> 
> --
> http://fam-tille.de
> 
> 



Re: [GSoC] any success with ncbi-blast+?

2016-08-03 Thread Andreas Tille
On Wed, Aug 03, 2016 at 07:55:03AM +0200, Andreas Tille wrote:
> Hi Canberk,
> 
> On Wed, Aug 03, 2016 at 01:35:39AM +0300, Canberk Koç wrote:
> > 
> > I prepared amap-align, but the package is not in git. Can you please move it to git?
> 
> I'll do that - could you meanwhile pick another package from the list?

   ->  https://anonscm.debian.org/git/debian-med/amap-align.git
 
Good luck

  Andreas.

  
> > Canberk Koç
> > about.me/canberkkoc
> > 
> > 
> > 2016-08-02 13:50 GMT+03:00 Canberk Koç :
> > 
> > > Hello Andreas,
> > >
> > > I committed Blast+.
> > >
> > > Best Regards
> > >
> > >
> > >
> > > Canberk Koç
> > > about.me/canberkkoc
> > >
> > > 
> > >
> > > 2016-08-02 9:24 GMT+03:00 Andreas Tille :
> > >
> > >> Hi Canberk
> > >>
> > >> On Fri, Jul 29, 2016 at 09:51:06PM +0300, Canberk Koç wrote:
> > >> > I committed probcons :-) Moving on to blast+.
> > >>
> > >> Do you have any success with ncbi-blast+?  If not, please report any
> > >> show stoppers and commit intermediate results.
> > >>
> > >> Suggested todo list
> > >>
> > >> ncbi-tools6
> > >> ncbi-blast+
> > >> njplot (according to the manual there is an option without X11)
> > >> -> https://anonscm.debian.org/git/debian-med/njplot.git
> > >> amap-align
> > >> htslib
> > >>
> > >> Kind regards
> > >>
> > >>Andreas.
> > >>
> > >> --
> > >> http://fam-tille.de
> > >>
> > >>
> > >
> 
> -- 
> http://fam-tille.de
> 
> 

-- 
http://fam-tille.de



Re: Links to Debian packages

2016-08-03 Thread Andreas Tille
Hi Jon,

On Wed, Aug 03, 2016 at 09:23:07AM +0200, ji...@cbs.dtu.dk wrote:
> Forgive the very naive question, but do you maintain a list of links to
> packages (source, binary) currently available in the Debian distro?
> 
> I want to support linking to Debian packages from named tools in bio.tools.

Maybe either packages.debian.org or tracker.debian.org is what you are
looking for, depending on the amount of information you want to
present.  For instance:

https://packages.debian.org/bwa=1
https://tracker.debian.org/bwa
 
> If not a link, I guess I could just support package names; in this case, is
> there a valid syntax for package names (so I can constrain this in our
> schema)?

While there are syntactical constraints (lower case letters, numbers,
'-', '.', '+'; no upper case letters, no '_'), you probably want to link
to existing packages, which by definition will have a valid name.  Or am
I missing something?
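
If you do want to constrain the field in a schema, here is a minimal sketch
in Python of what such a check could look like.  It assumes the rule from
Debian Policy for the Package field (lower case letters, digits, '+', '-'
and '.', at least two characters, starting with an alphanumeric character).
The names PACKAGE_NAME_RE, is_valid_package_name and tracker_url are made up
for this illustration, and the tracker link simply mirrors the form of the
example above.

```python
import re

# Assumed rule (Debian Policy, "Package" field): lower case letters, digits,
# '+', '-' and '.'; at least two characters; first character alphanumeric.
PACKAGE_NAME_RE = re.compile(r"[a-z0-9][a-z0-9+.-]+")


def is_valid_package_name(name: str) -> bool:
    """Return True if the whole string matches the assumed name syntax."""
    return PACKAGE_NAME_RE.fullmatch(name) is not None


def tracker_url(name: str) -> str:
    """Hypothetical helper: build a tracker link in the form shown above."""
    if not is_valid_package_name(name):
        raise ValueError(f"not a valid package name: {name!r}")
    return f"https://tracker.debian.org/{name}"


if __name__ == "__main__":
    # Two names from this list's todo items and one deliberately invalid name.
    for candidate in ["bwa", "ncbi-blast+", "Picard_Tools"]:
        print(candidate, is_valid_package_name(candidate))
```

A syntactically valid name does not guarantee that the package exists in the
archive, so a check like this only catches typos, not stale links.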

Kind regards

 Andreas.

-- 
http://fam-tille.de



Links to Debian packages

2016-08-03 Thread jison
Hi folks

Forgive the very naive question, but do you maintain a list of links to
packages (source, binary) currently available in the Debian distro?

I want to support linking to Debian packages from named tools in bio.tools.

If not a link, I guess I could just support package names; in this case, is
there a valid syntax for package names (so I can constrain this in our
schema)?

Cheers

Jon