Re: [galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-18 Thread John Chilton
Peter has already voted and if I recall correctly Ryan cannot access
Trello - so this might be a waste to bring up - but here is a Trello
card for voting on this issue and tracking progress
https://trello.com/c/3RkTDnIn.

To summarize previous discussion - this would be fantastic to have and
Galaxy needs this - but we solve this on usegalaxy.org by using a
compressed file system - a more elegant solution when it is a
possibility - so it has never been a tier one priority for the
devteam. The only update on this is that I don't think we are using a
compressed file system anymore, so this might become an issue again
someday soon.

This would be non-trivial to implement - but I have always felt this
would be a fairly fun project to work on if anyone really tight on
space locally wants to try to tackle it :).

-John

On Tue, Jan 13, 2015 at 9:54 AM, Ryan G ngsbioinformat...@gmail.com wrote:
 Agreed.

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-13 Thread Ryan G
Agreed.

On Mon, Jan 12, 2015 at 10:24 PM, Peter Cock p.j.a.c...@googlemail.com
wrote:

 Hi Ryan,

 That is the workaround I am using, which means
 keeping an uncompressed copy of the FASTQ
 file on our main storage from where Galaxy can
 see it (for people to use within their histories).

 From a long-term storage perspective this is not
 ideal - so I am keen for better handling of gzipped
 files within Galaxy (particularly within libraries
 which we use for raw data).

 Peter

[galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-12 Thread Ryan G
Hi all - I've got a bunch of fastq files uploaded into a data library in
Galaxy. The underlying files are gzipped; however, Galaxy strips the .gz
from the filename and displays it as .fastq. When the Python wrapper
rgFastQC.py gets called, it correctly sees the fastq.gz file. The wrapper
creates a symbolic link to the .gz file in a tmp directory, but the link
is named .fastq. When FastQC tries to read this file, it fails because
it's compressed. So one of two things is going wrong here:

1)  It looks like the wrapper is incorrectly renaming the file, but it's
using the name given to it in Galaxy.

2)  When the file is uploaded into the data library, Galaxy is stripping
off the .gz extension.

I think #2 is the more correct problem.  How can I keep Galaxy from
stripping the .gz extension?

Re: [galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-12 Thread Ryan G
To (I think) fix this, I changed line 50 in rgFastQC.py from

    infname = self.opts.inputfilename

to

    infname = self.opts.input

This will force FastQC to look at the real file and not the renamed
dataset.
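For ordinary (non-linked) datasets the real path Galaxy passes in typically ends in .dat rather than .fastq.gz, so an alternative worth considering is to keep the display name but make the link's extension match the file's actual contents. This is a minimal sketch, assuming (as the symptoms in this thread suggest) that FastQC decides whether to decompress based on a .gz extension. The helpers is_gzipped and link_for_fastqc are hypothetical and not part of rgFastQC.py, and os.symlink assumes a POSIX filesystem.

```python
import os

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at path starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def link_for_fastqc(dataset_path, display_name, work_dir):
    """Symlink a dataset into work_dir under a name whose extension matches its content.

    Hypothetical helper: keeps the Galaxy display name (e.g. "sample.fastq")
    but appends ".gz" when the underlying file is really gzip-compressed.
    """
    base = os.path.basename(display_name)
    if is_gzipped(dataset_path) and not base.endswith(".gz"):
        base += ".gz"
    link_path = os.path.join(work_dir, base)
    # Link to the absolute path so the link stays valid from any working directory.
    os.symlink(os.path.abspath(dataset_path), link_path)
    return link_path
```

With this approach a gzipped dataset displayed as sample.fastq is linked as sample.fastq.gz, while an uncompressed dataset keeps its .fastq name.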


On Mon, Jan 12, 2015 at 12:20 PM, Ryan G ngsbioinformat...@gmail.com
wrote:

 Yes, I'm linking to the file on the filesystem when doing a library import.
 Does this mean I should link to the uncompressed file?

Re: [galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-12 Thread Peter Cock
Hi Ryan,

That is the workaround I am using, which means
keeping an uncompressed copy of the FASTQ
file on our main storage from where Galaxy can
see it (for people to use within their histories).

From a long-term storage perspective this is not
ideal - so I am keen for better handling of gzipped
files within Galaxy (particularly within libraries
which we use for raw data).

Peter
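The workaround described here amounts to keeping a decompressed twin of each raw .fastq.gz file. A minimal sketch of producing that copy with Python's standard gzip module, streaming in fixed-size chunks so multi-gigabyte FASTQ files never need to fit in memory. The function name make_uncompressed_copy and the file paths are illustrative, not part of Galaxy.

```python
import gzip


def make_uncompressed_copy(gz_path, out_path, chunk_size=1024 * 1024):
    """Stream-decompress gz_path into out_path.

    Reads and writes in chunk_size pieces so memory use stays bounded
    regardless of file size. Returns the number of uncompressed bytes written.
    """
    written = 0
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes means end of the decompressed stream
                break
            dst.write(chunk)
            written += len(chunk)
    return written
```

The copy occupies the full uncompressed size on disk, which is exactly the long-term storage cost raised above.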

On Mon, Jan 12, 2015 at 5:20 PM, Ryan G ngsbioinformat...@gmail.com wrote:
 Yes, I'm linking to the file on the filesystem when doing a library import.
 Does this mean I should link to the uncompressed file?

Re: [galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-12 Thread Ryan G
Galaxy is not decompressing the file.  The file is linked to on the
filesystem.

On Mon, Jan 12, 2015 at 10:28 AM, Peter Cock p.j.a.c...@googlemail.com
wrote:

 Hi Ryan,

 The problem isn't Galaxy stripping the extension, rather
 Galaxy is actually decompressing the file as part of the
 upload process.

 Unfortunately (and there is an open Trello enhancement
 request on this), Galaxy does not support storing any of
 the defined datatypes in compressed form UNLESS they
 are defined that way (like BAM files).

 This has led some Galaxy Admins to define a new datatype
 lgzippedfastq (or similar - I'd have to check my old emails
 for the exact name used as a gzipped alternative to the
 Galaxy fastqsanger datatype) and then modify many/all
 their tools to handle this. That is a lot of work, but does
 offer big disk savings for this key datatype.

 The Galaxy team instead use a compressed file system,
 so for usegalaxy.org ALL their data files are compressed
 but Galaxy can ignore this complexity.

 Peter
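A dedicated gzipped FASTQ datatype, as described above, needs some way to recognise its files. The function below is a standalone heuristic of the kind such a datatype's content check might perform, assuming that the gzip magic bytes plus the shape of the first FASTQ record are enough to decide. It is not Galaxy's datatype API, and the name looks_like_gzipped_fastq is hypothetical.

```python
import gzip
import zlib

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzipped_fastq(path):
    """Heuristic: True if path is gzip data whose first record looks like FASTQ.

    Only the first four decompressed lines are inspected, so this is cheap
    even for very large files, but it is a sniff, not full validation.
    """
    # Cheap check on the raw bytes before attempting any decompression.
    with open(path, "rb") as raw:
        if raw.read(2) != GZIP_MAGIC:
            return False
    try:
        with gzip.open(path, "rt", encoding="ascii") as handle:
            # readline() returns "" at end of file, so short files yield empty strings.
            lines = [handle.readline().rstrip("\r\n") for _ in range(4)]
    except (OSError, EOFError, UnicodeDecodeError, zlib.error):
        # Corrupt, truncated, or non-text payload.
        return False
    header, seq, plus, qual = lines
    return (
        header.startswith("@")
        and plus.startswith("+")
        and len(seq) > 0
        and len(seq) == len(qual)
    )
```

Checking the raw magic bytes first means uncompressed FASTQ is rejected without attempting decompression, so it would still fall through to the ordinary FASTQ datatypes.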

Re: [galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-12 Thread Peter Cock
Ah. Then this is more subtle... are you using the
library import option where Galaxy just symlinks
to existing files? I thought that was not possible
with gzipped files (for the reasons given below).
Perhaps this is not being blocked, leading to the
confused state you're seeing?

Peter

On Mon, Jan 12, 2015 at 4:52 PM, Ryan G ngsbioinformat...@gmail.com wrote:
 Galaxy is not decompressing the file.  The file is linked to on the
 filesystem.

Re: [galaxy-dev] FastQC wrapper not seeing files at gzipped

2015-01-12 Thread Ryan G
Yes, I'm linking to the file on the filesystem when doing a library import.
Does this mean I should link to the uncompressed file?

On Mon, Jan 12, 2015 at 12:14 PM, Peter Cock p.j.a.c...@googlemail.com
wrote:

 Ah. Then this is more subtle... are you using the
 library import option where Galaxy just symlinks
 to existing files? I thought that was not possible
 with gzipped files (for the reasons given below).
 Perhaps this is not being blocked, leading to the
 confused state you're seeing?

 Peter
