I'm looking at extending the metadata fields for one of the supported file
types. The files can get VERY large, and since I'm creating those files, I'd
like to save as metadata some of the information I have on the contents.
Specifically, I'd like to tag the files with information about the
-Original Message-
From: Nate Coraor [mailto:n...@bx.psu.edu]
Sent: Monday, May 09, 2011 8:27 AM
To: Duddy, John
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Unable to set metadata in API call
Duddy, John wrote:
I need to be able to set some metadata in some custom data types. For now
Doesn't this violate one of the basic tenets of Galaxy - reproducibility?
Without the ability to provide full traceability to the inputs, one can make no
guarantees about the outputs.
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel:
There is a C program for merging Gzip files (gzjoin) that I'd love to rely on
for a core Galaxy capability. Is there a standard way to get things like this
included in Galaxy? Recoding it in Python would be a bit of a pain, and might
be a lot slower due to the IO layer not allowing the reuse of
that happens on first start, such as copying *example files.
Would it be far-fetched to compile the program at that stage?
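If recoding in Python turns out to be needed, one simpler route is worth noting. It is not what gzjoin does (gzjoin merges the inputs into a single gzip stream). It relies only on the fact that a gzip file may consist of several members back to back, so plain byte-level concatenation yields a valid file that Python's gzip module reads straight through. The function names below are made up for illustration, and this is a sketch assuming Python 3 rather than anything taken from Galaxy:

```python
import gzip
import shutil


def concat_gzip(parts, dest):
    """Append gzip files byte-for-byte; the result is a multi-member gzip file."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # No decompression happens here, only raw copying.
                shutil.copyfileobj(src, out)


def read_all(path):
    """Decompress every member of a (possibly multi-member) gzip file."""
    with gzip.open(path, "rb") as handle:
        return handle.read()
```

Because nothing is decompressed or recompressed, the cost is essentially that of copying the bytes. Whether downstream tools accept multi-member files is something to check per tool.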
From: James Taylor [ja...@jamestaylor.org]
Sent: Tuesday, May 24, 2011 11:51 PM
To: Duddy, John
Cc: galaxy-...@bx.psu.edu Dev
: 858-736-3584
E-mail: jdu...@illumina.com
-Original Message-
From: Nate Coraor [mailto:n...@bx.psu.edu]
Sent: Wednesday, May 25, 2011 9:24 AM
To: Duddy, John
Cc: James Taylor; galaxy-...@bx.psu.edu Dev
Subject: Re: [galaxy-dev] Getting binary programs into Galaxy distribution?
Duddy, John
We'd like to be able to associate fixed things (project, Sample, sequencer
used) with user's FASTQ files, and we'd also like to allow users to associate
dynamic, site-specific stuff with the sequencing run. Currently, users track
their runs using a CSV sample sheet, and often they add columns
:38 AM
To: Duddy, John
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Is dynamic associated information per dataset
possible?
Duddy, John wrote:
We'd like to be able to associate fixed things (project, Sample, sequencer
used) with user's FASTQ files, and we'd also like to allow users
Message-
From: Greg Von Kuster [mailto:g...@bx.psu.edu]
Sent: Thursday, May 26, 2011 2:18 AM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Is dynamic associated information per dataset
possible?
In addition to Data Library templates, which are useful after the sequencer
data has
I have my data in a data library and have a form template defined so I can
enter the sample information.
So, I import a data file into a history and want to run a tool on it. Can I
pass the values of those form templates to my tool? Sort of like
${input.form_field_id} ?
Thanks!
John Duddy
I'd like to have Galaxy and another application installed on the same Apache
server and have the user authenticate only once. I think I understand how to do
that by deferring authentication to Apache (instead of using Galaxy's built-in
database). So far, so good, I think.
What I'm wondering is
I can give you a very good example - if you are doing alignment and for some
reason need to convert the input files before operating on them, such that you
need a complete copy, /tmp may not have enough room. I have had this happen to
me running lots of instances of an aligner, temporarily using
[mailto:ja...@jamestaylor.org]
Sent: Tuesday, August 09, 2011 5:41 PM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Customizing/reusing the workflows/run.mako template
John, the prefixes like 22| are added to the inputs associated with each
step, so that they can be separated back out
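To illustrate the idea of separating prefixed inputs back out per step, here is a rough sketch. It is not Galaxy's actual code; the function name and the exact shape of the keys (a numeric step id, a "|", then the input name) are assumptions based only on the "22|" example above:

```python
from collections import defaultdict


def split_by_step(form_params):
    """Group 'STEP|name' keys by step id; keys without a numeric prefix go under None."""
    by_step = defaultdict(dict)
    for key, value in form_params.items():
        # Split on the first '|' only, so nested names keep their own separators.
        prefix, sep, name = key.partition("|")
        if sep and prefix.isdigit():
            by_step[int(prefix)][name] = value
        else:
            by_step[None][key] = value
    return dict(by_step)
```

Splitting on the first separator only means a name that itself contains "|" stays intact within its step.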
This is probably something only the Galaxy devs can answer, but I thought I'd
give it a shot in the wider community. Some of you are doing some very
complicated stuff.
If you have a workflow with several input blocks, you might have multiple fastq
files you need to provide. A good example of
While snooping around the Galaxy code, I noticed that some tool features are
not supported in workflows, only in histories. Are those restrictions
documented somewhere?
Specifically, are multiple inputs supported?
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne
I'd like to have a datatype with a dict as metadata. This dict() would store
file offsets to enable seeking around to process different sections of the file.
How do I add a dictionary as a metadata element?
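For context, the kind of offset dictionary described here could look like the sketch below. It says nothing about how Galaxy stores metadata; the section marker (">") and both function names are hypothetical, chosen only to show a dict of byte offsets being built once and then used to seek:

```python
import io


def build_offsets(handle, marker=b">"):
    """Map each section name to the byte offset where its header line starts."""
    offsets = {}
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(marker):
            offsets[line[len(marker):].strip().decode()] = pos
    return offsets


def read_section(handle, offsets, name, marker=b">"):
    """Seek to a section and return its body up to the next header."""
    handle.seek(offsets[name])
    handle.readline()  # skip the header line itself
    body = []
    for line in iter(handle.readline, b""):
        if line.startswith(marker):
            break
        body.append(line)
    return b"".join(body)


# Small in-memory demonstration; a real file opened in binary mode works the same way.
data = io.BytesIO(b">a\n1\n2\n>b\n3\n")
index = build_offsets(data)
print(index, read_section(data, index, "b"))
```

The resulting dict holds only strings and integers, so it can be serialized as JSON wherever it ends up being stored.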
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San
AM
To: Duddy, John
Cc: galaxy-dev
Subject: Re: [galaxy-dev] Storing a dict as metadata
Hey John, are you sure you don't want to use a converted dataset rather than
a metadata element for this? This is how we handle most types of secondary
indexes for visualization.
If you do it this way
to
move the file pointer so each task can grab its part.
On Tue, Aug 2, 2011 at 10:54 AM, Duddy, John jdu...@illumina.com wrote:
I did something similar, but implemented as an evolution of the original
basic parallelism (see BWA), that:
- Moved the splitting of input files into the datatype classes
The BWA tool in NGS mapping does just what you want, just for inputs. The
general idea is to use a conditional element and define your extra output in
a when block.
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail:
We routinely put large compressed fastq files into data libraries by that
method (linking, no copy), and it has been very fast since the patch that
stopped Galaxy from decompressing the files.
You should probably make sure you specify the file format (fastqsanger) so
Galaxy does not attempt to sniff the file
There are several files in Datatypes with doctest tests in them. Is there a
convenient wrapper script to run them all?
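In case no such wrapper exists, a generic one can be put together from the standard library alone. The sketch below assumes Python 3 and an importable package; the function name is invented here, and the dotted package name you pass in depends on how your checkout is laid out:

```python
import doctest
import importlib
import pkgutil


def run_package_doctests(package_name):
    """Run doctests in a package and its submodules; return (failed, attempted)."""
    package = importlib.import_module(package_name)
    modules = [package]
    # A plain module has no __path__, so the empty default yields no submodules.
    search_path = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(search_path, package_name + "."):
        modules.append(importlib.import_module(info.name))
    failed = attempted = 0
    for module in modules:
        result = doctest.testmod(module, verbose=False)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

Note that every submodule gets imported, so a module with import-time side effects or missing dependencies will stop the run.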
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com
Can we introduce new file types via tools in the tool shed? It seems Galaxy can
load them if they are in the datatypes configuration file. Does tool
installation automate the editing of that file?
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
a datatype.
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com
From: Greg Von Kuster [mailto:g...@bx.psu.edu]
Sent: Wednesday, October 05, 2011 6:25 PM
To: Duddy, John
Cc: galaxy
. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com
-Original Message-
From: Peter Cock [mailto:p.j.a.c...@googlemail.com]
Sent: Thursday, October 06, 2011 1:28 AM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev
.
John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com
-Original Message-
From: Peter Cock [mailto:p.j.a.c...@googlemail.com]
Sent: Thursday, October 06, 2011 8:16 AM
To: Duddy, John
Cc: Greg Von
additional thought and feedback, as doing so is
always helpful
Thanks!
Greg
On Oct 5, 2011, at 11:48 PM, Duddy, John wrote:
One of the things we're facing is the sheer size of a whole human genome at
30x coverage. An effective way to deal with that is by compressing the FASTQ
files
You mention that you moved it to an NFS volume - but it seems you also moved to
a grid configuration using PBS?
If that's the case, what you are seeing might be an issue with NFS attribute
caching or write caching, which causes files created from one machine to not
appear until some time later
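If caching is indeed the cause, one common stopgap is to have the consumer wait briefly for the file instead of failing on the first miss. The sketch below is an assumption-laden illustration (the function name, timeout, and interval are made up), and it does not change how NFS caches; mount options are the other lever:

```python
import os
import time


def wait_for_file(path, timeout=30.0, interval=0.5):
    """Poll until path exists or timeout elapses; return True if it appeared."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The timeout would need to exceed the attribute cache lifetime configured on the client for this to help.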
From: Chorny, Ilya
Sent: Wednesday, November 02, 2011 11:50 AM
To: Duddy, John
Cc: Nate Coraor (n...@bx.psu.edu); galaxy-dev@lists.bx.psu.edu
Subject: Looks like actual user breaks splitting
Hey John,
Any thoughts?
Ilya
Traceback (most recent call last):
File /home/galaxy/ichorny/galaxy-central
, November 02, 2011 12:24 PM
To: Duddy, John
Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
Subject: Re: Looks like actual user breaks splitting
John, Ilya,
I get further with sequence type inputs but it looks like
JobWrapper.get_output_datasets_and_fnames() is not returning the right
thing when
To: Duddy, John
Cc: Chorny, Ilya; galaxy-dev@lists.bx.psu.edu
Subject: Re: Looks like actual user breaks splitting
Hi John,
It looks like the first issue is related to the change from
get_output_fnames() to compute_outputs(). When
outputs_to_working_directory = False (default) this method
stores
...@googlemail.com]
Sent: Tuesday, November 08, 2011 3:29 PM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev@lists.bx.psu.edu; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes
On Thu, Oct 6, 2011 at 5:45 PM, Duddy, John jdu...@illumina.com wrote:
GZIP files are definitely our plan. I just
Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com
-Original Message-
From: Peter Cock [mailto:p.j.a.c...@googlemail.com]
Sent: Tuesday, November 08, 2011 4:04 PM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev