Re: [galaxy-user] Changing Bowtie parameters in TopHat

2011-11-17 Thread Jeremy Goecks
> Thanks for your help. I'm mapping reads from one organism to a related but 
> different organism, so some of the parameters I'd like to adjust are to relax 
> mapping stringency, specifically:
> 
> -n 3 (allow 3 mismatches in seed)
> -e 250 (allow cumulative Phred score of 250 [or some other value depending 
> on read length] for mismatches in remainder of read)
> 
> I'd also like to only report alignments that are unambiguously mapped to a 
> single location, so:
> 
> -m 1
> --best on
> --strata on
> 
> It sounds like I need to read the documentation again, but it didn't look at 
> first glance like I could specify these things.

Yes, reading the documentation is highly recommended. 

You can definitely specify -m, but you may have to think creatively about how 
to modify Tophat's available parameters to meet your needs. You might also 
contact the Tophat authors directly and see if they have any suggestions: 
tophat.cuffli...@gmail.com
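
As a rough illustration only, the options quoted above map onto a standalone Bowtie 1 command line as sketched below. This is not something the Galaxy TopHat wrapper is known to accept; the index name and file names are placeholders, and the helper function is hypothetical. Note that --best and --strata are plain switches that take no "on" value.

```python
import shlex


def bowtie_command(index, reads, output, seed_mismatches=3, max_qual_sum=250):
    """Build an argument list for a standalone Bowtie 1 run (not TopHat)."""
    if not 0 <= seed_mismatches <= 3:
        raise ValueError("Bowtie 1 accepts 0-3 seed mismatches for -n")
    return [
        "bowtie",
        "-n", str(seed_mismatches),  # mismatches allowed in the seed
        "-e", str(max_qual_sum),     # max summed quality at mismatched positions
        "-m", "1",                   # drop reads with more than one reportable alignment
        "--best", "--strata",        # switches without values; --strata needs --best
        index, reads, output,        # placeholder index and file names
    ]


cmd = bowtie_command("related_genome_index", "reads.fastq", "hits.map")
print(" ".join(shlex.quote(part) for part in cmd))
```

Whether TopHat passes equivalent settings through to Bowtie is a separate question, best checked against the TopHat documentation.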

Good luck,
J.


___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-user] Using Galaxy Cloudman for a workshop

2011-11-17 Thread Jeremy Goecks

> Jeremy (from the team) ran a similar workshop several months ago and used 
> some resource intensive tools (e.g., Tophat). We were concerned about the 
> same scalability issues so we just started 4 separate clusters and divided 
> the users across those. The approach worked well and it turned out we did not 
> see any scalability issues. I think we went way overboard with 4 clusters but 
> the approach did demonstrate an additional 'coolness' of the project allowing 
> one to spin up 4 complete, identical clusters in a matter of minutes...
> So, I feel you could replicate a similar approach but could probably go with 
> 2 clusters only? Jeremy can hopefully provide some first hand comments as 
> well.

Yes, this approach worked well when I did it. I created all the data and 
workflows needed for the workshop on one Galaxy instance, shared/cloned that 
instance and set up 3 additional instances, and divided users evenly amongst 
the instances. 2-3 clusters will probably meet your needs with 50 participants.

Scalability issues are more likely to arise on the back end than the front end, 
so you'll want to ensure that you have enough compute nodes. BWA uses four 
nodes by default--Enis, does the cloud config change this parameter?--so you'll 
want 4x50 or 200 total nodes if you want everyone to be able to run a BWA job 
simultaneously.

Good luck,
J. 



Re: [galaxy-user] GATK

2011-11-17 Thread Jennifer Jackson

Hi Metge,

Thank you for your kind comments!

The GATK tool wrappers are still in an early beta stage of development. 
Because of this, we are not ready to support set-up in local instances yet.


If you have feedback on the functionality of the version on our Test 
server, it is welcome on the galaxy-...@bx.psu.edu mailing list.


Take care,

Jen
Galaxy team

On 11/16/11 3:29 PM, Metge, Franziska wrote:


Dear happy users of Galaxy,

We are running Galaxy locally. It's a very fine tool! By now everything
works fine, except when I try to run any GATK program. I usually get
this error message:


# ERROR
--
# ERROR A USER ERROR has occurred (version 1.3-14-g348f2db):
# ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
# ERROR Please do not post this error to the GATK forum
# ERROR
# ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
# ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
# ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
# ERROR
# ERROR MESSAGE: The fasta file you specified (/tmp/tmpp0oxJu/hg19) does not exist.
# ERROR
--

My picard_index.loc line for the hg19 reference looks like this:


hg19 hg19 hg19 hg19 /drive1/galaxy/reference/hg19/sam_index/hg19_ref.fa gatk


Also, the BAM file I am submitting to GATK has the reference genome
specified in its attributes.

Could anyone please help me?
Thank you
Franzi





--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support


[galaxy-user] Tools appearing without inputs in workflow editor

2011-11-17 Thread Paul-Michael Agapow
A peculiar bug, presented for your input:

 

A colleague has built a few tools and is trying to incorporate them into
a workflow. However:

1.   When he creates a new empty workflow, edits it and adds the
tools, they appear without inputs (i.e. the icons/boxes on the diagram
have no input "sockets")

2.   When he tries to save a history using these tools as a
workflow, an error results in which Galaxy complains about an
unrecognised name, being one of the tool inputs. Depending on which tools
he includes in the workflow, the missing name occurs in whichever tool
is earlier in the workflow. (Traceback available on request)

 

Puzzling. And even more puzzling when I tried to debug this:

1.   Other tools (not written by him) behave in the workflows as
expected.

2.   Pulling a copy of his config file across to my own instance of
Galaxy and creating a dummy tool to house it, it worked perfectly 

3.   We're running the same version of Galaxy, albeit different
versions of Python (2.5 vs 2.6.4)

4.   The tools work fine in normal analysis mode.

5.   Line endings are Unix, and the XML seems to parse fine. 

 

I'm stymied here and unsure what to look at next. Ideas?
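
One check that could be run on both instances is to list the input parameter names that a plain XML parser finds in the tool config, and compare them with the name in the traceback. The sketch below assumes the usual tool layout of an <inputs> element containing <param name="..."> entries; the sample XML is made up, the function name is hypothetical, and this is not Galaxy's own loader. It is written for a current Python 3 rather than the 2.5/2.6 interpreters mentioned above.

```python
import xml.etree.ElementTree as ET


def input_param_names(tool_xml_text):
    """Return the name attribute of every <param> found under <inputs>."""
    root = ET.fromstring(tool_xml_text)
    inputs = root.find("inputs")
    if inputs is None:
        return []
    # iter() walks nested elements too, e.g. params inside <conditional>
    return [param.get("name") for param in inputs.iter("param")]


# Made-up tool config, used only to show the call
SAMPLE = """
<tool id="demo_tool" name="Demo">
  <inputs>
    <param name="input1" type="data" format="tabular"/>
    <conditional name="mode">
      <param name="choice" type="select"/>
    </conditional>
  </inputs>
</tool>
"""

names = input_param_names(SAMPLE)
print(names)
```

If the two instances report different names for the same file, the difference probably lies in the file or the parser rather than in the workflow editor.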

 



Paul Agapow (paul-michael.aga...@hpa.org.uk)

Bioinformatics, Health Protection Agency

 




Re: [galaxy-user] GATK

2011-11-17 Thread Daniel Blankenberg
Hi Franzi,

You have one too many hg19 in there. The fields go like:
 

so:
hg19 hg19 hg19 /drive1/galaxy/reference/hg19/sam_index/hg19_ref.fa gatk

But do note that these tool integrations are still undergoing active 
development. Please report bugs if you encounter any.
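
A quick way to catch this kind of slip is to check each .loc line before restarting Galaxy. The sketch below is not Galaxy's parser. It assumes tab-separated columns, five fields per line with the FASTA path in the fourth (both inferred from the corrected line above), and the function name is hypothetical.

```python
import os

EXPECTED_FIELDS = 5  # assumption: inferred from the corrected line above
PATH_COLUMN = 3      # assumption: zero-based position of the FASTA path


def check_loc_line(line, path_exists=os.path.isfile):
    """Return a list of problems found in one .loc line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    problems = []
    if len(fields) != EXPECTED_FIELDS:
        problems.append(
            "expected %d fields, found %d" % (EXPECTED_FIELDS, len(fields))
        )
    if len(fields) > PATH_COLUMN and not path_exists(fields[PATH_COLUMN]):
        problems.append(
            "column %d is not an existing file: %s"
            % (PATH_COLUMN + 1, fields[PATH_COLUMN])
        )
    return problems


# Stand-in for a real filesystem check so the example runs anywhere
def looks_like_path(value):
    return value.startswith("/")


fasta = "/drive1/galaxy/reference/hg19/sam_index/hg19_ref.fa"
original = "\t".join(["hg19", "hg19", "hg19", "hg19", fasta, "gatk"])
corrected = "\t".join(["hg19", "hg19", "hg19", fasta, "gatk"])

print(check_loc_line(original, looks_like_path))
print(check_loc_line(corrected, looks_like_path))
```

With the extra hg19, the fourth column is read as the path, which matches the "/tmp/tmpp0oxJu/hg19 does not exist" message in the report.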


Thanks for using Galaxy,

Dan


On Nov 16, 2011, at 6:29 PM, Metge, Franziska wrote:

> 
> Dear happy users of Galaxy,
> 
> We are running Galaxy locally. It's a very fine tool! By now everything works 
> fine, except when I try to run any GATK program. I usually get this error 
> message:
> 
> 
> # ERROR 
> --
> # ERROR A USER ERROR has occurred (version 1.3-14-g348f2db):
> # ERROR The invalid arguments or inputs must be corrected before the GATK 
> can proceed
> # ERROR Please do not post this error to the GATK forum
> # ERROR
> # ERROR See the documentation (rerun with -h) for this tool to view 
> allowable command-line arguments.
> # ERROR Visit our wiki for extensive documentation 
> http://www.broadinstitute.org/gsa/wiki
> # ERROR Visit our forum to view answers to commonly asked questions 
> http://getsatisfaction.com/gsa
> # ERROR
> # ERROR MESSAGE: The fasta file you specified (/tmp/tmpp0oxJu/hg19) does 
> not exist.
> # ERROR 
> --
> 
> My picard_index.loc line for the hg19 reference looks like this:
> 
> 
> hg19 hg19 hg19 hg19 /drive1/galaxy/reference/hg19/sam_index/hg19_ref.fa gatk
> 
> 
> Also, the BAM file I am submitting to GATK has the reference genome specified 
> in its attributes.
> 
> Could anyone please help me?
> Thank you
> Franzi
> 


Re: [galaxy-user] Using Galaxy Cloudman for a workshop

2011-11-17 Thread Clare Sloggett
Hi Enis, Jeremy,

Thanks, that's really helpful! Jeremy, do you remember what kind of
instance you used per node, e.g. Amazon's 'large' (2 core / 4ECU /
7.5GB) or 'xlarge' (4 core / 8ECU / 15GB)? This would be a good sanity
check for me.

Enis, when you say a lot of memory, would you consider the 15GB nodes
'a lot'? I would generally consider that a lot in the context of a web
server but not so much in NGS, so it's all relative! Amazon does have
double-memory instances available.

Also, when you say a lot of memory especially for the master node -
does this imply that I can choose a different specification for the
master node than the compute nodes? I thought they had to be all
identical, but just checking.

Thanks,
Clare

On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks wrote:
>
>> Jeremy (from the team) ran a similar workshop several months ago and used 
>> some resource intensive tools (e.g., Tophat). We were concerned about the 
>> same scalability issues so we just started 4 separate clusters and divided 
>> the users across those. The approach worked well and it turned out we did 
>> not see any scalability issues. I think we went way overboard with 4 
>> clusters but the approach did demonstrate an additional 'coolness' of the 
>> project allowing one to spin up 4 complete, identical clusters in a matter 
>> of minutes...
>> So, I feel you could replicate a similar approach but could probably go with 
>> 2 clusters only? Jeremy can hopefully provide some first hand comments as 
>> well.
>
> Yes, this approach worked well when I did it. I created all the data and 
> workflows needed for the workshop on one Galaxy instance, shared/cloned that 
> instance and set up 3 additional instances, and divided users evenly amongst 
> the instances. 2-3 clusters will probably meet your needs with 50 
> participants.
>
> Scalability issues are more likely to arise on the back end than the front 
> end, so you'll want to ensure that you have enough compute nodes. BWA uses 
> four nodes by default--Enis, does the cloud config change this parameter?--so 
> you'll want 4x50 or 200 total nodes if you want everyone to be able to run a 
> BWA job simultaneously.
>
> Good luck,
> J.
>
>
>
>



-- 
E: s...@unimelb.edu.au
P: 03 903 53357
M: 0414 854 759



Re: [galaxy-user] Using Galaxy Cloudman for a workshop

2011-11-17 Thread Ravi Madduri
Clare,
I wonder if you would be open to sharing the details of your setup, and what 
you had to do for it, once you are done. I think it would be incredibly useful 
to the community (at least it would be for me ;-)) 

Thanks!
On Nov 17, 2011, at 5:14 PM, Clare Sloggett wrote:

> Hi Enis, Jeremy,
> 
> Thanks, that's really helpful! Jeremy, do you remember what kind of
> instance you used per node, e.g. Amazon's 'large' (2 core / 4ECU /
> 7.5GB) or 'xlarge' (4 core / 8ECU / 15GB)? This would be a good sanity
> check for me.
> 
> Enis, when you say a lot of memory, would you consider the 15GB nodes
> 'a lot'? I would generally consider that a lot in the context of a web
> server but not so much in NGS, so it's all relative! Amazon does have
> double-memory instances available.
> 
> Also, when you say a lot of memory especially for the master node -
> does this imply that I can choose a different specification for the
> master node than the compute nodes? I thought they had to be all
> identical, but just checking.
> 
> Thanks,
> Clare
> 
> On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks wrote:
>> 
>>> Jeremy (from the team) ran a similar workshop several months ago and used 
>>> some resource intensive tools (e.g., Tophat). We were concerned about the 
>>> same scalability issues so we just started 4 separate clusters and divided 
>>> the users across those. The approach worked well and it turned out we did 
>>> not see any scalability issues. I think we went way overboard with 4 
>>> clusters but the approach did demonstrate an additional 'coolness' of the 
>>> project allowing one to spin up 4 complete, identical clusters in a matter 
>>> of minutes...
>>> So, I feel you could replicate a similar approach but could probably go 
>>> with 2 clusters only? Jeremy can hopefully provide some first hand comments 
>>> as well.
>> 
>> Yes, this approach worked well when I did it. I created all the data and 
>> workflows needed for the workshop on one Galaxy instance, shared/cloned that 
>> instance and set up 3 additional instances, and divided users evenly amongst 
>> the instances. 2-3 clusters will probably meet your needs with 50 
>> participants.
>> 
>> Scalability issues are more likely to arise on the back end than the front 
>> end, so you'll want to ensure that you have enough compute nodes. BWA uses 
>> four nodes by default--Enis, does the cloud config change this 
>> parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to 
>> be able to run a BWA job simultaneously.
>> 
>> Good luck,
>> J.
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> E: s...@unimelb.edu.au
> P: 03 903 53357
> M: 0414 854 759
> 

Ravi Madduri
madd...@mcs.anl.gov



Re: [galaxy-user] Using Galaxy Cloudman for a workshop

2011-11-17 Thread Clare Sloggett
On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks wrote:

> Scalability issues are more likely to arise on the back end than the front 
> end, so you'll want to ensure that you have enough compute nodes. BWA uses 
> four nodes by default--Enis, does the cloud config change this parameter?--so 
> you'll want 4x50 or 200 total nodes if you want everyone to be able to run a 
> BWA job simultaneously.
>

Actually, one other question - this paragraph makes me realise that I
don't really understand how Galaxy is distributing jobs. I had thought
that each job would only use one node, and in some cases take
advantage of multiple cores within that node. I'm taking a "node" to
be a set of cores with their own shared memory, so in this case a VM
instance, is this right? If some types of jobs can be distributed over
multiple nodes, can I configure, in Galaxy, how many nodes they should
use?

Thanks again,
Clare

-- 
E: s...@unimelb.edu.au
P: 03 903 53357
M: 0414 854 759



Re: [galaxy-user] Using Galaxy Cloudman for a workshop

2011-11-17 Thread Jeremy Goecks

> On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks wrote:
> 
>> Scalability issues are more likely to arise on the back end than the front 
>> end, so you'll want to ensure that you have enough compute nodes. BWA uses 
>> four nodes by default--Enis, does the cloud config change this 
>> parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to 
>> be able to run a BWA job simultaneously.
>> 
> 
> Actually, one other question - this paragraph makes me realise that I
> don't really understand how Galaxy is distributing jobs. I had thought
> that each job would only use one node, and in some cases take
> advantage of multiple cores within that node. I'm taking a "node" to
> be a set of cores with their own shared memory, so in this case a VM
> instance, is this right? If some types of jobs can be distributed over
> multiple nodes, can I configure, in Galaxy, how many nodes they should
> use?

You're right -- my word choices were poor. Replace 'node' with 'core' in my 
paragraph to get an accurate suggestion for resources.

Galaxy uses a job scheduler--SGE on the cloud--to distribute jobs to different 
cluster nodes. Jobs that require multiple cores typically run on a single node. 
Enis can chime in on whether CloudMan supports job submission over multiple 
nodes; this would require setup of an appropriate parallel environment and a 
tool that can make use of this environment.
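
To make the corrected estimate concrete, here is a small back-of-the-envelope sketch. It takes the worst case in which every participant runs a 4-core BWA job at the same time, and uses the per-instance core counts quoted earlier in the thread (2 for 'large', 4 for 'xlarge'). The function name is just for illustration.

```python
import math


def instances_needed(participants, cores_per_job, cores_per_instance):
    """Return (total cores, instance count) if everyone runs a job at once."""
    total_cores = participants * cores_per_job
    return total_cores, math.ceil(total_cores / cores_per_instance)


for name, cores in (("large", 2), ("xlarge", 4)):
    total, count = instances_needed(50, 4, cores)
    print("%s: %d cores -> %d instances" % (name, total, count))
# large: 200 cores -> 100 instances
# xlarge: 200 cores -> 50 instances
```

Since multi-core jobs typically run on a single node, a 4-core job would not fit cleanly on a 2-core 'large' instance, so the 'large' figure is an upper bound on cost rather than a recommendation. In practice not everyone submits at the same moment, so far fewer instances may be enough.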

Good luck,
J.

