Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-15 Thread Douglas Allan
please remove me from mailing list - thanks

On 14-04-2011, at 9:26 AM, Guru Ananda wrote:

> Thanks for pointing this out, Brad. Both geecee and infoseq are in fact 
> available on Galaxy under EMBOSS section.
> 
> Guru.
> 
> On Thu, Apr 14, 2011 at 12:13 PM, Brad Chapman wrote:
> Peter and Guru;
> 
> [Computing GC]
> 
> > I'll be working with simple sequence files (FASTA, or even FASTQ,
> > SFF, etc) rather than BED files, but I'll keep that in mind.
> 
> Emboss has some utilities that do this. infoseq and geecee, and
> there are also programs for exploring CpG islands:
> 
> http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/nucleic_cpg_islands_group.html
> 
> Brad
> 
> 
> 
> -- 
> Graduate student, Bioinformatics and Genomics
> Makova lab/Galaxy team
> 505 Wartik lab
> University Park PA 16802
> g...@psu.edu
> 
> ___
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org.  Please keep all replies on the list by
> using "reply all" in your mail client.  For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
> 
>  http://lists.bx.psu.edu/listinfo/galaxy-dev
> 
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
> 
>  http://lists.bx.psu.edu/

Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-15 Thread Peter Cock
On Thu, Apr 14, 2011 at 6:25 PM, Jeremy Goecks wrote:
>> Now why does a tool search on the public Galaxy instance for GC
>> not suggest this tool?
>>
>> Name: geecee
>> Description: Calculates fractional GC content of nucleic acid sequences
>>
>> Does this mean the description isn't searched? It would seem like
>> a sensible idea to me to include that...
>>
>> Searching for "geecee" works, but unless you're familiar with this
>> EMBOSS tool no-one will think of that.
>
>
> Peter,
>
> The tool search doesn't start until you type in three characters,
> so typing 'GC' does not initiate a search. Typing 'gc ' or
> 'gc content' works. Perhaps a tooltip or help text is needed.
>
> J.

I see that now, and yes, perhaps a caption on the search
box would help...

Also typing G, C, Enter doesn't work - that does surprise me.

There is still something amiss with the search apparently not
using the tool description line, for instance neither "acid" nor
"nucleic" nor "fractional" show the EMBOSS geecee tool.

If the search is indexing on the tool's main help text, then
for the EMBOSS tools it would help to have an executive
summary with key words in it, rather than just a link to the
EMBOSS webpage for each tool.
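
To make the behaviour under discussion concrete, here is a small Python sketch. It is not Galaxy's search code: the function name search_tools, the MIN_QUERY_LENGTH constant, the TOOLS list and the include_description switch are all hypothetical, and the infoseq description is placeholder text. It only illustrates how a three-character minimum and a name-only index would produce the results reported in this thread.

```python
# Hypothetical illustration only; this is NOT how Galaxy implements tool search.
MIN_QUERY_LENGTH = 3  # the thread reports that search starts at three characters

TOOLS = [
    {"name": "geecee",
     "description": "Calculates fractional GC content of nucleic acid sequences"},
    {"name": "infoseq",
     "description": "Displays basic information about sequences"},  # placeholder
]


def search_tools(tools, query, include_description=True):
    """Return names of tools whose indexed text contains the query.

    Queries shorter than MIN_QUERY_LENGTH (counted before stripping, so
    'gc ' with a trailing space qualifies) return no hits at all.
    """
    if len(query) < MIN_QUERY_LENGTH:
        return []
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for tool in tools:
        text = tool["name"]
        if include_description:
            text += " " + tool["description"]
        if needle in text.lower():
            hits.append(tool["name"])
    return hits


if __name__ == "__main__":
    print(search_tools(TOOLS, "GC"))    # too short: no search is run
    print(search_tools(TOOLS, "gc "))   # three characters: search runs
    print(search_tools(TOOLS, "nucleic", include_description=False))
    print(search_tools(TOOLS, "nucleic", include_description=True))
```

With include_description=False, "nucleic" finds nothing even though the word is in the geecee description, which matches the symptom described above.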

Peter

Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Hiram Clawson
The kent program hgGcPercent will measure what you want to measure from your 
sequences.

--Hiram


hgGcPercent - Calculate GC Percentage in 20kb windows
usage:
   hgGcPercent [options] database nibDir
     nibDir can be a .2bit file, a directory that contains a
     database.2bit file, or a directory that contains *.nib files.
     Loads gcPercent table with counts from sequence.
options:
   -win=<size> - change windows size (default 20000)
   -noLoad - do not load mysql table - create bed file
   -file=<filename> - output to <filename> (stdout OK) (implies -noLoad)
   -chr=<chrN> - process only chrN from the nibDir
   -noRandom - ignore random chromosomes from the nibDir
   -noDots - do not display ... progress during processing
   -doGaps - process gaps correctly (default: gaps are not counted as GC)
   -wigOut - output wiggle ascii data ready to pipe to wigEncode
   -overlap=N - overlap windows by N bases (default 0)
   -verbose=N - display details to stderr during processing
   -bedRegionIn=input.bed   Read in a bed file for GC content in specific
                            regions and write to bedRegionsOut
   -bedRegionOut=output.bed Write a bed file of GC content in specific
                            regions from bedRegionIn

example:
  calculate GC percent in 5 base windows using a 2bit assembly (dp2):
    hgGcPercent -wigOut -doGaps -win=5 -file=stdout -verbose=0 \
        dp2 /cluster/data/dp2 \
      | wigEncode stdin gc5Base.wig gc5Base.wib
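
For readers without the kent tools installed, the windowing idea behind -win and -overlap can be sketched in a few lines of Python. This is only an illustration and does not reproduce hgGcPercent's implementation or output format: the function name windowed_gc is hypothetical, and the half-open, zero-based coordinates and the choice to leave N and other ambiguity codes out of the denominator are assumptions.

```python
def windowed_gc(seq, win=20000, overlap=0):
    """Yield (start, end, gc_percent) for successive windows over seq.

    Coordinates are zero-based and half-open (an assumption, BED-like).
    Only A, C, G and T count towards the denominator; a window with no
    unambiguous bases reports 0.0. The final window may be shorter than win.
    """
    if win <= 0:
        raise ValueError("win must be positive")
    if not 0 <= overlap < win:
        raise ValueError("overlap must satisfy 0 <= overlap < win")
    step = win - overlap
    seq = seq.upper()
    for start in range(0, len(seq), step):
        chunk = seq[start:start + win]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        percent = 100.0 * gc / acgt if acgt else 0.0
        yield start, start + len(chunk), percent


if __name__ == "__main__":
    for start, end, percent in windowed_gc("GGCCAATTGC", win=4):
        print(start, end, percent)
```

For the ten-base toy sequence this yields three windows: (0, 4, 100.0), (4, 8, 0.0) and (8, 10, 100.0).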


Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Jeremy Goecks
> Now why does a tool search on the public Galaxy instance for GC
> not suggest this tool?
> 
> Name: geecee
> Description: Calculates fractional GC content of nucleic acid sequences
> 
> Does this mean the description isn't searched? It would seem like
> a sensible idea to me to include that...
> 
> Searching for "geecee" works, but unless you're familiar with this
> EMBOSS tool no-one will think of that.


Peter,

The tool search doesn't start until you type in three characters, so typing
'GC' does not initiate a search. Typing 'gc ' or 'gc content' works. Perhaps
a tooltip or help text is needed.

J.


Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Tychele

Hi Brad,

These tools are also in galaxy under the EMBOSS section.  "geecee"  
will tell you the percentage of GC in FASTA sequences. It basically  
outputs the sequence name and then the GC content as below:


#Sequence   GC content
Sequence1   0.44

Hope this helps!

Tychele
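
If the geecee result needs further processing outside Galaxy, a table like the one shown above is easy to read back in. The following is a minimal Python sketch that assumes exactly the layout in this message (a comment line starting with "#", then one line per sequence with the name followed by the GC fraction); parse_geecee is a hypothetical helper name, not part of EMBOSS or Galaxy.

```python
def parse_geecee(lines):
    """Map sequence name -> GC fraction from geecee-style output lines.

    Assumes the layout shown in this thread: '#' comment lines, then
    'name  value' rows where the value is the last whitespace-separated field.
    """
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(None, 1)
        result[name] = float(value)
    return result


if __name__ == "__main__":
    example = ["#Sequence   GC content", "Sequence1  0.44"]
    gc = parse_geecee(example)
    print(gc)
    # Keep only sequences below an (arbitrary) GC fraction of 0.5.
    print([name for name, frac in gc.items() if frac < 0.5])
```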


On Apr 14, 2011, at 12:13 PM, Brad Chapman wrote:

> Peter and Guru;
>
> [Computing GC]
>
> > I'll be working with simple sequence files (FASTA, or even FASTQ,
> > SFF, etc) rather than BED files, but I'll keep that in mind.
>
> Emboss has some utilities that do this. infoseq and geecee, and
> there are also programs for exploring CpG islands:
>
> http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/nucleic_cpg_islands_group.html
>
> Brad

Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Guru Ananda
Thanks for pointing this out, Brad. Both geecee and infoseq are in fact
available on Galaxy under EMBOSS section.

Guru.

On Thu, Apr 14, 2011 at 12:13 PM, Brad Chapman wrote:

> Peter and Guru;
>
> [Computing GC]
>
> > I'll be working with simple sequence files (FASTA, or even FASTQ,
> > SFF, etc) rather than BED files, but I'll keep that in mind.
>
> Emboss has some utilities that do this. infoseq and geecee, and
> there are also programs for exploring CpG islands:
>
>
> http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/nucleic_cpg_islands_group.html
>
> Brad
>



-- 
Graduate student, Bioinformatics and Genomics
Makova lab/Galaxy team
505 Wartik lab
University Park PA 16802
g...@psu.edu

Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Peter Cock
On Thu, Apr 14, 2011 at 5:13 PM, Brad Chapman wrote:
> Peter and Guru;
>
> [Computing GC]
>
>> I'll be working with simple sequence files (FASTA, or even FASTQ,
>> SFF, etc) rather than BED files, but I'll keep that in mind.
>
> Emboss has some utilities that do this. infoseq and geecee, and
> there are also programs for exploring CpG islands:
>
> http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/nucleic_cpg_islands_group.html
>
> Brad
>

Good idea Brad :)

Now why does a tool search on the public Galaxy instance for GC
not suggest this tool?

Name: geecee
Description: Calculates fractional GC content of nucleic acid sequences

Does this mean the description isn't searched? It would seem like
a sensible idea to me to include that...

Searching for "geecee" works, but unless you're familiar with this
EMBOSS tool no-one will think of that.

Peter

Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Brad Chapman
Peter and Guru;

[Computing GC]

> I'll be working with simple sequence files (FASTA, or even FASTQ,
> SFF, etc) rather than BED files, but I'll keep that in mind.

Emboss has some utilities that do this. infoseq and geecee, and
there are also programs for exploring CpG islands:

http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/nucleic_cpg_islands_group.html

Brad

Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Peter Cock
On Thu, Apr 14, 2011 at 4:15 PM, Guru Ananda wrote:
> Hi Peter,
> There isn't a built-in Galaxy tool to compute GC%, yet.

Thanks Guru.

> You could perhaps use UCSC's hgGcPercent binary, which lets you
> compute GC% for BED intervals. You can find the same here:
> http://genome.ucsc.edu/FAQ/FAQdownloads#download27

I'll be working with simple sequence files (FASTA, or even FASTQ,
SFF, etc) rather than BED files, but I'll keep that in mind.

Peter


Re: [galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Guru Ananda
Hi Peter,

There isn't a built-in Galaxy tool to compute GC%, yet. You could perhaps
use UCSC's hgGcPercent binary, which lets you compute GC% for BED intervals.
You can find the same here:
http://genome.ucsc.edu/FAQ/FAQdownloads#download27

Thanks,
Guru.

On Thu, Apr 14, 2011 at 9:11 AM, Peter Cock wrote:

> Hi all,
>
> Are there any built in Galaxy tools that I have missed to do with GC
> percentage (or indeed, AT percentage)?
>
> I'm thinking of a tool to calculate the GC percentage (and perhaps
> related statistics like counts/percentages of A, C, G, T), and perhaps
> a related tool to filter on GC. Possible use cases include filtering
> NGS reads to remove high/low GC reads from a contaminant.
>
> Slightly more complicated, right now I want to calculate the GC (or in
> fact AT) percentage from the first and last ~20 (configurable) bases.
> In this case I am looking for (and filtering on) AT rich ends of
> contigs which may be indicative of viral sequences. A very similar
> task would be looking for (and filtering on) poly A tails of mRNA, or
> if sequenced from the reverse strand, a poly T start.
>
> Peter



-- 
Graduate student, Bioinformatics and Genomics
Makova lab/Galaxy team
505 Wartik lab
University Park PA 16802
g...@psu.edu

[galaxy-user] Nucleotide analysis - GC percentage

2011-04-14 Thread Peter Cock
Hi all,

Are there any built in Galaxy tools that I have missed to do with GC
percentage (or indeed, AT percentage)?

I'm thinking of a tool to calculate the GC percentage (and perhaps
related statistics like counts/percentages of A, C, G, T), and perhaps
a related tool to filter on GC. Possible use cases include filtering
NGS reads to remove high/low GC reads from a contaminant.
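
As a rough illustration of the kind of tool described above, here is a minimal Python sketch that counts A, C, G and T per sequence, reports the GC percentage, and filters records on a GC range. It is not an existing Galaxy tool: the function names (read_fasta, base_stats, filter_by_gc) are hypothetical, only FASTA input is handled, and leaving N and other ambiguity codes out of the denominator is an assumption.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header line as the record name.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def base_stats(seq):
    """Return counts of A, C, G, T and the GC percentage of a sequence.

    The percentage is taken over A+C+G+T only (assumption: ambiguity
    codes such as N are ignored); an empty denominator gives 0.0.
    """
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    gc = counts["G"] + counts["C"]
    stats = {base: counts[base] for base in "ACGT"}
    stats["gc_percent"] = 100.0 * gc / acgt if acgt else 0.0
    return stats


def filter_by_gc(records, min_gc=0.0, max_gc=100.0):
    """Yield only records whose GC percentage lies in [min_gc, max_gc]."""
    for name, seq in records:
        if min_gc <= base_stats(seq)["gc_percent"] <= max_gc:
            yield name, seq


if __name__ == "__main__":
    fasta = [">r1 example", "GGCC", "GGCC", ">r2", "ATAT", ">r3", "ACGTNN"]
    for name, seq in read_fasta(fasta):
        print(name, base_stats(seq))
```

The AT percentage is simply 100 minus the GC percentage under the same denominator, so a separate AT filter is not needed.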

Slightly more complicated, right now I want to calculate the GC (or in
fact AT) percentage from the first and last ~20 (configurable) bases.
In this case I am looking for (and filtering on) AT rich ends of
contigs which may be indicative of viral sequences. A very similar
task would be looking for (and filtering on) poly A tails of mRNA, or
if sequenced from the reverse strand, a poly T start.
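
The end-of-sequence variant can be sketched the same way. The Python below is an illustration under stated assumptions, not an existing tool: the function names are hypothetical, the 20-base window comes from the paragraph above, and the 80% threshold is an arbitrary placeholder. Poly-A tails and poly-T starts are measured as simple uninterrupted runs, which ignores sequencing errors.

```python
def at_percent(seq):
    """AT percentage over unambiguous bases (A, C, G, T); 0.0 if none."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    acgt = at + seq.count("C") + seq.count("G")
    return 100.0 * at / acgt if acgt else 0.0


def end_at_percent(seq, window=20):
    """Return (AT% of the first window bases, AT% of the last window bases)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return at_percent(seq[:window]), at_percent(seq[-window:])


def has_at_rich_ends(seq, window=20, threshold=80.0, require_both=False):
    """True if one end (or both, if require_both) reaches the AT threshold."""
    head, tail = end_at_percent(seq, window)
    hits = [head >= threshold, tail >= threshold]
    return all(hits) if require_both else any(hits)


def poly_a_tail_length(seq):
    """Length of the uninterrupted run of A at the 3' end."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def poly_t_start_length(seq):
    """Length of the uninterrupted run of T at the 5' end."""
    seq = seq.upper()
    return len(seq) - len(seq.lstrip("T"))


if __name__ == "__main__":
    contig = "A" * 10 + "G" * 10 + "C" * 30 + "T" * 20
    print(end_at_percent(contig))        # AT% of first and last 20 bases
    print(has_at_rich_ends(contig))      # at least one end is AT rich
    print(poly_a_tail_length("ACGTAAAA"))
```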

Peter