[galaxy-user] ftp not working

2014-03-27 Thread David Matthews
Hi,

I'm still not able to use FTP: it keeps saying "connection refused", both from the
command line and through Cyberduck. Is anyone else seeing this?

Best Wishes,
David.

_
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk





___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

[galaxy-user] Tophat error

2012-03-14 Thread David Matthews
Hi,

I'm just running a TopHat job, which returned the following error:

Executing: /gpfs/cluster/isys/galaxy/Software/bin/bowtie-inspect /local/tmp5Ywx45/dataset_942  ./tophat_out/tmp/dataset_942.fa
[Tue Mar 13 12:45:08 2012] Checking for Bowtie
Bowtie version:  0.12.7.0
[Tue Mar 13 12:45:08 2012] Checking for Samtools
Samtools Version: 0.1.18
[Tue Mar 13 12:45:08 2012] Generating SAM header for /local/tmp5Ywx45/dataset_942
format:  fastq
quality scale:   phred33 (default)
[Tue Mar 13 12:45:21 2012] Preparing reads
left reads: min. length=56, count=29523921
right reads: min. length=56, count=29543412
[Tue Mar 13 13:07:54 2012] Mapping left_kept_reads against dataset_942 with Bowtie
[Tue Mar 13 13:45:26 2012] Processing bowtie hits
[Tue Mar 13 14:11:28 2012] Mapping left_kept_reads_seg1 against dataset_942 with Bowtie (1/2)
[Tue Mar 13 14:43:27 2012] Mapping left_kept_reads_seg2 against dataset_942 with Bowtie (2/2)
[Tue Mar 13 14:57:50 2012] Mapping right_kept_reads against dataset_942 with Bowtie
[Tue Mar 13 15:37:46 2012] Processing bowtie hits
[Tue Mar 13 16:04:28 2012] Mapping right_kept_reads_seg1 against dataset_942 with Bowtie (1/2)
[Tue Mar 13 16:37:18 2012] Mapping right_kept_reads_seg2 against dataset_942 with Bowtie (2/2)
[Tue Mar 13 16:50:40 2012] Searching for junctions via segment mapping
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Software/bin/tophat", line 3063, in <module>
    sys.exit(main())
  File "/gpfs/cluster/isys/galaxy/Software/bin/tophat", line 3029, in main
    user_supplied_deletions)
  File "/gpfs/cluster/isys/galaxy/Software/bin/tophat", line 2681, in spliced_alignment
    [maps[initial_reads[left_reads]].unspliced_bwt,
     maps[initial_reads[left_reads]].seg_maps[-1]],
TypeError: list indices must be integers, not str

Does anyone know what causes this kind of error?
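In Python, this TypeError means a list was indexed with a string, so on the failing line one of the containers (`maps` or `initial_reads`) was presumably a list where a string-keyed lookup was expected. A minimal reproduction of that error class, using hypothetical variable names rather than TopHat's own data structures (Python 3 words the message slightly differently: "list indices must be integers or slices, not str"):

```python
# Hypothetical names: this reproduces the class of error only, not TopHat's code.
reads_as_list = ["left_kept_reads", "right_kept_reads"]
reads_as_dict = {"left": "left_kept_reads", "right": "right_kept_reads"}


def lookup(container, key):
    """Return container[key], or the TypeError text if the lookup is invalid."""
    try:
        return container[key]
    except TypeError as err:
        return f"TypeError: {err}"


print(lookup(reads_as_dict, "left"))  # a dict accepts a string key
print(lookup(reads_as_list, "left"))  # a list rejects it with a TypeError
print(lookup(reads_as_list, 0))       # a list needs an integer index
```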


Best Wishes,
David.

[galaxy-user] Make a vcf file

2012-02-14 Thread David Matthews
Hi,

This may be a dense question, but how do we generate a VCF file from the public 
version of Galaxy? Am I missing something obvious?


Best Wishes,
David.

[galaxy-user] BLAST+ on the test site

2012-01-04 Thread David Matthews
Hi,

I'm wanting to run BLASTp on the test site to compare it with some test runs on 
our local copy, but the tool does not run, saying "Index file named 
'blastdb_p.loc' is required by tool but not available". Is this me doing 
something wrong, or is something missing at the test site?


Best Wishes,
David.

Re: [galaxy-user] selecting reads at random from fastq file

2011-11-09 Thread David Matthews
Hi,

This may be a bit dumb or missing the point, but isn't just selecting the first 
5 million reads effectively random? I mean, where the reads map and what they 
derive from is not known to you, and the sequencer did not collect them in an 
order that is influenced by the nature of the sample, did it?


Best Wishes,
David.

On 9 Nov 2011, at 09:44, Hans-Rudolf Hotz wrote:

 Hi Paul, Hi Peter
 
 You might also wanna look at the 'FastqSampler' function in the Bioconductor 
 'ShortRead' package
 http://bioconductor.org/packages/release/bioc/html/ShortRead.html
 
 We are working (as part of our NGS pipeline redesign) on adding more 
 Bioconductor functionalities to Galaxy. Unfortunately, it is very low on my 
 pile of stuff to do, so it will take a while till it appears in the 'Tool 
 Shed'.
 
 
 Regards, Hans
 
 
 
 On 11/08/2011 11:45 PM, Peter Cock wrote:
 On Tue, Nov 8, 2011 at 10:26 PM, Austin Paul austi...@usc.edu wrote:
 Hi Peter,
 
 Thanks for the suggestion.  For example, I have a fastq file with 50 million
 reads and I want to randomly select 5 million of them. It seems biopython
 would very easily select a single or a handful of reads with the
 Bio.SeqIO.index() function.  Would it also be able to do the job I am
 interested in?
 
 Austin
 
 I think so, but you'd have to use Bio.SeqIO.index_db(), which stores
 the index in an SQLite database rather than in memory; an in-memory index
 isn't really viable here (unless you have a 64-bit, big-memory machine?).
 I don't think I've tried it with quite that many reads though...
 
 Alternatively, if I understood her correctly, Jennifer pointed out you
 can do this in Galaxy but it will take a lot of IO:
 
 1. Convert FASTQ to tabular (4 lines per record -> 1 line per record)
 2. Randomly select lines (each line is now a record so safe)
 3. Convert tabular back to FASTQ
 
 It should work though, and requires no additional programming.
 
 Peter
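The tabular route Peter describes works because each 4-line record is kept together while sampling. The same idea can be sketched in plain Python (no Biopython), for comparison with simply taking the first 5 million reads. This is a sketch under stated assumptions: 4-line FASTQ records without wrapped sequence lines, a seekable input file, and hypothetical function names. It reads the file twice, counting records first and then keeping each record with probability (records still needed) / (records not yet seen), which yields exactly k records chosen uniformly at random, in their original order, while holding only one record in memory:

```python
import io
import itertools
import random
from typing import Iterator, Optional, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, "+" line, qualities)


def fastq_records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        chunk = list(itertools.islice(handle, 4))
        if not chunk:
            return
        if len(chunk) < 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError("truncated or malformed FASTQ record")
        yield tuple(line.rstrip("\n") for line in chunk)


def subsample_fastq(src: TextIO, dst: TextIO, k: int, seed: Optional[int] = None) -> int:
    """Write k records chosen uniformly at random from src to dst, in input order.

    Pass 1 counts the records; pass 2 uses selection sampling (Knuth's
    Algorithm S), so only one record is in memory at a time. src must be seekable.
    """
    total = sum(1 for _ in fastq_records(src))
    if not 0 <= k <= total:
        raise ValueError(f"cannot pick {k} records from {total}")
    src.seek(0)
    rng = random.Random(seed)
    chosen = 0
    for seen, record in enumerate(fastq_records(src)):
        if chosen == k:
            break
        # Keep this record with probability (still needed) / (still unseen).
        if rng.random() * (total - seen) < k - chosen:
            dst.write("\n".join(record) + "\n")
            chosen += 1
    return chosen


# Usage with an in-memory example (ten made-up reads, keep three).
demo = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
out = io.StringIO()
subsample_fastq(io.StringIO(demo), out, k=3, seed=1)
print(out.getvalue(), end="")
```

For a real file, open it with `open(path)` for reading and pass a second handle opened for writing; the seed makes a run repeatable.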
 

Re: [galaxy-user] SNPeff tool?

2011-11-08 Thread David Matthews
Hi,

I've had a few email chats with the author of snpEff, and the fly in the 
ointment from my perspective is getting the VCF files it needs through Galaxy. 
As I understand it, there is currently no way of getting the BAM/SAM files into 
the right input format for snpEff within a Galaxy setup, so whatever you do 
you'll still need one or two command-line steps. We have a copy of snpEff here 
at Bristol on our Galaxy, and once we had installed it we realised there was no 
Galaxy method (that we could think of) for getting the input file ready for 
snpEff to use. This is a pity, as it's actually a very nice piece of software 
with a nice, professional-looking output.
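snpEff takes its input as VCF, a tab-separated text format whose data lines start with the eight fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. To illustrate what that input looks like, here is a minimal Python sketch that writes a VCF 4.1 file from variants that have already been called. The example variants, the helper name and the choice of VCF 4.1 are assumptions; real calls would come from a variant caller such as GATK:

```python
from typing import Iterable, List, Tuple

# (chrom, 1-based position, ref allele, alt allele): hypothetical example calls
Variant = Tuple[str, int, str, str]

HEADER_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def to_vcf(variants: Iterable[Variant]) -> str:
    """Render variants as minimal VCF 4.1 text with missing QUAL/FILTER/INFO."""
    lines: List[str] = ["##fileformat=VCFv4.1", "\t".join(HEADER_COLUMNS)]
    # Sorted by chromosome name, then position: a simplification, since real
    # VCFs follow the reference's contig order.
    for chrom, pos, ref, alt in sorted(variants, key=lambda v: (v[0], v[1])):
        if pos < 1:
            raise ValueError(f"VCF positions are 1-based, got {pos}")
        # '.' marks a missing value in the ID, QUAL, FILTER and INFO columns.
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


print(to_vcf([("chr1", 1000, "A", "G"), ("chr1", 250, "C", "T")]), end="")
```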

Best Wishes,
David.

On 8 Nov 2011, at 13:55, Dannon Baker wrote:

 Hi Laura,
 
 While the SNPeff developers have made Galaxy wrappers available, this is not 
 a tool we currently have installed for use on the Galaxy server at 
 main.g2.bx.psu.edu.  Off the top of my head, I don't know of any other public 
 Galaxy servers that offer this tool, but if you have access to a local or 
 cloud galaxy server you could use the provided wrapper to install the tool 
 for use there.
 
 Thanks!
 
 -Dannon
 
 
 
 On Nov 8, 2011, at 6:40 AM, Laura Elizabeth Spoor wrote:
 
 Hi,
 
 I use the Galaxy server and was wondering how to use the SNPeff tool. I have 
 seen on their website that it can be integrated with Galaxy 
 (http://snpeff.sourceforge.net/images/snpEff_galaxy.png), but I cannot see it 
 on the server. Is it something that can be run on the server?
 
 Best Wishes,
 
 Laura
 
 -- 
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.
 
 

Re: [galaxy-user] SNPeff tool?

2011-11-08 Thread David Matthews
Hi,

Yes, I see that you can generate the VCF files that way, but there is no 
seamless way of doing it entirely within Galaxy, i.e. you need to come out of 
Galaxy at some point (or am I missing something?).


Best Wishes,
David.

On 8 Nov 2011, at 14:39, Chorny, Ilya wrote:

 I got it working just fine on my local server. Could you expand on your VCF 
 issue? I generate the VCF using GATK.
 

[galaxy-user] Downloading large files from galaxy

2011-09-13 Thread David Matthews
Hi,

I seem to be having problems downloading large files from Galaxy: the request 
times out at about 1 GB, but the files I'm downloading are 2-3 GB. Am I doing 
something wrong?

David




Re: [galaxy-user] miRNA NGS data processing

2011-08-09 Thread David Matthews
Hi,

TopHat may still be an option for you. You can filter out spliced reads by 
filtering column 6 (the CIGAR column) for reads that map contiguously (i.e. 
c6=='56M' if you have 56 bp paired-end reads). But I agree with Jen that it is 
most likely a sort problem.
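The `c6=='56M'` filter keeps only reads that align end to end with no gaps, clipping or splices, and assumes every read is exactly 56 bp. A slightly more general rule is to drop any alignment whose CIGAR contains an `N` (skipped reference region), which is how TopHat records a splice. A minimal Python sketch over SAM text, with illustrative function names; unlike `56M`, it keeps reads with small indels or soft clipping:

```python
import re
from typing import Iterable, Iterator

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_unspliced(cigar: str) -> bool:
    """True if the alignment has no 'N' (skipped reference region) operation."""
    if cigar == "*":  # no alignment recorded (e.g. an unmapped read)
        return False
    return all(op != "N" for _length, op in CIGAR_OP.findall(cigar))


def unspliced_sam(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines unchanged, plus alignments whose CIGAR has no 'N'."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 (index 5) is the CIGAR string.
        if len(fields) >= 11 and is_unspliced(fields[5]):
            yield line


print(is_unspliced("56M"), is_unspliced("20M500N36M"))  # True False
```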


Best Wishes,
David.

On 9 Aug 2011, at 07:27, yao chen wrote:

 Hi Mete,
 
 I am not sure it is the sort problem. I find Cufflinks in Galaxy is unstable.
 I have BAM files from TopHat on which I could run Cufflinks a few days ago.
 
 But these days, when I run Cufflinks with these BAM files, the error appears.
 Strangely, it sometimes works. I don't know the reason.
 
 ChenYao
 
 2011/8/9 Jennifer Jackson j...@bx.psu.edu
 Hi Mete,
 
 This FAQ has a workflow for sorting a Bowtie (or any) SAM file for Cufflinks:
 http://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq#faq2
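Coordinate sorting means ordering alignments by reference sequence (in `@SQ` header order) and then by leftmost position (POS). For small SAM files the idea can be sketched in a few lines of Python; this is an illustrative helper, not the FAQ's workflow, and for real data `samtools sort` on the BAM file is the usual tool:

```python
from typing import Dict, Iterable, List, Tuple


def sort_sam_by_coordinate(lines: Iterable[str]) -> List[str]:
    """Return SAM header lines followed by alignments sorted by (reference, POS).

    Reference order follows the @SQ header lines and unmapped reads (RNAME '*')
    go last. Everything is held in memory, so this only suits small files, and
    the @HD SO: tag is left as it was.
    """
    header: List[str] = []
    records: List[str] = []
    for line in lines:
        (header if line.startswith("@") else records).append(line)

    ref_order: Dict[str, int] = {}
    for h in header:
        if h.startswith("@SQ"):
            for tag in h.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    ref_order.setdefault(tag[3:], len(ref_order))

    def key(record: str) -> Tuple[int, int]:
        fields = record.split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            return (len(ref_order), pos)
        if rname not in ref_order:
            raise ValueError(f"reference {rname!r} is not declared in an @SQ line")
        return (ref_order[rname], pos)

    return header + sorted(records, key=key)
```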
 
 Thanks!
 
 Jen
 Galaxy team
 
 
 On 8/4/11 10:27 AM, Mete Civelek wrote:
 Hi,
 
 I'm trying to get read counts or FPKM values for my miRNA NGS data on
 Galaxy. I have aligned the reads using Bowtie, but it appears that
 Cufflinks gives an error when run on the Bowtie alignments (This might
 have something to do with Bowtie's BAM file not being sorted). I know
 that Tophat alignments work well with Cufflinks, but I'm not sure if it
 would be possible to use Tophat for my data since miRNA don't have
 splice junctions. I've tried without success to parameterize Tophat to
 completely avoid assigning splice junctions (by setting the max intron
 length to 1). Is there a way I can get the Bowtie alignment to work with
 Cufflinks on Galaxy? Or perhaps there's a way I can parameterize TopHat
 so as to get no splice junctions?
 
 Thanks,
 
 Mete
 
 
 
 IMPORTANT WARNING: This email (and any attachments) is only intended for
 the use of the person or entity to which it is addressed, and may
 contain information that is privileged and confidential. You, the
 recipient, are obligated to maintain it in a safe, secure and
 confidential manner. Unauthorized redisclosure or failure to maintain
 confidentiality may subject you to federal and state penalties. If you
 are not the intended recipient, please immediately notify us by return
 email, and delete this message from your computer.
 
 
 
 -- 
 Jennifer Jackson
 http://usegalaxy.org
 http://galaxyproject.org/Support

Re: [galaxy-user] generating a new fasta from a pileup

2011-08-01 Thread David Matthews
Hi John,

That would be totally fantastic - many thanks!


Best Wishes,
David.

On 30 Jul 2011, at 16:35, John Nash wrote:

 I have some code which can do most of the requested things. Let me figure out 
 how to wrap it for Galaxy, and I'll submit it.
 
 John
 
 
 On 2011-07-30, at 12:47 AM, Jennifer Jackson j...@bx.psu.edu wrote:
 
 Hello David,
 
 Generating a consensus fasta sequence from a BAM or Pile-up file is not yet 
 possible in Galaxy. To date, the Tool Shed also does not have a 
 wrapped/novel tool for this function either.
 
 If you or another user were to create such a wrapped tool, it would be most 
 welcome. As would a tool that would replace the corresponding region of the 
 reference genome with the variant fasta sequence to create a novel reference 
 for alignments.
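Correcting a reference from a pileup can be sketched in Python for the simplest case. This is not a Galaxy tool and makes several assumptions: the 6-column pileup layout (chromosome, 1-based position, reference base, depth, read bases, base qualities) that `samtools mpileup` writes for one BAM file when run with `-f` and the reference FASTA, a reference already loaded into a dictionary of sequences, and arbitrary thresholds (`min_depth`, `min_fraction`). Only single-base substitutions are applied, where one non-reference base accounts for more than `min_fraction` of the reads at a position; indels are parsed so they are not miscounted but are not applied, so coordinates stay the same as in the original reference:

```python
from collections import Counter
from typing import Dict, Iterable

BASES = {"A", "C", "G", "T"}


def count_pileup_bases(bases: str) -> Counter:
    """Count the read-base column of one pileup line; '=' counts reference matches."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":      # start of a read: skip '^' and the mapping-quality character
            i += 2
        elif c == "$":    # end of a read
            i += 1
        elif c in "+-":   # indel: sign, a length, then that many inserted/deleted bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            if c in ".,":
                counts["="] += 1
            elif c.upper() in BASES:
                counts[c.upper()] += 1
            # anything else ('*', 'N', '<', '>') is ignored here
            i += 1
    return counts


def apply_pileup(reference: Dict[str, str], pileup: Iterable[str],
                 min_depth: int = 5, min_fraction: float = 0.5) -> Dict[str, str]:
    """Return a copy of reference with majority single-base substitutions applied."""
    seqs = {name: list(seq) for name, seq in reference.items()}
    for line in pileup:
        chrom, pos, _ref_base, _depth, bases = line.rstrip("\n").split("\t")[:5]
        counts = count_pileup_bases(bases)
        total = sum(counts.values())
        alts = {b: n for b, n in counts.items() if b in BASES}
        if total < min_depth or not alts:
            continue
        best, n = max(alts.items(), key=lambda item: item[1])
        if n > min_fraction * total:
            seqs[chrom][int(pos) - 1] = best  # pileup positions are 1-based
    return {name: "".join(seq) for name, seq in seqs.items()}
```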
 
 Both great ideas that have been discussed a few times on the list and here 
 among our team. If you wanted to open a bitbucket ticket, that would be one 
 way to share exactly what you had in mind and give you a ticket to watch for 
 if/when tools like this are added. Or, I can open one (or possibly two, one 
 for each function) for you, just let me know.
 
 https://bitbucket.org/galaxy/galaxy-central/issues?status=new&status=open
 
 Thanks for the great feedback, sorry there wasn't a solution (yet!),
 
 Best,
 
 Jen
 Galaxy team
 
 
 On 7/22/11 12:56 PM, David Matthews wrote:
 Hi
 
 On a separate issue, I have been having trouble generating a corrected 
 FASTA file based on a pileup. I have a dataset that is a resequenced genome, 
 and I want to correct the FASTA file based on the consensus and then re-run 
 the alignments to see how it affects things. However, I cannot for the life 
 of me figure out how to do it in Galaxy. Any help appreciated!
 
 David
 
 
 
 
 -- 
 Jennifer Jackson
 http://usegalaxy.org
 http://galaxyproject.org/Support

Re: [galaxy-user] calculating percent coverage over the target genome

2011-07-28 Thread David Matthews
Hi Jen,

Many thanks for this. On a related subject, do you know of a way to correct a 
FASTA file on the basis of a pileup (or even just the BAM file)?


Best Wishes,
David.

On 25 Jul 2011, at 17:52, Jennifer Jackson wrote:

 Hello David,
 
 To calculate coverage, please see the tool Regional Variation - Feature 
 coverage. Query and target must both be in Interval/BED format. Query data 
 in Interval/BED format is possible in most of the dataflow paths through the 
 tools and from external sources. The reference genome file will likely need 
 to be imported and formatted.
 
 This is a simple example history in which I pulled the chromInfo file from 
 UCSC and formatted it, extracted a subset of genes in BED format, and ran the 
 Feature Coverage tool (in both directions; see datasets 8 and 9).
 
 http://main.g2.bx.psu.edu/u/jen-bx-galaxy-edu/h/galaxy-user-calculating-percent-coverage-over-the-target-genome-7-22
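The same quantity (bases covered at least once, divided by genome length) can be sketched in Python outside Galaxy. This is an illustrative sketch, assuming the alignments have already been reduced to BED-style intervals (0-based start, exclusive end; spliced reads split into their aligned blocks) and that chromosome sizes come from a chromInfo-style table like the one in the history above. Overlapping intervals are merged first, so depth does not inflate the count:

```python
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end) in BED convention: 0-based start, end exclusive.
Interval = Tuple[str, int, int]


def covered_bases(intervals: Iterable[Interval]) -> Dict[str, int]:
    """Count distinct bases covered at least once per chromosome (depth ignored)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    covered: Dict[str, int] = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        total = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:          # gap: close the current merged block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                        # overlapping or touching: extend the block
                cur_end = max(cur_end, end)
        covered[chrom] = total + (cur_end - cur_start)
    return covered


def percent_covered(intervals: Iterable[Interval], chrom_sizes: Dict[str, int]) -> float:
    """Percentage of the genome (sum of chrom_sizes) covered at least once."""
    covered = covered_bases(intervals)
    return 100.0 * sum(covered.values()) / sum(chrom_sizes.values())


reads = [("chr1", 0, 10), ("chr1", 5, 20), ("chr1", 30, 40), ("chr2", 0, 5)]
print(percent_covered(reads, {"chr1": 100, "chr2": 100}))  # 17.5
```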
 
 Hopefully this helps,
 
 Jen
 Galaxy team
 
 On 7/22/11 12:32 PM, David Matthews wrote:
 Hi
 
 Does anyone know how to calculate how much of a genome was covered by an 
 alignment irrespective of the depth at each base?
 
 Cheers
 David
 
 
 
 -- 
 Jennifer Jackson
 http://usegalaxy.org/
 http://galaxyproject.org/

[galaxy-user] calculating percent coverage over the target genome

2011-07-22 Thread David Matthews
Hi

Does anyone know how to calculate how much of a genome was covered by an 
alignment irrespective of the depth at each base?

Cheers
David
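For reference, the quantity asked about here is the breadth of coverage: the number of reference bases overlapped by at least one aligned read, divided by the genome length. Below is a minimal Python sketch, assuming the alignments have already been converted to BED-style intervals (0-based start, exclusive end) and that chromosome lengths come from something like a chromInfo file. The function names are hypothetical and not part of Galaxy.

```python
from collections import defaultdict

def covered_bases(intervals):
    """Count bases covered at least once per chromosome, ignoring depth.

    intervals: iterable of (chrom, start, end) in BED convention
    (0-based start, exclusive end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    totals = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        total = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:          # overlapping or touching: extend block
                cur_end = max(cur_end, end)
            else:                         # gap: close the current merged block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start      # close the final block
        totals[chrom] = total
    return totals

def percent_genome_covered(intervals, chrom_sizes):
    """Percentage of the genome (sum of chrom_sizes) covered at least once."""
    covered = sum(covered_bases(intervals).values())
    return 100.0 * covered / sum(chrom_sizes.values())

reads = [("chr1", 0, 10), ("chr1", 5, 15), ("chr1", 20, 25)]
print(percent_genome_covered(reads, {"chr1": 100}))  # 20.0
```

Merging overlapping intervals before summing is what makes the result independent of depth: a base covered by fifty reads still counts once.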



[galaxy-user] Looking for new transcripts with cufflinks

2011-07-04 Thread David Matthews
Hi,

I am working with HeLa cells and want to know how to get Cufflinks etc. to 
highlight regions of the genome that are being transcribed but are not in the 
Ensembl GTF. I know that Cufflinks uses class code j to flag transcripts that 
do not match a known gene and may therefore be novel, but most of these arise 
from transcription on or near known genes. Does anyone know how to look for 
transcription that is clearly distinct from known genes? This is a wild-goose 
chase but worth a peek just in case...


Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk



Re: [galaxy-user] Looking for new transcripts with cufflinks

2011-07-04 Thread David Matthews
Of course - Doh! Many thanks!!
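As Gavin's reply quoted below points out, Cuffcompare's class code u marks unknown, intergenic transcripts. Below is a minimal Python sketch of pulling those out. It assumes a Cuffcompare combined GTF in which each line carries a class_code attribute, so check this against your own output before relying on it. The function names are hypothetical.

```python
import re

# Matches GTF attribute pairs such as: transcript_id "TCONS_00000001";
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')

def gtf_attributes(column9):
    """Parse the 9th GTF column into a dict of attribute -> value."""
    return dict(ATTR_RE.findall(column9))

def transcripts_with_class_code(gtf_lines, code="u"):
    """Return the set of transcript_ids whose class_code equals `code`."""
    hits = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue                      # not a complete GTF record
        attrs = gtf_attributes(fields[8])
        if attrs.get("class_code") == code:
            hits.add(attrs.get("transcript_id"))
    return hits

# Usage (hypothetical file name):
# with open("combined.gtf") as fh:
#     novel = transcripts_with_class_code(fh, "u")
```

Transcripts with code u can still sit fairly close to a known gene, so a distance-from-nearest-gene filter would be a sensible next step if you want transcription that is clearly separate.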


On 4 Jul 2011, at 18:47, Oliver, Gavin wrote:

 u represents unknown intergenic transcripts.
 
 
 -Original Message-
 From: galaxy-user-boun...@lists.bx.psu.edu on behalf of David Matthews
 Sent: Mon 04/07/2011 17:48
 To: galaxy-user@lists.bx.psu.edu
 Subject: [galaxy-user] Looking for new transcripts with cufflinks
 
 Hi,
 
 I am working with HeLa cells and want to know how to get Cufflinks etc. to 
 highlight regions of the genome that are being transcribed but are not in the 
 Ensembl GTF. I know that Cufflinks uses class code j to flag transcripts 
 that do not match a known gene and may therefore be novel, but most of these 
 arise from transcription on or near known genes. Does anyone know how to look 
 for transcription that is clearly distinct from known genes? This is a 
 wild-goose chase but worth a peek just in case...
 
 
 Best Wishes,
 David.
 
 __
 Dr David A. Matthews
 
 Senior Lecturer in Virology
 Room E49
 Department of Cellular and Molecular Medicine,
 School of Medical Sciences
 University Walk,
 University of Bristol
 Bristol.
 BS8 1TD
 U.K.
 
 Tel. +44 117 3312058
 Fax. +44 117 3312091
 
 d.a.matth...@bristol.ac.uk
 
 
 
 
 



Re: [galaxy-user] Cufflinks advice

2011-06-15 Thread David Matthews
Hi,

As a first guess, I would say that your chromosome names do not match somewhere 
along the line. If you look at your SAM file and the FASTA file of the genome you 
are working with (and the GTF file as well, if you are using one), you may find, 
for example, that one refers to chromosome 1 as chr1 whilst the other refers to 
it as 1, Chr1 or some other variant. Any of these mismatches can give you an 
empty output. If you are using a built-in index, you may need to change your 
chromosome names from 1 to chr1, for example. Amazingly, the names of human 
chromosomes are apparently not yet standardised across all databases for the 
human genome (and I presume this may be the case for other genomes as well). 
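To illustrate the kind of renaming described above, here is a minimal Python sketch that rewrites Ensembl-style names (1, X, MT) into UCSC-style names (chr1, chrX, chrM) in SAM text. It is a sketch under assumptions. It only touches the @SQ SN: header field and the RNAME/RNEXT columns. It does not translate unplaced-contig names, which differ between the two conventions in ways a simple prefix cannot fix. The function names are hypothetical.

```python
def to_ucsc(name):
    """Map '1', 'Chr1', 'chr1', 'MT' etc. to UCSC style ('chr1', 'chrM')."""
    if name in ("*", "="):            # SAM placeholders: leave untouched
        return name
    base = name[3:] if name.lower().startswith("chr") else name
    if base.upper() in ("M", "MT"):   # mitochondrion is chrM at UCSC
        return "chrM"
    return "chr" + base

def rename_sam_line(line):
    """Rename reference names in one SAM line (header or alignment)."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("@SQ"):
        fields = ["SN:" + to_ucsc(f[3:]) if f.startswith("SN:") else f
                  for f in fields]
    elif not line.startswith("@") and len(fields) >= 11:
        fields[2] = to_ucsc(fields[2])   # RNAME: reference of this read
        fields[6] = to_ucsc(fields[6])   # RNEXT: reference of the mate
    return "\t".join(fields) + "\n"

# Usage (hypothetical file names):
# with open("in.sam") as src, open("out.sam", "w") as dst:
#     for line in src:
#         dst.write(rename_sam_line(line))
```

The same to_ucsc mapping could be applied to the first column of a GTF or to FASTA header lines, so that all three files agree.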

Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk






On 15 Jun 2011, at 02:22, Michael Gooch wrote:

 I attempted to run cufflinks on some RNA sequencing data. It seemed to 
 complete without any errors, but the output files were empty. I am trying to 
 figure out if I did something wrong or whether my data needs some additional 
 processing before cufflinks will be able to use it (or whether the data are 
 unsuitable for cufflinks). The data are paired-end reads.
 
 M. Gooch

[galaxy-user] Converting transcriptomes to proteomes

2011-06-09 Thread David Matthews
Dear Galaxy users,

I am trying to modify the human proteome based on my transcriptomics data. In 
short, I want to use my transcriptomics data to identify SNPs and, from those, 
identify the coding changes they cause. Ultimately I'd like to 
create a customised canonical proteome based on my transcriptomic data. Does 
anyone know how this might be done in Galaxy? I have started by running a 
pileup and so on, but I am not a human geneticist (I am a virologist), so I may 
be making some fundamental errors!!
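As a small illustration of the "identify coding changes" step (not a Galaxy workflow), here is a Python sketch that applies a single-nucleotide change to a coding sequence and reports any resulting amino-acid changes using the standard genetic code. It assumes you already have the transcript's CDS (starting at the ATG, on the coding strand) and the SNP position expressed relative to that CDS. Mapping genomic SNP coordinates onto CDS coordinates, including strand and splicing, is the harder part and is not shown. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds):
    """Translate a CDS codon by codon ('X' for codons containing N etc.)."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))

def coding_changes(cds, pos, alt):
    """Apply base `alt` at 1-based CDS position `pos`.

    Returns amino-acid changes as strings like 'A2S' ('*' = stop);
    an empty list means the change is synonymous.
    """
    mutant = cds[:pos - 1] + alt + cds[pos:]
    ref_prot, alt_prot = translate(cds), translate(mutant)
    return [f"{r}{n}{a}"
            for n, (r, a) in enumerate(zip(ref_prot, alt_prot), start=1)
            if r != a]

print(coding_changes("ATGGCC", 4, "T"))  # ['A2S']
```

Running this over each SNP that falls in a CDS would give the list of protein edits needed to build a customised proteome.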

Any help is gratefully received!


Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk







Re: [galaxy-user] Aligning against Multiple Reference Sequences

2011-06-09 Thread David Matthews
Hi John,

Probably the simplest thing for you to do would be to concatenate the two 
genomes one after the other using the Concatenate tool under Text 
Manipulation. This will generate a new "organism" that apparently has two 
chromosomes, one from bacterium A and one from bacterium B. When you run TopHat 
or Bowtie, the SAM file will indicate which chromosome (i.e. which bacterium) 
each read was assigned to.
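Here is a minimal Python sketch of the same idea outside Galaxy. It adds one precaution that is an assumption on my part rather than part of the advice above: each sequence name is prefixed with a label for its source genome, so that two genomes which both call their sequence, say, "chromosome" do not collide. The SAM RNAME column (third field) then tells you which genome each read went to. The labels and function names are hypothetical.

```python
from collections import Counter

def concat_fasta(sources):
    """Join FASTA texts, prefixing each header with its source label.

    sources: dict mapping a label (e.g. 'strainA') to FASTA text.
    """
    out = []
    for label, text in sources.items():
        for line in text.splitlines():
            if line.startswith(">"):
                name = line[1:].split()[0]   # keep only the first word
                out.append(f">{label}_{name}")
            elif line.strip():
                out.append(line.strip())     # sequence line
    return "\n".join(out) + "\n"

def reads_per_reference(sam_lines):
    """Count mapped alignment records per reference (SAM column 3).

    A multi-mapped read is counted once per reported alignment.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue                         # header line
        fields = line.split("\t")
        if fields[2] != "*":                 # '*' means unmapped
            counts[fields[2]] += 1
    return counts

print(concat_fasta({"A": ">chr\nACGT\n", "B": ">chr\nGGCC\n"}), end="")
```

Without the prefixes, the two genomes in this example would both produce sequences named chr, and the SAM file could not tell them apart.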

Hope this helps.


Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk






On 9 Jun 2011, at 15:18, John David Osborne wrote:

 Are there any tools in Galaxy to align short reads against multiple reference 
 sequences?
  
 I have a dozen microbial genomes sequenced for which there are 2 reference 
 genomes already sequenced. We have tried aligning each of these individually 
 against either of the reference genomes - some align better against the first 
 reference genome, some align better against the second reference genome. 
 Ideally though I would like to be able to align against both at the same 
 time. Is this possible?
  
 I have found a tool called GenomeMapper and hints of 2 other tools in 
 development that do something like this, but nothing for Galaxy yet.
  
 How do others proceed with this type of problem? Workflows appreciated! :)
  
  -John
  

Re: [galaxy-user] RNA seq analysis

2011-05-06 Thread David Matthews
Hi,

I have done exactly the same kind of thing for adenovirus, so I can help with 
it. In answer to question 1, you do not need to index the genome; this will be 
done for you when TopHat is called. Secondly, you should leave the maximum 
number of multihits at 40 and filter out the multihits after the analysis. This 
will let you determine whether or not you have a multihit problem and, if so, 
how big it is and where it is on the genome. I have a workflow on Galaxy called 
"Bristol workflow to get sorted unique proper pair mapped reads" which you can 
use. If you plug in your SAM file, it should give you files listing only the 
unique hits and those that map more than once. This workflow assumes you have 
paired-end data, but it can be modified to work with single-end reads as well.
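The Bristol workflow itself is built from Galaxy tools. As a rough standalone equivalent of its unique-versus-multihit split, here is a Python sketch that sorts SAM alignments using the NH:i tag (the number of reported alignments for the read), which TopHat normally writes for mapped reads. Treat the tag handling as an assumption to check against your own files. Alignments with no NH tag are set aside rather than guessed at, and unmapped records are skipped. The function name is hypothetical.

```python
def split_by_nh(sam_lines):
    """Split SAM alignment lines into (unique, multi, untagged) lists."""
    unique, multi, untagged = [], [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:
            continue                      # FLAG bit 0x4: read unmapped
        nh = None
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            untagged.append(line)         # no NH tag: cannot classify
        elif nh == 1:
            unique.append(line)
        else:
            multi.append(line)
    return unique, multi, untagged
```

Keeping the multihits in their own list, rather than discarding them, is what lets you look at where on the genome they fall.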

Hope this helps.


Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk






On 6 May 2011, at 17:09, puvan...@umn.edu wrote:

 Hi
 
 I have a couple of questions regarding RNA-seq analysis. My questions are:
 1. I need to use a viral genome (very small, ~2 kb) as a reference genome and 
 it is not available in Galaxy. I guess I can use this data from my history. I 
 have a FASTA file but I am not sure whether I have to do some kind of 
 indexing or not.
 
 2. In TopHat, the default for the maximum number of alignments allowed is 40. 
 My understanding is that a single read can be aligned to a maximum of 40 
 different places. I am wondering why this is 40. Is there any specific reason? 
 If I need unique mapping, I have to use 1 instead of 40. Am I correct?
 
 
 Thanks
 
 SP
 
 
 

[galaxy-user] SNPs, Indels and so on in virus genomes

2011-05-03 Thread David Matthews
Hi,

I recently sent out an email asking if anyone knew much about the analysis of 
SNPs etc. and how to visualise them. I got some very useful answers and planned 
to return to the problem when I had a chance to work on it in some depth. 
I now have the time but, like an idiot, I've accidentally deleted those email 
replies! So can I please ask again: does anyone with experience of SNP analysis 
and, especially, visualisation feel able to hold my hand whilst I work this out? 
(Apologies to those who replied last time, but could you get in touch again?)


Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk




Re: [galaxy-user] Around Pittsburgh on April 6? Attend the Intro to Galaxy Sessions @ Pitt

2011-03-31 Thread David Matthews
I would be very keen to see this as a webcast or similar - I'd even stay up 
late to watch it!


On 31 Mar 2011, at 20:19, Dave Clements wrote:

 Hello all,
 
 Dan Blankenberg will be giving two workshops on Galaxy at the University of 
 Pittsburgh on April 6.  The presentations are open to the public.  See below 
 for details and please contact Dan, or Carrie Iwema at Pitt, if you have any 
 questions.
 
 Thanks,
 
 Dave C.
 
 Intro to Galaxy
 http://galaxy.psu.edu/ 
 
 Dan Blankenberg, PhD
 Center for Comparative Genomics and Bioinformatics
 Penn State University
 
 Galaxy allows you to do analyses you cannot do anywhere else without the need 
 to install or download anything. 
 You can analyze multiple alignments, compare genomic annotations, profile 
 metagenomic samples and more...
 
 
 Wednesday 6th April
 
 10 am – 12 pm   Intro to Galaxy (general interest)
 
 2 pm - 4 pm   Working w/NGS Data (advanced users)
 
 
 
 University of Pittsburgh
 
 Falk Library
 
 Conference Room B
 
 
 You are welcome to bring your laptop.
 
 
 Carrie L. Iwema, PhD, MLS
 Information Specialist in Molecular Biology
 
 Health Sciences Library System
 University of Pittsburgh
 200 Scaife Hall
 3550 Terrace St
 Pittsburgh, PA  15261
 
 412-383-6887
 412-648-8819 (fax)
 iw...@pitt.edu
 www.hsls.pitt.edu/molbio
 
 -- 
 http://galaxy.psu.edu/gcc2011/
 http://getgalaxy.org
 http://usegalaxy.org/

Re: [galaxy-user] Pseudo Autosomal regions in Chrs X and Y

2011-03-29 Thread David Matthews
Fantastic, many thanks!


Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk






On 29 Mar 2011, at 00:19, Jennifer Jackson wrote:

 Hi David,
 
 The PAR regions are documented at UCSC on the hg19 genome gateway page (and 
 for some other recent genomes). Start at the main page, click into Genomes, 
 select hg19, then scroll down to credits:
 
 http://genome.ucsc.edu/
 
 quote:
 
 The Y chromosome in this assembly contains two pseudoautosomal regions (PARs) 
 that were taken from the corresponding regions in the X chromosome and are 
 exact duplicates:
 
 chrY:10001-2649520 and chrY:59034050-59363566
 chrX:60001-2699520 and chrX:154931044-155260560
 
 Hopefully this helps!
 Jen
 
 On 3/28/11 2:04 PM, David Matthews wrote:
 Hi,
 
 Again, thanks for the feedback. I made my own female hg19 by deleting chrY 
 from my copy of hg19, so that's OK. It still leaves the problem of how to 
 analyse male transcriptomes, since reads mapping to PAR1 and PAR2 genes get 
 reported as multimapped reads, which can end up being filtered out depending 
 on how you analyse your transcriptome. If I knew with certainty where PAR1 and 
 PAR2 are on chrY of hg19, I was planning to replace those nucleotides with Ns 
 on chrY so that they would no longer show up as a multimap problem. Do you (or 
 anyone else) happen to know the coordinates on hg19?
 
 Cheers
 David
 
 
 On 23 Mar 2011, at 14:19, Jennifer Jackson wrote:
 
 Hi David,
 
 Right now we don't have anything built-in to filter out this type of 
 duplication automatically.
 
 As a potential option, did you know that we offer a Canonical Female 
 build for certain genomes? This may help with some of the duplication 
 issues, if the loss of novel Y is OK for your project.
 
 Please see:
 https://bitbucket.org/galaxy/galaxy-central/wiki/GenomeData
 
 Thanks for bringing up a good point!
 
 Best,
 Jen
 
 
 On 3/10/11 8:44 AM, David Matthews wrote:
 Hi All again,
 
 A separate point about the analysis of cufflinks data is the subject of
 the Pseudo Autosomal Regions in X and Y - this will make a mess of gene
 expression analysis in some cases especially because tophat will assign
 a read to both places which therefore makes it a multihit read (which
 you might then filter out) or it may double the true levels of reported
 expression. Anyone had experience/thoughts on this?
 
 Best Wishes,
 David.
 
 __
 Dr David A. Matthews
 
 Senior Lecturer in Virology
 Room E49
 Department of Cellular and Molecular Medicine,
 School of Medical Sciences
 University Walk,
 University of Bristol
 Bristol.
 BS8 1TD
 U.K.
 
 Tel. +44 117 3312058
 Fax. +44 117 3312091
 
 d.a.matth...@bristol.ac.uk
 
 
 
 
 
 
 
 
 
 --
 Jennifer Jackson
 http://usegalaxy.org
 http://galaxyproject.org
 
 
 
 -- 
 Jennifer Jackson
 http://usegalaxy.org
 http://galaxyproject.org
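Using the chrY PAR coordinates Jen quotes above (1-based, inclusive, hg19), the masking David describes could be sketched in Python as follows. It is a sketch under assumptions. It works on an in-memory sequence string for chrY only, and reading and writing the FASTA file is left out. The function and constant names are hypothetical.

```python
# hg19 chrY pseudoautosomal regions, 1-based inclusive (from UCSC, quoted above).
HG19_CHRY_PARS = [(10001, 2649520), (59034050, 59363566)]

def mask_regions(seq, regions, mask_char="N"):
    """Return seq with each 1-based inclusive (start, end) region masked."""
    buf = bytearray(seq, "ascii")        # mutable copy, 1 byte per base
    for start, end in regions:
        if not 1 <= start <= end <= len(buf):
            raise ValueError(f"region {start}-{end} outside sequence")
        # 1-based inclusive [start, end] is 0-based slice [start - 1, end)
        buf[start - 1:end] = mask_char.encode("ascii") * (end - start + 1)
    return buf.decode("ascii")

print(mask_regions("ACGTACGTAC", [(2, 4)]))  # ANNNACGTAC
# e.g. masked_chry = mask_regions(chry_sequence, HG19_CHRY_PARS)
```

Masking chrY rather than chrX keeps the X copy of each PAR as the single place those reads can map, so they are no longer reported as multihits.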

Re: [galaxy-user] A genbank to gtf converter

2011-03-28 Thread David Matthews
Hi Jen,

Many thanks for the reply. Sadly, my programming is not up to anything like a 
GenBank to GTF converter! The main reason I want one is that, as a virologist, I 
would find it very useful, since many viruses do not have a GTF file but do have 
GenBank submissions. I know of a site that lists some viruses together with GFF 
files, but alas I cannot find a GFF to GTF converter - nightmare!!
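No ready-made GFF-to-GTF converter turned up in this thread, so here is a minimal Python sketch of the core step. It assumes GFF3 input in which exon/CDS lines point to their transcript (or gene) through a Parent attribute, and transcripts point to genes the same way. Real GFF3 files from different sources vary a lot, so treat this as a starting point rather than a complete converter. The function names are hypothetical.

```python
from urllib.parse import unquote

def gff3_attributes(column9):
    """Parse 'ID=tx1;Parent=gene1' into a dict (values URL-unescaped)."""
    attrs = {}
    for pair in column9.strip().strip(";").split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs

def gff3_to_gtf(lines, keep=("exon", "CDS")):
    """Convert exon/CDS GFF3 lines to GTF lines with gene_id/transcript_id."""
    records, parent_of = [], {}
    for line in lines:
        if line.startswith("##FASTA"):
            break                         # embedded sequence section follows
        if not line.strip() or line.startswith("#"):
            continue                      # blank, directive or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = gff3_attributes(fields[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"].split(",")[0]
        records.append((fields, attrs))

    gtf = []
    for fields, attrs in records:
        if fields[2] not in keep or "Parent" not in attrs:
            continue
        transcript_id = attrs["Parent"].split(",")[0]
        # If the parent is itself a child (e.g. mRNA -> gene), use the grandparent.
        gene_id = parent_of.get(transcript_id, transcript_id)
        new_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        gtf.append("\t".join(fields[:8] + [new_attrs]))
    return gtf
```

Where a CDS hangs directly off a gene with no mRNA line, as in some viral annotations, this sketch uses the gene ID as both gene_id and transcript_id.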

I'll keep looking for one and if I find it I'll let you know.

Cheers
David


On 23 Mar 2011, at 18:02, Jennifer Jackson wrote:

 Hello David,
 
 This is a great idea that the team has been considering adding, but nothing 
 immediate is planned. There are some external teams working on outside 
 development, and this is on their list, too.
 
 If interested in what that project is doing, please see this thread:
 http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-March/004692.html
 
 For now, if the data resides in a track at UCSC (many are, especially for 
 vertebrate genomes and it is updated daily), using the Table browser can 
 allow you to export the data in GTF and push to Galaxy with the Get Data 
 tool. Since some of the data can be large, using BX Main (our local UCSC 
 mirror) may be the best source.
 
 To do this, navigate to the target genome and track (RefSeq under Gene 
 Predictions, others under mRNA and EST), and choose the output format "GTF - gene 
 transfer format". Please note that the gene_id attribute in the 9th field 
 will not be populated with the gene name (it will be the same as transcript_id). 
 This is just how UCSC does it right now (getting the full GTF output set up in 
 the Table Browser is on their list, as far as we know). But to get that info 
 now, go back in and re-export the same table data as "all fields from selected 
 table" into Galaxy, and the gene name will be in the data field named name2. 
 The text manipulation tools can help to format the data.
 
 A workflow would be a good option once you have the tool path worked out, so 
 that it can be reused for similar GenBank datasets in future without having to 
 do it all again. You may even want to publish the workflow for others to use, 
 as it is a very popular request, and perhaps add a published page explaining 
 how to use/prep data for input.
 
 Apologies for the current inconvenience, but hopefully this can get you going 
 until a more direct method is implemented in Galaxy main.
 
 Great idea that many other users are also very interested in. Any 
 contributions (page, workflow) would be most welcomed. A tool that does the 
 extraction directly from Genbank would also be welcomed in the Tool Shed, if 
 you want to contribute.
 http://community.g2.bx.psu.edu/
 
 Best,
 
 Jen
 Galaxy team
 
 
 On 3/14/11 1:15 PM, David Matthews wrote:
 Hi again,
 
 Does anyone know of a genbank to gtf converter? I have heard such things 
 exist but never found one...
 
 Cheers
 David
 
 
 
 -- 
 Jennifer Jackson
 http://usegalaxy.org
 http://galaxyproject.org


[galaxy-user] Tophat version

2011-03-14 Thread David Matthews
Hi,

Just wondering when the TopHat portion of Galaxy will be updated? It's currently 
at version 1.1.1 and there is now a version 1.2.0 (in fact I think there have 
been 4 updates).

Cheers
David



Re: [galaxy-user] downstream analysis of cuffdiff out put

2011-03-10 Thread David Matthews
Hi All,

I agree with this problem and solution. I have a lot of Cufflinks, Cuffcompare 
and Cuffdiff output, but I am struggling to relate what it means in terms of 
the real world! I have seen Partek software attempt to visualise some of the 
data the suite generates, apparently using the FPKM values from Cufflinks, 
but beyond that I struggle. I did have an email conversation with Cole Trapnell 
which eventually centred on the idea that you just have to trust the analysis 
and then go away and do RT-PCR to check it all out!
So for tools I think:

1. A tool that shows you the layout of known isoforms for a gene and the FPKM 
data for each isoform.

Er, that's it for now from me!
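Here is a minimal Python sketch of the tool suggested in point 1. It rests on assumptions that should be checked against real files. It groups transcripts by gene_id from a GTF, and it takes per-isoform FPKM values from a tab-separated Cuffdiff isoform table whose header includes test_id (taken here to be the transcript_id), value_1 and value_2. The column names and the function name are assumptions, not a documented interface.

```python
import csv
import io
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "XLOC_000001";
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')

def isoform_fpkm_by_gene(gtf_lines, isoform_table_text):
    """Map gene_id -> sorted list of (transcript_id, value_1, value_2)."""
    reader = csv.DictReader(io.StringIO(isoform_table_text), delimiter="\t")
    fpkm = {row["test_id"]: (float(row["value_1"]), float(row["value_2"]))
            for row in reader}

    isoforms = defaultdict(set)           # gene_id -> transcript_ids
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            isoforms[attrs["gene_id"]].add(attrs["transcript_id"])

    missing = (float("nan"), float("nan"))  # isoform absent from the table
    return {gene: [(t, *fpkm.get(t, missing)) for t in sorted(transcripts)]
            for gene, transcripts in isoforms.items()}
```

Printing this table gene by gene gives a crude text version of the isoform layout. A graphical view would draw each transcript's exons from the same GTF lines.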

But I also struggle to understand what all the other outputs really mean! What 
does the CDS.diff output tell us? What does the promoters.diff output tell us? 
I know what the Cufflinks manual says, but I struggle to convert this in my head 
into what is happening to an actual gene. So if anyone has a PowerPoint example 
showing, for a specific gene, what the data are saying and how this relates to 
changes in protein production, that would be great! I'm hoping someone out 
there has had to lecture on this to students, has made a PowerPoint 
presentation and is willing to share it with the Galaxy community.

Another point about the analysis of Cufflinks data is the subject of the Pseudo 
Autosomal Regions in X and Y. These will make a mess of gene expression 
analysis in some cases, especially because TopHat will assign a read to both 
sites and make it a multihit read (which you might then filter out) or it may 
double the true level of reported expression. Has anyone had thoughts on this?

Best Wishes,
David.

__
Dr David A. Matthews

Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol.
BS8 1TD
U.K.

Tel. +44 117 3312058
Fax. +44 117 3312091

d.a.matth...@bristol.ac.uk






On 10 Mar 2011, at 15:55, Jeremy Goecks wrote:

 Jagat,
 
 Please send queries such as these to the galaxy-user mailing list (cc'd); 
 there are many users on the list who can contribute to this discussion, and 
 there are many additional users that will benefit from this discussion.
 
 I was wondering if you can point me to a documentation or URL to guide how 
 to perform the downstream analysis once we have cuffdiff out put.
 
 In general, I agree that tools are needed to further process 
 cufflinks/compare/diff outputs, but I'm not aware of any that are publicly 
 available. Let's open this issue up for discussion and see if we can reach a 
 consensus about which tools might be useful. Everyone, please feel free to 
 contribute ideas/tools; note that the Galaxy Tool Shed is a nice place for 
 sharing tools you've built for Galaxy:
 
 http://community.g2.bx.psu.edu/
 
 As in any mRNA-seq experiment, I want to achieve the following objectives:
 
 1. Reconstruct all transcripts of a particular gene and the corresponding 
 significantly expressed transcripts as called by Cuffdiff.
 2. Identify the different isoforms.
 3. Locate the splicing.
 
 From the various output files, which unique ID can be used to match one file, 
 say Cuffdiff.expr (transcript/isoform/splicing), to the other file, i.e. the 
 transcript.gtf for each sample or the combined GTF file?
 
 I've got a script that does this for the cuffdiff isoform expression testing 
 file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple 
 weeks. It would probably be useful to have similar scripts for the other 
 expression testing files as well. Also, it would be nice to be able to take 
 the FPKM values generated by Cuffdiff and attach them to their respective 
 transcripts as attributes.
 
 Best,
 J. 
 
 ___
 The Galaxy User list should be used for the discussion of
 Galaxy analysis and other features on the public server
 at usegalaxy.org.  Please keep all replies on the list by
 using reply all in your mail client.  For discussion of
 local Galaxy instances and the Galaxy source code, please
 use the Galaxy Development list:
 
  http://lists.bx.psu.edu/listinfo/galaxy-dev
 
 To manage your subscriptions to this and other Galaxy lists,
 please use the interface at:
 
  http://lists.bx.psu.edu/

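Jeremy's script is not shown in this thread, but the idea of joining a cuffdiff isoform table to a GTF by transcript ID and attaching the FPKM values to each transcript as attributes can be sketched as follows. Assumptions: the cuffdiff table is tab-separated with a header containing test_id, value_1 and value_2 (check the header of your cuffdiff version), and the attribute names FPKM_1 and FPKM_2 are hypothetical choices, not a Cufflinks convention.

```python
import csv
import io


def load_isoform_fpkm(diff_text):
    """Map transcript ID -> (value_1, value_2) from a cuffdiff isoform table.

    Assumes tab-separated text whose header row includes 'test_id',
    'value_1' and 'value_2' (an assumption; check your cuffdiff version).
    """
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    return {row["test_id"]: (row["value_1"], row["value_2"]) for row in reader}


def transcript_id(attributes):
    """Return the transcript_id value from a GTF attribute string, or None."""
    for field in attributes.split(";"):
        parts = field.strip().split(" ", 1)
        if len(parts) == 2 and parts[0] == "transcript_id":
            return parts[1].strip('"')
    return None


def annotate_gtf(gtf_lines, fpkm):
    """Yield GTF lines, appending hypothetical FPKM_1/FPKM_2 attributes to matched transcripts."""
    for line in gtf_lines:
        line = line.rstrip("\n")
        if line.startswith("#") or not line.strip():
            yield line  # comments and blank lines pass through unchanged
            continue
        cols = line.split("\t")
        tid = transcript_id(cols[8])
        if tid in fpkm:
            value_1, value_2 = fpkm[tid]
            attrs = cols[8].rstrip()
            if not attrs.endswith(";"):
                attrs += ";"  # keep the attribute list well-formed
            cols[8] = f'{attrs} FPKM_1 "{value_1}"; FPKM_2 "{value_2}";'
        yield "\t".join(cols)
```

Transcripts without a test result are written out unchanged, so the annotated GTF keeps the same set of features as the input.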

[galaxy-user] Pseudo Autosomal regions in Chrs X and Y

2011-03-10 Thread David Matthews
Hi All again,

A separate issue in the analysis of cufflinks data is the pseudoautosomal
regions (PARs) on X and Y. These will make a mess of gene expression analysis
in some cases, because tophat will assign a read to both places. That either
makes it a multihit read (which you might then filter out) or may double the
reported expression level. Has anyone had experience with, or thoughts on,
this?
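One way to gauge the scale of the PAR problem is to list the reads that have alignments in a PAR on both chrX and chrY. A minimal sketch, assuming UCSC-style chromosome names (Ensembl-style files would use X and Y) and the GRCh37/hg19 PAR coordinates hard-coded below, which you should check against the build you aligned to; it only tests the alignment start position.

```python
from collections import defaultdict

# GRCh37/hg19 pseudoautosomal regions (1-based, inclusive, UCSC-style names),
# as published by the Genome Reference Consortium; verify for your own build.
HG19_PARS = {
    "chrX": [(60001, 2699520), (154931044, 155260560)],
    "chrY": [(10001, 2649520), (59034050, 59363566)],
}


def in_par(chrom, pos, pars=HG19_PARS):
    """True if a 1-based position falls inside a listed PAR on that chromosome."""
    return any(start <= pos <= end for start, end in pars.get(chrom, []))


def x_and_y_par_reads(sam_lines, pars=HG19_PARS):
    """Read names with a mapped alignment starting in a PAR on chrX and another on chrY."""
    hits = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        cols = line.split("\t")
        qname, flag, chrom, pos = cols[0], int(cols[1]), cols[2], int(cols[3])
        if flag & 0x4:  # skip unmapped records
            continue
        if in_par(chrom, pos, pars):
            hits[qname].add(chrom)
    return {qname for qname, chroms in hits.items() if {"chrX", "chrY"} <= chroms}
```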

Best Wishes,
David.


Re: [galaxy-user] how to find out the gene_ID correspond to CUFF ID

2011-03-01 Thread David Matthews
Hi,

Yes, column 1 (seqname) holds the chromosome name, and it must be the same
throughout: the Ensembl GTF calls the chromosomes 1, 2, 3 etc., so your hg19
reference file must use those names too. The simplest solution is to use a
copy of hg19 that names the chromosomes 1, 2, 3 etc. instead of chr1, chr2 etc.
Unfortunately I'm only in intermittent contact with the web; I might be able to
help you properly next week when I am back at work. However, I've just publicly
shared a history called Bristol hg19 containing an hg19 file, a female hg19
(missing chromosome Y) and an Ensembl GTF file that all work together (i.e.
they all use the same names for the chromosomes!). Just look under shared data.
Note that you will probably need to repeat your tophat alignments using your
reads and these files together.
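The key point is that column 1 must use the same chromosome names as the reference FASTA. If you would rather rename than download a matching copy, a minimal sketch of the renaming looks like this. It assumes only the primary chromosomes (1-22, X, Y, MT) need mapping and leaves every other contig name unchanged. Note also that UCSC's hg19 chrM is a different mitochondrial sequence from Ensembl's GRCh37 MT, so renaming alone will not make mitochondrial annotations line up.

```python
def ensembl_to_ucsc(name):
    """'1' -> 'chr1', 'X' -> 'chrX', 'MT' -> 'chrM'; other contigs are left unchanged."""
    if name == "MT":
        return "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    return name


def ucsc_to_ensembl(name):
    """'chr1' -> '1', 'chrX' -> 'X', 'chrM' -> 'MT'; other contigs are left unchanged."""
    if name == "chrM":
        return "MT"
    core = name[3:] if name.startswith("chr") else None
    if core is not None and (core.isdigit() or core in ("X", "Y")):
        return core
    return name


def rename_gtf_line(line, rename):
    """Rewrite column 1 (seqname) of a GTF line; comments and blank lines pass through."""
    line = line.rstrip("\n")
    if line.startswith("#") or not line.strip():
        return line
    seqname, rest = line.split("\t", 1)
    return rename(seqname) + "\t" + rest


def rename_fasta_header(line, rename):
    """Rewrite the sequence name on a FASTA '>' header line; sequence lines pass through."""
    line = line.rstrip("\n")
    if not line.startswith(">"):
        return line
    name, _, description = line[1:].partition(" ")
    return ">" + rename(name) + (" " + description if description else "")
```

Use ensembl_to_ucsc on the GTF to match a chr-style hg19, or ucsc_to_ensembl on the FASTA headers to get the 1, 2, 3 naming; either way, all files must end up using one convention.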

Good luck!

David



On 1 Mar 2011, at 20:06, Ying Zhang wrote:

 Dear Vasu:
 
 thank you for your information!
 
 I have checked the reference and cannot find a specific column that includes
 chromosome information; do you mean the first column (seqname)? Do you happen
 to have one with the correct format that I can use as a reference annotation?
 Thanks a lot! I only have limited experience in computing, so I do not know
 how to format this file.
 
 Best
 
 Ying
 
 Quoting vasu punj pu...@yahoo.com:
 
 I believe you need to reformat the Ensembl file; the chromosome column is not
 correct.
  
 Vasu
 
 --- On Tue, 3/1/11, Ying Zhang ying.zhang.yz...@yale.edu wrote:
 
 
 From: Ying Zhang ying.zhang.yz...@yale.edu
 Subject: Re: [galaxy-user] how to find out the gene_ID correspond to CUFF ID
 To: David Matthews d.a.matth...@bristol.ac.uk
 Cc: galaxy-u...@bx.psu.edu
 Date: Tuesday, March 1, 2011, 10:59 AM
 
 
 Dear David:
 
 I followed your advice and downloaded the reference file from Ensembl, then
 uploaded it into Galaxy and ran cufflinks using the file as a reference
 annotation. However, I got an error when running it; this is the error
 message it gave me:
 
 An error occurred running this job: cufflinks v0.9.3
 cufflinks -I 30 -F 0.05 -j 0.05 -p 8 -Q 0 -G
 /galaxy/main_database/files/002/122/dataset_2122219.dat -r
 /galaxy/data/hg19/sam_index/hg19.fa
 Error running cufflinks. [11:47:14] Loading reference and sequence.
 GFF warning: mergi
 
 Do you have any idea of what is going wrong here?
 
 Best
 
 Ying
 
 
 Quoting David Matthews d.a.matth...@bristol.ac.uk:
 
 Hi,
 
 Yeah, that's a good idea too!! I did not know about that tool, which shows
 what I know (!) - thanks for the info!
 
 Cheers
 David
 
 
 
 On 1 Mar 2011, at 04:51, Jeremy Goecks wrote:
 
 Ying, you could also try using the tools 'Fetch closest
 non-overlapping feature' and 'Intersect' to find genes nearby
 transcripts/genes/TSSes of interest; for both tools, you'll want a
 reference annotation, either from UCSC or Ensembl.
 
 Best,
 J.
 
 On Feb 28, 2011, at 6:10 PM, David Matthews wrote:
 
 Hi,
 
 You need to supply a gene annotation file to cufflinks to easily
 get the gene-id information. Without it, cufflinks simply tries
 its best to figure out what genes are present. The Ensembl GTF
 file is quite a comprehensive one; there is a link to it on the
 cufflinks manual page.
 
 Good luck!
 David
 
 
 
 On 28 Feb 2011, at 21:33, Ying Zhang wrote:
 
 Dear Everyone:
 
 I have got an output file from running Cufflinks that contains
 gene expression information. However, I found that each gene_ID has
 a format like CUFF.1151175. Do you have any idea how to find the
 official gene ID corresponding to this CUFF ID? Thank you very much!
 
 Best
 
 Ying Zhang, M.D., Ph.D.
 Postdoctoral Associate
 Department of Genetics,
 Yale University School of Medicine
 300 Cedar Street,S320
 New Haven, CT 06519
 Tel: (203)737-2616
 Fax: (203)737-2286
 
 
 
 
 
 
 
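Jeremy's suggestion of intersecting Cufflinks transcripts with a reference annotation comes down to an interval-overlap test. A minimal sketch, assuming GTF input with 1-based inclusive coordinates, chromosome names that already match between the files, and a simple linear scan that ignores strand; Galaxy's Intersect tool is the practical way to do this on real data. Any gene or CUFF IDs used with it here are hypothetical.

```python
from collections import namedtuple

Interval = namedtuple("Interval", "chrom start end name")  # 1-based, inclusive


def attribute(attrs, key):
    """Return the value of one GTF attribute (e.g. gene_id), or None if absent."""
    for field in attrs.split(";"):
        parts = field.strip().split(" ", 1)
        if len(parts) == 2 and parts[0] == key:
            return parts[1].strip('"')
    return None


def gtf_gene_intervals(gtf_lines):
    """Collapse GTF rows into one Interval per (chromosome, gene_id): min start to max end."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        key = (cols[0], attribute(cols[8], "gene_id"))
        start, end = int(cols[3]), int(cols[4])
        if key in spans:
            old_start, old_end = spans[key]
            start, end = min(old_start, start), max(old_end, end)
        spans[key] = (start, end)
    return [Interval(chrom, s, e, gene) for (chrom, gene), (s, e) in spans.items()]


def overlapping_genes(transcript, genes):
    """Names of genes sharing at least one base with the transcript (strand ignored)."""
    return [g.name for g in genes
            if g.chrom == transcript.chrom
            and g.start <= transcript.end
            and transcript.start <= g.end]
```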

Re: [galaxy-user] how to find out the gene_ID correspond to CUFF ID

2011-02-28 Thread David Matthews
Hi,

You need to supply a gene annotation file to cufflinks to easily get the
gene-id information. Without it, cufflinks simply tries its best to figure out
what genes are present. The Ensembl GTF file is quite a comprehensive one;
there is a link to it on the cufflinks manual page.

Good luck!
David



On 28 Feb 2011, at 21:33, Ying Zhang wrote:

 Dear Everyone:
 
 I have got an output file from running Cufflinks that contains gene expression
 information. However, I found that each gene_ID has a format like
 CUFF.1151175. Do you have any idea how to find the official gene ID
 corresponding to this CUFF ID? Thank you very much!
 
 Best
 


[galaxy-user] RNA seq analysis

2011-02-23 Thread David Matthews
Hi Jeremy,

I thought I'd write to start a discussion of a workflow for people doing
RNA-seq that I have found very useful, and that addresses some issues in
mapping mRNA-derived paired-end RNA-seq data to the genome using tophat. Here
is the approach I use (I have a human mRNA sample deep sequenced with 56 bp
paired-end reads on an Illumina, generating 29 million reads):

1. Align to hg19 (in my case) using tophat, allowing up to 40 hits for each
sequence read.
2. In the SAM filter tool, filter on the flags so that you keep only
alignments where the read is mapped, the mate is mapped, and the read is
mapped in a proper pair.
3. Use Group to group the filtered SAM file on c1 (the read name from the
sequencer) and set an operation to count on c1 as well. This gives a list of
the reads and how many times they map to the human genome. Because you have
filtered the set for reads that have a mate pair, there will be an even number
for each read. For most of the reads the number will be 2 (the forward read
maps once and the reverse read maps once, in a proper pair), but for reads
that map ambiguously the number will be a higher multiple of 2. If you count
these up, I find that 18 million reads map once, 1.3 million map twice,
400,000 reads map 3 times and so on, until you get down to 1 read mapping 30
times, 1 read mapping 31 times and so on...
4. Filter this list to remove any reads with a count greater than 2.
5. Use Compare two Datasets to match your new list of reads with a count of
exactly 2 against your SAM file, pulling out all the alignments for those
reads (i.e. both mates).
6. You'll need to sort the resulting SAM file before you can use it with other
applications like IGV.
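The filtering in steps 2 to 5 can be sketched in Python working directly on SAM text lines. This is a minimal sketch, not the Galaxy tools used above: it assumes both mates share the same QNAME (column 1), drops header lines, and holds everything in memory, so for 29 million reads a streaming or samtools-based approach would be more practical. Reads with a count of 1 (a lone mate surviving the filter) end up in neither output.

```python
from collections import Counter

PROPER_PAIR = 0x2
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def is_mapped_proper_pair(flag):
    """Step 2: read mapped, mate mapped, and the pair flagged as properly paired."""
    return bool(flag & PROPER_PAIR) and not flag & (READ_UNMAPPED | MATE_UNMAPPED)


def split_unique_pairs(sam_lines):
    """Return (unique, multi) lists of alignment lines.

    Steps 3-5: count the filtered alignments per read name (column 1); a count
    of exactly 2 means each mate aligned once, as a proper pair.
    """
    body = [l for l in sam_lines if l and not l.startswith("@")]
    kept = [l for l in body if is_mapped_proper_pair(int(l.split("\t")[1]))]
    counts = Counter(l.split("\t", 1)[0] for l in kept)
    unique = [l for l in kept if counts[l.split("\t", 1)[0]] == 2]
    multi = [l for l in kept if counts[l.split("\t", 1)[0]] > 2]
    return unique, multi
```

The unique list still needs coordinate sorting (step 6) before IGV will load it; samtools sort on the converted BAM is the usual route.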

What you end up with is a sam file where all the reads map to one site only and 
all the reads map as a proper pair. This may seem similar to setting tophat to 
ignore non-unique reads. However, it is not. This approach gives you 10-15% 
more reads. I think it is because if tophat finds (for example) that the
forward read maps to one site but the reverse read maps to two sites, it throws
away the whole read pair. By filtering the SAM file to restrict it to only
those mappings that make sense, you increase the number of unique reads by
getting rid of irrational mappings.

Has anyone else found this? Does this make sense to anyone else? Am I making a 
huge mistake somewhere?

A nice aspect of this (or at least I think so!) is that by filtering in this
manner you can also create a SAM file of non-unique mappings which you can
monitor. This can be useful if one or more genes generate a lot of non-unique
mappings, which may make it hard to estimate their expression accurately. You
also get a list of how many multi-hits you have in your data, so you know the
scale of the problem.
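The tally of how many reads map once, twice, three times and so on is a histogram of alignment counts per read. A minimal sketch, assuming you pass in column 1 (QNAME) of each alignment line in the filtered SAM file; with proper pairs, a count of 2 corresponds to a pair placed once.

```python
from collections import Counter


def multiplicity_histogram(qnames):
    """Return {alignments_per_read: number_of_reads}, sorted by alignment count.

    qnames: one read name per alignment line, e.g. column 1 of a SAM file.
    """
    per_read = Counter(qnames)  # read name -> number of alignment lines
    return dict(sorted(Counter(per_read.values()).items()))
```

For example, multiplicity_histogram(["a", "a", "b", "b", "b", "b", "c", "c"]) gives {2: 2, 4: 1}: two reads placed once as a pair, and one read placed twice.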

Best Wishes,
David.


Re: [galaxy-user] get wig file after tophat

2011-02-22 Thread David Matthews
Hi,

The option you need in IGV tools is count. You set a window size and this 
gives you a tdf file from your sorted bam (or sam) file which is nice and quick 
to view on IGV.
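igvtools count bins alignments into windows of a chosen size and writes a tdf file. The Python sketch here only illustrates that windowed counting idea on SAM text, under stated assumptions: it is not igvtools' algorithm or file format, it counts each mapped alignment once per window that its aligned (M, =, X) blocks touch, and the default window of 25 bases is an arbitrary choice.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Yield 1-based inclusive reference blocks covered by M, = or X operations.

    D and N advance along the reference without adding coverage; I, S, H and P
    do not consume the reference at all.
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":
            yield ref, ref + length - 1
            ref += length
        elif op in "DN":
            ref += length


def window_counts(sam_lines, window=25):
    """Count mapped alignments touching each window, keyed by (chrom, 1-based window start)."""
    counts = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        cols = line.split("\t")
        flag, chrom, pos, cigar = int(cols[1]), cols[2], int(cols[3]), cols[5]
        if flag & 0x4 or cigar == "*":  # unmapped, or no alignment described
            continue
        touched = set()
        for start, end in aligned_blocks(pos, cigar):
            touched.update(range((start - 1) // window, (end - 1) // window + 1))
        for index in touched:
            counts[(chrom, index * window + 1)] += 1
    return dict(counts)
```

Skipping N operations matters for tophat output: a spliced read adds counts at both exons but not across the intron between them.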


Best Wishes,
David.







On 22 Feb 2011, at 15:52, Ying Zhang wrote:

 Dear David:
 
 thank you very much for helping me!
 
 I have downloaded IGV and I did find IGVtools; however, I am not sure which
 tool I should use to generate a tdf file. The tile function will generate a
 tdf file, but its input formats do not include bam or sam files; instead it
 needs a wig file, and I have no wig file to put in. So I am wondering whether
 you need to use another tool first. I really appreciate your help! Thank you
 very much!
 
 Best
 
 Ying
 
 Quoting David Matthews d.a.matth...@bristol.ac.uk:
 
 Hi,
 
 You can get an equivalent visualisation from the IGV viewer by the Broad
 Institute: it's under IGV tools and generates a tdf file from bam or sam
 files. This also gives a quick and easy way of looking at depth at any
 particular site and is very accessible.
 
 Cheers
 David
 
 
 On 21 Feb 2011, at 21:44, Jeremy Goecks wrote:
 
 Hi all,
 
 Ann is correct - Tophat does not produce .wig files when run anymore. 
 However, it's fairly easy to use Galaxy to make a wiggle-like coverage file 
 from a BAM file:
 
 (a) run the pileup tool on your BAM to create a pileup file;
 (b) cut columns 1 and 4 to get your coverage file.
 
 A final note: it's often difficult to visualize coverage files because 
 they're so large. You might be better off visualizing the BAM file and 
 using the coverage file for statistics.
 
 Best,
 J.
 
 Hello,
 
 I think I know the answer (sort of) to this question.
 
 This may be because newer versions of tophat stopped running the wiggles
 program, which is still part of the tophat distribution and is the program
 that makes the coverage.wig file.
 
 A later version of tophat might bring this back, however - there's a note to
 this effect in the tophat python code.
 
 So if you can run wiggles, you can make the coverage.wig file on your own.
 
 A student here at UNC Charlotte (Adam Baxter) made a few changes to the
 wiggles source code that would allow you to use it with samtools to make a
 coverage.wig file from the accepted_hits.bam file that TopHat creates.
 
 If you (or anyone else) would like a copy, please email Adam, who is cc'ed
 on this email.
 
 We would be happy to help add it to Galaxy if this would be of interest to
 you or other Galaxy users.
 
 If there is any way we can be of assistance, please let us know!
 
 Very best wishes,
 
 Ann Loraine
 
 
 On 2/21/11 3:39 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote:
 
 Hi:
 
 I am using tophat in Galaxy to analyze my paired-end RNA-seq data and found
 that after the tophat analysis we can no longer get a wig file from it,
 which we used to be able to do. Do you have any idea how we can still get
 the wig file after tophat analysis? Thanks a lot!
 
 Best
 
 
 --
 Ann Loraine
 Associate Professor
 Dept. of Bioinformatics and Genomics, UNCC
 North Carolina Research Campus
 600 Laureate Way
 Kannapolis, NC 28081
 704-250-5750
 www.transvar.org
 
 

Re: [galaxy-user] get wig file after tophat

2011-02-21 Thread David Matthews
Hi,

You can get an equivalent visualisation from the IGV viewer by the Broad
Institute: it's under IGV tools and generates a tdf file from bam or sam files.
This also gives a quick and easy way of looking at depth at any particular site
and is very accessible.

Cheers
David


On 21 Feb 2011, at 21:44, Jeremy Goecks wrote:

 Hi all,
 
 Ann is correct - Tophat does not produce .wig files when run anymore. 
 However, it's fairly easy to use Galaxy to make a wiggle-like coverage file 
 from a BAM file:
 
 (a) run the pileup tool on your BAM to create a pileup file;
 (b) cut columns 1 and 4 to get your coverage file.
 
 A final note: it's often difficult to visualize coverage files because 
 they're so large. You might be better off visualizing the BAM file and using 
 the coverage file for statistics.
 
 Best,
 J.
 
 Hello,
 
 I think I know the answer (sort of) to this question.
 
 This may be because newer versions of tophat stopped running the wiggles
 program, which is still part of the tophat distribution and is the program
 that makes the coverage.wig file.
 
 A later version of tophat might bring this back, however - there's a note to
 this effect in the tophat python code.
 
 So if you can run wiggles, you can make the coverage.wig file on your own.
 
 A student here at UNC Charlotte (Adam Baxter) made a few changes to the
 wiggles source code that would allow you to use it with samtools to make a
 coverage.wig file from the accepted_hits.bam file that TopHat creates.
 
 If you (or anyone else) would like a copy, please email Adam, who is cc'ed
 on this email.
 
 We would be happy to help add it to Galaxy if this would be of interest to
 you or other Galaxy users.
 
 If there is any way we can be of assistance, please let us know!
 
 Very best wishes,
 
 Ann Loraine
 
 
 On 2/21/11 3:39 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote:
 
 Hi:
 
 I am using tophat in Galaxy to analyze my paired-end RNA-seq data and found
 that after the tophat analysis we can no longer get a wig file from it, which
 we used to be able to do. Do you have any idea how we can still get the wig
 file after tophat analysis? Thanks a lot!
 
 Best
 
 
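Jeremy's two-step recipe (run pileup, then cut columns) can be sketched as a converter from pileup text to a variableStep wiggle file. This assumes the 6-column samtools pileup layout (chromosome, 1-based position, reference base, read depth, read bases, base qualities); other pileup variants put the depth in a different column, hence the depth_col parameter. The sketch keeps column 2, the position, alongside columns 1 and 4, because a wiggle file needs the position of each value. The track name is an arbitrary placeholder.

```python
def pileup_to_wig(pileup_lines, depth_col=4, track_name="coverage"):
    """Convert pileup rows to variableStep wiggle lines (1-based positions).

    depth_col is 1-based and defaults to 4, assuming the 6-column pileup layout.
    """
    out = [f'track type=wiggle_0 name="{track_name}"']
    current = None  # chromosome of the current variableStep section
    for line in pileup_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, depth = cols[0], cols[1], cols[depth_col - 1]
        if chrom != current:
            out.append(f"variableStep chrom={chrom}")  # start a new section
            current = chrom
        out.append(f"{pos}\t{depth}")
    return out
```

As Jeremy notes, base-by-base coverage files get large quickly, so this is more useful for statistics than for browsing.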


[galaxy-user] Stalled cuffdiff run

2011-01-21 Thread David Matthews
Hi Jeremy,

I have a stalled cufflinks run; it's been queued all day. Any idea why it's
stalled?

Cheers
David







___
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user