[galaxy-user] Mapping to only 3 genes / targeted resequencing / SOLiD4 / short reads
Hi!

The situation is as follows: I have 10 barcoded samples. Each sample consists of a mix of the sequences of 3 independent genes (2 alleles each). I would like to map the SOLiD4 reads only to the sequences of those 3 genes, patient by patient.

First, the 10 barcoded samples have to be separated from each other. Then the short reads have to be mapped to the sequences of the 3 genes, which are available in FASTA format (one sequence per file) or multi-FASTA format (all sequences in one file).

Is this possible using the available Galaxy tools? If so, how?

Thank you in advance.
Jose

___ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
[galaxy-user] CpG Masking changing alignment length?
Hi All,

I have a general question about CpG masking. I have a .maf file, when I use the maf to fasta tool, it gives me an alignment of 2,735,329 bp. But if I CpG mask the .maf file (restricted definition) then I use the maf to fasta tool, I get a very different alignment length, 5,572,544 bp. It would be great to know what is the cause of these differences.

THANKS!
Mike
Re: [galaxy-user] CpG Masking changing alignment length?
Hi,

Sorry, a correction (I had the lengths reversed):

Original .maf, maf to fasta: length is 5572544
Original .maf, CpG mask, maf to fasta: length is 2735329

Thanks,
Mike

On Thu, Mar 10, 2011 at 8:53 AM, Michael E. Steiper michaelstei...@gmail.com wrote:

Hi All,

I have a general question about CpG masking. I have a .maf file, when I use the maf to fasta tool, it gives me an alignment of 2,735,329 bp. But if I CpG mask the .maf file (restricted definition) then I use the maf to fasta tool, I get a very different alignment length, 5,572,544 bp. It would be great to know what is the cause of these differences.

THANKS!
Mike
Re: [galaxy-user] Fwd: Question about FTP
Hi Noushin,

For unix ftp, use the same information that you use when logging into your Galaxy account on main. Perhaps the problem is the password? It should be the same as the one you use for your account on Galaxy main, too. Using the same user/pass is how we put the files into your Get Data - Upload area.

For example:

unix% ftp main.g2.bx.psu.edu
(bunch of ftp output)
Name (main.g2.bx.psu.edu:local_you): you@your_email.edu
331 Password required for you@your_email.edu
Password:
230 User you@your_email.edu logged in
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>

Now use put or mput, or whatever other ftp commands you want to use, to upload your files.

Hopefully this helps, but please let us know if you need more help,

Jen
Galaxy team

On 3/7/11 8:00 AM, Cathy Riemer wrote:

- Forwarded message from Noushin Ghaffari noushin.ghaff...@gmail.com -

Date: Mon, 7 Mar 2011 09:09:13 -0600
From: Noushin Ghaffari noushin.ghaff...@gmail.com
To: dbad...@bio.cse.psu.edu
Subject: Question about FTP

Dear Galaxy team,

Firstly, thank you very much for the great tools. I am working on a large dataset and need to upload it to Galaxy via FTP. I used ftp://main.g2.bx.psu.edu but I cannot log in. I tried my email, my public name on Galaxy and simply my name, but none of them worked. Can you please tell me how I can log in to upload my file? Here is my information:

email: noushin.ghaff...@gmail.com
public name: me-on-galaxy

I appreciate your help and time in advance.

Noushin

- End forwarded message -
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org
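The login steps in the reply above can also be scripted. The following is only a sketch using Python's standard `ftplib`: the host name is the one given in the thread, while the function names `upload_file` and `upload_to_galaxy` and their parameters are made up for illustration. It assumes the server accepts the Galaxy account email and password exactly as described in the reply.

```python
from ftplib import FTP

# Host as given in the thread; whether it still accepts connections is not assumed here.
GALAXY_FTP_HOST = "main.g2.bx.psu.edu"


def upload_file(ftp, local_path, remote_name):
    """Upload one local file in binary mode over an already logged-in FTP connection."""
    with open(local_path, "rb") as handle:
        ftp.storbinary("STOR " + remote_name, handle)


def upload_to_galaxy(email, password, local_path, remote_name):
    """Log in with the Galaxy account email and password, then upload a single file.

    Hypothetical helper: the credentials are the same ones used on the Galaxy web site.
    """
    with FTP(GALAXY_FTP_HOST) as ftp:
        ftp.login(user=email, passwd=password)
        upload_file(ftp, local_path, remote_name)
```

A call such as `upload_to_galaxy("you@your_email.edu", "your_password", "reads.fastq", "reads.fastq")` would mirror the interactive session, with `storbinary` playing the role of `put`.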
[galaxy-user] Export data to filesystem
Hello,

I was wondering if there was a way to export a dataset to the file system? Basically I think it would be advantageous if someone could copy a dataset to an export folder; they could then FTP this data away or work with it locally.

Thanks for the help,
James
Re: [galaxy-user] downstream analysis of cuffdiff out put
Jagat,

Please send queries such as these to the galaxy-user mailing list (cc'd); there are many users on the list who can contribute to this discussion, and there are many additional users that will benefit from this discussion.

> I was wondering if you can point me to a documentation or URL to guide how to perform the downstream analysis once we have cuffdiff output.

In general, I agree that tools are needed to further process cufflinks/compare/diff outputs, but I'm not aware of any that are publicly available. Let's open this issue up for discussion and see if we can reach a consensus about which tools might be useful. Everyone, please feel free to contribute ideas/tools; note that the Galaxy Tool Shed is a nice place for sharing tools you've built for Galaxy: http://community.g2.bx.psu.edu/

> Just like any mRNA-seq experiment, to achieve the following objectives:
> 1. Reconstruct all transcripts of a particular gene and the corresponding significantly expressed transcripts as called by cuffdiff.
> 2. What are the different isoforms
> 3. Location of splicing
> From the various output files, which unique ID can be matched from one file, say Cuffdiff.expr (transcript/isoform/splicing), to another file - the transcript.gtf corresponding to each sample, or the combined GTF file?

I've got a script that does this for the cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple of weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes.

Best,
J.
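The idea of attaching FPKM values to transcripts as GTF attributes can be sketched as follows. This is not the script mentioned in the reply; it is a rough illustration under several assumptions: the expression file is tab-delimited with a header row, the column names `test_id`, `value_1` and `value_2` are passed in as defaults and may not match a given cuffdiff version, and the attribute names `FPKM_1`, `FPKM_2` are invented here.

```python
import csv
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def load_fpkm(expr_handle, id_column="test_id", value_columns=("value_1", "value_2")):
    """Map each transcript ID to its FPKM values from a tab-delimited table with a header.

    The column names are assumptions; adjust them to the header of the actual file.
    """
    reader = csv.DictReader(expr_handle, delimiter="\t")
    return {row[id_column]: [row[col] for col in value_columns] for row in reader}


def annotate_gtf(gtf_lines, fpkm_by_id):
    """Yield GTF lines, appending FPKM_1, FPKM_2, ... where the transcript_id is known."""
    for line in gtf_lines:
        line = line.rstrip("\n")
        match = TRANSCRIPT_ID.search(line)
        if line.startswith("#") or match is None or match.group(1) not in fpkm_by_id:
            # Comment lines and transcripts without expression data pass through unchanged.
            yield line
            continue
        values = fpkm_by_id[match.group(1)]
        extra = " ".join('FPKM_%d "%s";' % (i, v) for i, v in enumerate(values, start=1))
        yield line.rstrip() + " " + extra
```

Matching on `transcript_id` only works if the expression file and the GTF use the same identifiers, which is the question raised in the thread.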
[galaxy-user] Question regarding quality filtering of 454 amplicons
Hi,

I have a question for you guys regarding quality filtering. I have a data set of double MID tagged 454 amplicons, from which I wish to select high quality sequences above Q20. The 454 quality filtering system seems to work differently from that given for Illumina sequencing, i.e. 454 filtering takes high quality segments, while Illumina (FASTQ) filtering can select high quality full reads based on certain parameters.

OK, so I know that the total length of my amplicon, including primers and barcodes, is around 260 bp. If I then set the 454 quality filtering tool to extract contiguous high quality sequence of 260, it gives me back around 45% of my raw data as hitting this criterion, i.e. all 260 bp are above Q20. I don't necessarily need this high stringency, as most bases may not be informative.

But if I convert my 454 data to FASTQ format and then run the Illumina filtering system, which also allows me to set the number of bases allowed to deviate from the Q20 criterion, I get back over 90% of my data (allowing 10 bp to deviate from Q20). I then need to go ahead and convert back to 454 format.

Can you tell me if this is OK? Will I lose or confuse information somewhere along these conversions? It seems that if I do this, my barcodes are removed, as amplicons do not sort properly when I parse them through my barcode filtering program.

Does anyone know of a program to filter 454 data based on average sequence quality score which doesn't involve Linux and the Roche off-instrument program? (I have no experience in Linux!)

Thanks!

-- Jack Lighten, Ph.D. Candidate, Bentzen Lab, Room 6078, Department of Biology, Dalhousie University, Halifax, NS, B3H 4J1 Canada
Office: (902) 494-1398
Email: jackie.ligh...@dal.ca
Profile: www.marinebiodiversity.ca/CHONe/Members/lightenj/profile/bio
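The two filtering criteria discussed in the question (a limited number of bases below Q20, or an average score) can be expressed directly on 454-style numeric quality lines, without converting to FASTQ and back. This is only a sketch: the function names are invented, and it assumes each read's scores are available as one whitespace-separated line of integers, as in a .qual file.

```python
def parse_qual_line(line):
    """Turn a whitespace-separated line of integer quality scores into a list of ints."""
    return [int(token) for token in line.split()]


def passes_min_quality(scores, threshold=20, max_below=10):
    """Return True if at most max_below bases have a score under threshold."""
    return sum(1 for q in scores if q < threshold) <= max_below


def passes_mean_quality(scores, threshold=20):
    """Return True if the read is non-empty and its average score is at least threshold."""
    return bool(scores) and sum(scores) / len(scores) >= threshold
```

Because the read sequence itself is never touched, barcodes and primers stay in place; reads are only kept or dropped as a whole.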
Re: [galaxy-user] downstream analysis of cuffdiff out put
Hi All,

I agree with this problem and solution. I have a lot of cufflinks, cuffcompare and cuffdiff output but I am struggling to relate what this means in terms of the real world! I have seen Partek software attempt to visualise some of the data it generates, which appears to be using the FMI data in the cufflinks suite, but beyond that I struggle. I did have an email conversation with Cole Trapnell which eventually centred on the idea that you just have to trust the analysis and then go away and do the RT-PCR to check it all out!

So for tools I think:

1. A tool that shows you the layout of known isoforms for a gene and the FMI data for each isoform.

Er, that's it for now from me! But I also struggle to understand what all the other outputs really mean. What does the CDS.diff output tell us? What does the promoters.diff output tell us? I know what the cufflinks manual says, but I struggle to convert this in my head to what is happening to an actual gene. So if anyone has a PowerPoint example on a specific gene of what the data is saying, in terms of how this relates to changes in protein production, that would be great! I'm hoping someone out there has had to lecture on this to students, has prepared a PowerPoint presentation, and is willing to show it to the Galaxy community.

Another point about the analysis of cufflinks data is the subject of the Pseudo Autosomal Regions in X and Y. This will make a mess of gene expression analysis in some cases, especially because tophat will assign a read to both sites and make it a multihit read (which you might then filter out), or it may double the true levels of reported expression. Has anyone had thoughts on this?

Best Wishes,
David.

__
Dr David A. Matthews
Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine, School of Medical Sciences
University Walk, University of Bristol
Bristol. BS8 1TD U.K.
Tel. +44 117 3312058
Fax. +44 117 3312091
d.a.matth...@bristol.ac.uk

On 10 Mar 2011, at 15:55, Jeremy Goecks wrote:

Jagat,

Please send queries such as these to the galaxy-user mailing list (cc'd); there are many users on the list who can contribute to this discussion, and there are many additional users that will benefit from this discussion.

> I was wondering if you can point me to a documentation or URL to guide how to perform the downstream analysis once we have cuffdiff output.

In general, I agree that tools are needed to further process cufflinks/compare/diff outputs, but I'm not aware of any that are publicly available. Let's open this issue up for discussion and see if we can reach a consensus about which tools might be useful. Everyone, please feel free to contribute ideas/tools; note that the Galaxy Tool Shed is a nice place for sharing tools you've built for Galaxy: http://community.g2.bx.psu.edu/

> Just like any mRNA-seq experiment, to achieve the following objectives:
> 1. Reconstruct all transcripts of a particular gene and the corresponding significantly expressed transcripts as called by cuffdiff.
> 2. What are the different isoforms
> 3. Location of splicing
> From the various output files, which unique ID can be matched from one file, say Cuffdiff.expr (transcript/isoform/splicing), to another file - the transcript.gtf corresponding to each sample, or the combined GTF file?

I've got a script that does this for the cuffdiff isoform expression testing file and a GTF file; I'll wrap it up and add it to Galaxy in the next couple of weeks. It would probably be useful to have similar scripts for the other expression testing files as well. Also, it would be nice to be able to take the FPKM values generated by Cuffdiff and attach them to their respective transcripts as attributes.

Best,
J.
[galaxy-user] Pseudo Autosomal regions in Chrs X and Y
Hi All again,

A separate point about the analysis of cufflinks data is the subject of the Pseudo Autosomal Regions in X and Y. This will make a mess of gene expression analysis in some cases, especially because tophat will assign a read to both places, which therefore makes it a multihit read (which you might then filter out), or it may double the true levels of reported expression. Has anyone had experience/thoughts on this?

Best Wishes,
David.

__
Dr David A. Matthews
Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine, School of Medical Sciences
University Walk, University of Bristol
Bristol. BS8 1TD U.K.
Tel. +44 117 3312058
Fax. +44 117 3312091
d.a.matth...@bristol.ac.uk
Re: [galaxy-user] Import data in RGenetics
Hi Sylvain,

This issue has been fixed in changeset 5207:72d560d3e7fd and will be available the next time that the main server is updated, which should be within the next few weeks. Thanks for reporting this error and please let us know if we can provide additional assistance.

Thanks for using Galaxy,
Dan

On Feb 3, 2011, at 6:05 PM, Sylvain Baulande wrote:

Dear Ross,

Thank you for your prompt answer. Unfortunately I still get an error message, which is:

An error occurred running this job: A required composite data file was not provided (RgeneticsData.ped)

I did exactly what you mentioned, except that my ped and map files were uploaded using the FTP procedure. Do you have any clues?

Thank you so much for your help,
Sylvain

2011/2/3 Ross ross.laza...@gmail.com

Hi Sylvain,

The plink/rgenetics lped and pbed (compressed) formats are special 'composite' Galaxy datatypes because the map and pedigree/genotype files need to be kept together correctly inside Galaxy. As a result, the upload tool requires that the file type be specified so all of the components can be properly uploaded and stored together.

For example, to upload pbed data from your local desktop, choose 'Upload file' from the Get Data tools. When the upload form appears, the trick is that, as the very first step, you *must* change the default 'Autodetect' in the first (filetype) select box to the specific rgenetics datatype: either 'pbed' for compressed plink data or 'lped' for uncompressed plink genotype data. Type the first few letters into the first box, and select the right one from the list that appears.

Once this is done, you will see that the upload tool form changes to show three separate file upload inputs - one each for the plink xxx.bim, xxx.bed and xxx.fam, where xxx is the name you set when you ran plink to create the files - or, for uncompressed linkage format, two separate file upload inputs - the plink .ped and .map files. Now you can browse for the corresponding file for each input box from your local machine. Be careful not to mix them up, as the upload tool is unfortunately unable to tell.

At the bottom of the form, I suggest you then change the genome build to the appropriate one (e.g. hg18 or hg19). Finally, I'd recommend that you change the 'metadata value for basename' (which will be the new dataset name) to something that will remind you what the data are - something more meaningful than the default 'rgenetics'. Click 'execute' to upload the data and create the new dataset in your history.

Compressed (pbed) format is preferred so the upload is quicker. Note that some tools will autoconvert between lped and pbed, so there is a delay the first time some tools are run on a new dataset. There are also built-in converters (use the pencil icon) if you need them.

I hope this helps - thanks for using Galaxy and Rgenetics - please let us know how you go and feel free to contact me if you have other questions.

On Fri, Feb 4, 2011 at 6:20 AM, BAULANDE Sylvain 211527 Partnerchip sylvain.baula...@cea.fr wrote:

Dear Galaxy users,

I would like to import genotyping data into Rgenetics and I can't succeed. I have a ped file and a map file. I tried to import them in lped format but it didn't work. Can anybody with experience help me to solve this issue?

Many thanks in advance,
Best regards,
Sylvain

___ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user

-- Ross Lazarus MBBS MPH
Associate Professor, HMS; Director of Bioinformatics, Channing Laboratory; 181 Longwood Ave., Boston MA 02115, USA. Tel: +1 617 505 4850
Head, Medical Bioinformatics, BakerIDI; PO Box 6492, St Kilda Rd Central; Melbourne, VIC 8008, Australia; Tel: +61 385321444
Re: [galaxy-user] Problems with the Groomer
Hi Felix,

Is this file still an issue? Or have you identified the sequences with a mismatch between the sequence and qual score length? We can always take a look, too. Just share a history link and note which dataset is giving the error (Options - Share or Publish). You can email just to me to keep your data private and I can share with the developers here if needed.

Thanks!

Jen
Galaxy team

On 2/16/11 6:21 AM, Felix Hammer wrote:

Hi,

I'm experiencing some strange problems with the fastq groomer. Trying to groom my files I get the following error:

Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/fastq/fastq_groomer.py", line 37, in <module>
    if __name__ == "__main__": main()
  File "/galaxy/home/g2main/galaxy_main/tools/fastq/fastq_groomer.py", line 18, in main
    for read_count, fastq_read in enumerate( fastqReader( open( input_filename ), format = input_type ) ):
  File "/galaxy/home/g2main/galaxy_main/lib/galaxy_utils/sequence/fastq.py", line 452, in __iter__
    yield self.next()
  File "/galaxy/home/g2main/galaxy_main/lib/galaxy_utils/sequence/fastq.py", line 448, in next
    rval.assert_sequence_quality_lengths()
  File "/galaxy/home/g2main/galaxy_main/lib/galaxy_utils/sequence/fastq.py", line 142, in assert_sequence_quality_lengths
    assert qual_len == seq_len, "Invalid FASTQ file: quality score length (%i) does not match sequence length (%i)" % ( qual_len, seq_len )
AssertionError: Invalid FASTQ file: quality score length (63) does not match sequence length (36)

I've double checked the file and it should be ok. Any ideas?

thx,
Felix
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org
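One way to identify the sequences with a mismatch between sequence and quality length, as asked in the reply above, is a small check run outside Galaxy. This is a sketch only and is not part of the Groomer: the function name is invented, and it assumes strict four-line FASTQ records (header, sequence, plus line, quality) with no line wrapping.

```python
def find_length_mismatches(lines):
    """Yield (record_number, header, seq_len, qual_len) for records whose lengths differ.

    Assumes each record occupies exactly four lines; wrapped FASTQ files would
    need a real parser instead.
    """
    record = []
    count = 0
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            count += 1
            header, seq, _plus, qual = record
            if len(seq) != len(qual):
                yield count, header, len(seq), len(qual)
            record = []
```

With a file on disk, something like `for hit in find_length_mismatches(open("reads.fastq")): print(hit)` would list the offending records (the file name is a placeholder).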
Re: [galaxy-user] Cleaning of fastq files
Felix,

Great that you solved the issue, we appreciate your letting us know! Would you like to open a request at bitbucket for adding in the tool?

https://bitbucket.org/galaxy/galaxy-central/issues?status=new&status=open

Or, I can open a ticket if you would like, just let me know. (Apologies if you already opened one; I searched and didn't find a ticket for this.)

Thanks!

Jen

On 3/9/11 4:32 AM, Felix Hammer wrote:

Hey Jen,

I've already solved the cleaning problem using Seqclean. Seqclean only takes fasta as input. So if you are dealing with fastq files, you have to split them into quality and fasta, clean the fasta file, trim the quality strings yourself and put everything back together ... (If someone knows a better solution, please tell me!) It would be really cool if there was a Galaxy tool that just takes fastq and cleans it.

Thx,
Felix

Hello Felix,

The tools under NGS: QC and manipulation - Generic FASTQ manipulation should be able to help. In particular, Manipulate FASTQ reads on various attributes will allow you to enter a regular expression that could trim poly-A tails (the same way a perl script could, for example). The tool has a link to more documentation about how to construct expressions. Or, if you know the length of the insert sequence you want to retain, Filter FASTA would be a good choice. Please give these a try and let us know if we can help more.

Best!

Jen
Galaxy team

On 2/23/11 4:34 AM, Felix Hammer wrote:

Hi,

Is there a way to clean fastq files (filter poly-A etc.) with Galaxy? I haven't found anything so far. Also, if you generally know good tools, please answer. I have seen lots of stuff for fasta and qual files but not for fastq.

Thx,
Felix
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org
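The regular-expression approach to poly-A trimming mentioned in the thread can be sketched in a few lines. This is not the Galaxy tool's implementation: the function name and the `min_run` default are assumptions, only uppercase trailing A runs are handled, and the quality string is cut at the same position so the two stay the same length.

```python
import re


def trim_poly_a(seq, qual, min_run=10):
    """Remove a trailing run of at least min_run 'A' bases from a read.

    The quality string is trimmed at the same position so that sequence and
    quality lengths still match afterwards.
    """
    match = re.search(r"A{%d,}$" % min_run, seq)
    if match is None:
        return seq, qual
    return seq[:match.start()], qual[:match.start()]
```

Trimming sequence and quality together avoids the split, clean and rejoin steps described for Seqclean.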