[galaxy-user] Unable to upload data via ftp site
Hi,

I am a registered user of Galaxy. I am trying to load my plant species genome (Cicer arietinum) via FTP using my registered account varsha.parde...@rmit.edu.au and password. However, the connection has failed several times. Kindly help me solve this problem.

I also request that you add the Cicer arietinum genome sequence to the Galaxy datasets. The sequence is available at http://www.ncbi.nlm.nih.gov/assembly/525138/

Thank you
Kind regards
Varsha

--
Dr. Varsha Pardeshi
Research Fellow
Health Innovations Research Institute
School of Applied Sciences
RMIT University
Building 223, Level 1
Plenty Road, Bundoora. Victoria. 3083. Australia.
Lab.: +61 3 99257140, Office: +61 3 99257113
Fax.: +61 3 9925 7110
Mobile: +61 3 0416183650
___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
[galaxy-user] bam to bigwig
Hello,

I am trying to convert a BAM file to Bigwig using the Convert Format option under Attributes (I click on the pencil next to the file name in my history). The conversion fails with the error message: 11L3_v3 is not found in chromosome sizes file. 11L3_v3 is a genomic sequence ID for the genome that the BAM file represents. The genome I need is not in the list of Database/build options in Galaxy.

How do I get my conversion to work? I have uploaded the fasta file for my genome into my history, but I do not see a way to point the conversion tool to that file. Am I on the right track?

Cheers from a Galaxy Newbie!
Susanne
[galaxy-user] Help
Hi all,

We are two new Galaxy users. We have developed 2 new tools and we would like to connect them in a new workflow. We are able to import both tools and to link them in a workflow, but we aren't able to pass the output of the first tool as the input of the second tool. The first tool calls a bash script that produces a simple string (this is the path of the file generated by the script).

This is the XML file of the first tool:

    <tool id="infnTools_ConcatenateArgumentsTool" name="Concatenate Arguments and generate file Tool">
      <description>Concatenate arguments strings and generate file</description>
      <command interpreter="bash">concatenateArgumentsAndPutFile.sh $inputArguments</command>
      <inputs>
        <param name="inputArguments" type="text" label="ARGUMENTS" optional="false"/>
      </inputs>
      <outputs>
        <data format="string" name="output" />
      </outputs>
    </tool>

This is the bash script of the first tool (concatenateArgumentsAndPutFile.sh):

    #!/bin/bash
    echo ARGUMENTS: $@
    export PathFile=/tmp/$RANDOM$RANDOM
    echo $@ > $PathFile
    echo PathFile: $PathFile

The XML of the second tool is the following:

    <tool id="infnTools_InsertBiomasTools" name="InsertJobs and check the status of Biomas">
      <description>InsertJobs Biomas Tool and check the status</description>
      <command interpreter="bash">insertAndCheckBiomasJobs.sh $input </command>
      <inputs>
        <param format="string" name="input" type="data" label="Insert path file"/>
      </inputs>
      <outputs>
        <data format="tabular" name="output"/>
      </outputs>
    </tool>

When we run the workflow, the output of the first tool isn't seen as the input of the second tool. In the Galaxy history we see this value for the input of the second tool: /home/pasquale/galaxy-dist/database/files/000/dataset_83.dat

This file is also empty. How can we resolve the problem?

Thanks and best regards
Pasquale Alfonso

Dott. Pasquale Notarangelo
INFN Istituto Nazionale di Fisica Nucleare - Sezione Bari
Via Orabona, 4 - 70126 Bari, Italy
Office tel.: +39 080-5443194
Office extension: 3194
Mail: pasquale.notarang...@ba.infn.it
Skype: pasquale.notarangelo_1985
Msn: pasqualenotarang...@hotmail.it
Gmail: notarangelo@gmail.com
Re: [galaxy-user] all FPKMs are 0 in the tmap files produced by cuffcompare
Hello,

It looks like the data is mapping as novel - not linked with the reference annotation. A few factors can cause this for part of a dataset (often desirable), but when it occurs for an entire dataset, there is often a data mismatch or parameter issue.

The first item I always check is that the reference genomes are a match between inputs. Do this by confirming that the identifiers in the reference GFF file are the same as those in the Tophat BAM output (convert to SAM, with headers, to see the chromosome names). For the GFF file, the tool Join, Subtract and Group - Group on the first column, chromosome name, with the action count distinct will isolate these. But the real problem could be in the parameters, see below:

On 1/11/14 10:43 PM, Yang Bi wrote:
> Dear all:
>
> I am new to Galaxy and I followed online tutorials/tips to analyze my RNA seq data for alternative splicing. I used tophat for illumina to align my sequencing data after QC/filtering. Other than setting min intron to 20, I used the default settings. Then I fed the accepted hits files to Cufflinks. I set Min isoform fraction to 0, used the annotation (tair10 gff3) as guide and chose yes for perform bias correction (locally cached tair10).

My guess is that this Cufflinks run had the same issue - have you checked it? The 'Min isoform fraction' set to 0 may be problematic (I have never run Cufflinks this way). It may seem that this is a setting that is permissive - to capture even very small expression levels - but it may have had the reverse effect of not assigning any reads. (The Tophat run with min intron at 20 is pretty low/sensitive - but with a smaller genome this probably will not cause memory issues with the mapping. Was this set based on the genome having transcripts with known, characterized introns this short? I didn't check, but you can in the reference GFF file.)

Maybe double check the above Cufflinks run, confirm the results were as expected, then try the default in Cufflinks (0.1) as a first pass test to see how that works out? If you want to make this more sensitive in a subsequent run, you could try 0.01 - although how significant those results are, given this genome and your specific input data, would need to be evaluated.

After that, if you are still having trouble, please feel free to share a history link and we can try to help (copy and email a share link from the public server, direct to me, to keep your data private). Here is how: https://wiki.galaxyproject.org/Support#Shared_and_Published_data

Hopefully the parameter change works, or a reference genome issue is found and corrected, but if not, I'll watch for your email,

Jen
Galaxy team

> I merged the assembled transcripts with cuffmerge and used cuffcompare to compare the resultant merged assembled transcripts to the reference annotation file tair10 gff3. I chose yes for use sequence data and locally cached tair10 as the reference list.
>
> I get this for the transcript accuracy analysis:
>
> # Cuffcompare v2.1.1 | Command line was:
> #cuffcompare -o cc_output -r /galaxy-repl/main/files/007/386/dataset_7386886.dat -s /galaxy/data/Arabidopsis_thaliana_TAIR10/sam_index/Arabidopsis_thaliana_TAIR10.fa ./input1
> #
> #= Summary for dataset: ./input1 :
> # Query mRNAs : 72778 in 51779 loci (57559 multi-exon transcripts)
> # (12679 multi-transcript loci, ~1.4 transcripts per locus)
> # Reference mRNAs : 42163 in 33350 loci (30127 multi-exon)
> # Corresponding super-loci: 33140
> #                    |   Sn  |   Sp  |  fSn  |  fSp
>         Base level:    100.0    62.7     -       -
>         Exon level:    104.6    59.5   100.0    60.5
>       Intron level:    100.0    55.5   100.0    56.5
> Intron chain level:     98.3    51.5   100.0    60.3
>   Transcript level:     98.7    57.2    94.8    54.9
>        Locus level:     99.4    64.0   100.0    64.1
>
> Matching intron chains: 29618
> Matching loci: 33147
>
> Missed exons: 1/169820 (0.0%)
> Novel exons: 128021/298149 (42.9%)
> Missed introns: 0/127896 (0.0%)
> Novel introns: 102614/230568 (44.5%)
> Missed loci: 1/33350 (0.0%)
> Novel loci: 2962/51779 (5.7%)
>
> Total union super-loci across all input datasets: 51779
>
> For the tmap file, all my FPKMs are 0:
>
> ref_gene_id  ref_id       class_code  cuff_gene_id  cuff_id     FMI  FPKM  FPKM_conf_lo  FPKM_conf_hi  cov   len   major_iso_id  ref_match_len
> AT1G01010    AT1G01010.1  =           AT1G01010     TCONS_0001  0    0.00  0.00          0.00          0.00  1688  TCONS_0001    1688
> AT1G01040    AT1G01040.1  =           AT1G01040     TCONS_0002  0    0.00  0.00          0.00          0.00  6251  TCONS_0002    6251
> AT1G01040    AT1G01040.2  =           AT1G01040     TCONS_0003  0    0.00
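The identifier check recommended in this reply (compare the sequence names in the reference GFF with those in the SAM header) can also be sketched outside Galaxy. The snippet below is only an illustration under stated assumptions: the function names are invented for this example and are not part of Galaxy, Tophat or Cufflinks, the annotation is assumed to be tab-delimited GFF/GTF, and the BAM is assumed to have already been converted to SAM with headers.

```python
def gff_seqids(gff_lines):
    """Collect the distinct sequence IDs found in column 1 of GFF/GTF lines."""
    ids = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        ids.add(line.split("\t", 1)[0])
    return ids


def sam_header_seqids(sam_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    ids = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ids.add(field[3:])
    return ids


def compare_seqids(gff_lines, sam_lines):
    """Report which sequence names are shared and which occur on one side only."""
    gff = gff_seqids(gff_lines)
    sam = sam_header_seqids(sam_lines)
    return {
        "only_in_gff": sorted(gff - sam),
        "only_in_sam": sorted(sam - gff),
        "shared": sorted(gff & sam),
    }
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example compare_seqids(open("annotation.gff3"), open("accepted_hits.sam")) with hypothetical file names. A name that shows up on one side only points to a reference mismatch between the inputs.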
Re: [galaxy-user] bam to bigwig
Hello Susanne,

First, add the genome to the list of Custom Builds for your account. The form to do this is under User - Custom Builds. The .fasta version of the genome is one entry option, so go ahead and use that. Pick a name and a unique key that will not conflict with other genomes already in Galaxy (a full list can be viewed by clicking on the link around the middle of this form, Show loaded, system-installed builds). Once the load is started, it will take some time to process - how long depends roughly on the size of the genome.

Once the build is added, you will be able to assign it to datasets just like any other system-installed build. Assign this to your dataset, then try the tool again.

I am also running a test as a double check that there are no problems with the method (I have not attempted this since our move to the new hardware a few months ago), but I do not anticipate problems. Should an issue occur, I will write you to follow up. Meanwhile, you should go ahead and proceed as well. Having your custom genome set up this way is useful for other reasons (visualization, general data tracking, etc.).

Best,
Jen
Galaxy team

On 1/13/14 6:56 AM, Susanne Warrenfeltz wrote:
> Hello, I am trying to convert a BAM file to Bigwig using the Convert Format option under Attributes [...]

--
Jennifer Hillman-Jackson
http://galaxyproject.org
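The error quoted in this thread ("11L3_v3 is not found in chromosome sizes file") suggests that the converter looks up each reference name from the BAM in a table of sequence names and lengths for the assigned build. How Galaxy derives that table for a custom build is not shown in this thread; the sketch below only illustrates what such a table contains, by computing name and length pairs from FASTA text. The function names are hypothetical, and the tab-separated name/length layout follows the common UCSC chrom.sizes convention.

```python
def fasta_lengths(lines):
    """Return [(sequence_id, length), ...] in file order for FASTA text lines."""
    sizes = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            # The ID is the first whitespace-delimited token after ">"
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def format_chrom_sizes(sizes):
    """Render the pairs as tab-separated 'name<TAB>length' lines."""
    return "".join("%s\t%d\n" % (name, length) for name, length in sizes)
```

Running fasta_lengths over the uploaded genome FASTA and checking that every reference name in the BAM header appears in the result is a quick way to confirm that the two files describe the same assembly before assigning the custom build.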
Re: [galaxy-user] Unable to upload data via ftp site
Hi Varsha,

With regard to the genome addition, I have added this to the genome request ticket in Trello to be reviewed. Currently, a large batch of genomes is undergoing a significant processing cycle for near-term release, and another is already scheduled to start at the beginning of Feb (most are not in this particular ticket, but will be announced when released). This means that the timeline is on the order of a few months for those marked as consider (e.g. review) in the ticket. https://trello.com/c/kzVklAIE

But the best news is that you do not have to wait for us. The genome can be used with nearly all tools as a custom genome right away. Instructions for prep and usage are included here in our wiki: https://wiki.galaxyproject.org/Support#Custom_reference_genome

Take care,
Jen
Galaxy team

On 1/12/14 4:41 PM, Varsha Pardeshi wrote:
> Hi I am a registered user of Galaxy. I am trying to load my plant species genome (Cicer arietinum) via FTP [...]

--
Jennifer Hillman-Jackson
http://galaxyproject.org
Re: [galaxy-user] Help
Hi Pasquale,

From a quick look (and I am not the tool-building expert of our team!), I suspect that the problem is with the format assigned to the output of the first tool and the input of the second tool. Specifically, format=string is problematic, unless you have also defined this in your local install. Even then, having it contain a path to a file deviates from the regular usage (if I have understood your snippet of code correctly).

Our wiki for tool configuration is located here: https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax The wiki has examples, but you can also look at tools in the source code or Tool Shed repos to see how format is used.

I don't want to send you away from this list, since I know that you already emailed earlier, but the galaxy-...@bx.psu.edu mailing list is where most tool development questions are discussed. That said, when troubleshooting, individual scripts are not often corrected by the community if the answer is already in the wiki, existing code base, or in a prior discussion. So, making use of these resources is the first place to start.

There is a search tool for development topics that can be of great use to locate the bits that can be helpful for you: http://galaxyproject.org/search/ Try a search with Admin Development - I found in the first few hits this link, which includes the tool config link above plus many other related resources listed at the bottom: https://wiki.galaxyproject.org/Admin/Tools/Add%20Tool%20Tutorial

Hopefully this helps a little, and others reading the post are welcome to add in more of course!

Jen
Galaxy team

On 1/13/14 7:31 AM, Pasquale Notarangelo wrote:
> Hi all, we are two new galaxy users. We have developed 2 new tools and we would connect them into a new workflow. [...]

--
Jennifer Hillman-Jackson
http://galaxyproject.org
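One likely reason the dataset in the history is empty: in a Galaxy tool config, the $output variable in the command is replaced with the path of the dataset file Galaxy has created for that output (such as the dataset_83.dat path quoted in the question), and the tool is expected to write its result there. The first tool's command never references $output, so nothing is written to that file. The sketch below shows a tool script that writes to a path it receives as its last argument. It is an assumption-laden illustration, not a tested fix for these tools: the script name concatenate_arguments.py, the calling convention and a command line such as "concatenate_arguments.py $inputArguments $output" are all hypothetical, and the output would still need a datatype the Galaxy install knows (for example txt) instead of string.

```python
import sys


def write_arguments(arguments, output_path):
    """Write the space-joined arguments to output_path and return the text."""
    text = " ".join(arguments)
    with open(output_path, "w") as handle:
        handle.write(text + "\n")
    return text


def main(argv):
    """Hypothetical convention: every argument but the last is data;
    the last one is the output dataset path supplied by Galaxy."""
    if len(argv) < 2:
        sys.stderr.write(
            "usage: concatenate_arguments.py ARG [ARG ...] OUTPUT_PATH\n"
        )
        return 1
    *arguments, output_path = argv
    write_arguments(arguments, output_path)
    return 0
```

A wrapper script would finish with sys.exit(main(sys.argv[1:])). With this arrangement the second tool receives the dataset itself through its own input parameter and reads the content from that path, so no separate /tmp path needs to be passed between the tools.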
[galaxy-user] add T. brucei 927 genome to Galaxy main?
It would be very nice to see the Trypanosoma brucei TREU927 genome added to Galaxy: http://tritrypdb.org/common/downloads/Current_Release/TbruceiTREU927/fasta/data/TriTrypDB-6.0_TbruceiTREU927_Genome.fasta

Thanks
Susanne
Re: [galaxy-user] bam to bigwig
Clicked submit. Thanks!

From: Jennifer Jackson [mailto:j...@bx.psu.edu]
Sent: Monday, January 13, 2014 2:39 PM
To: Susanne Warrenfeltz; galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] bam to bigwig

> Hello Susanne, First, add the genome to the list of Custom Builds for your account. [...]
Re: [galaxy-user] add T. brucei 927 genome to Galaxy main?
Hi Susanne,

I wasn't able to locate this exact assembly at NCBI - is it there? Is there a known reason why it isn't? We have a general guideline to include genomes that are published there, although there is some leeway for open source, finished (or at least not early stage draft) genomes from other sources. I'll need to review (credits, usage, status, etc.). Please have a look and write back with more details of what you know about this one (can be direct to me), so I don't miss anything.

Thanks!
Jen
Galaxy team

On 1/13/14 9:23 AM, Susanne Warrenfeltz wrote:
> It would be very nice to see the Trypanosoma brucei TREU927 genome added to Galaxy [...]

--
Jennifer Hillman-Jackson
http://galaxyproject.org
Re: [galaxy-user] all FPKMs are 0 in the tmap files produced by cuffcompare
Hi Jen:

Thank you for the prompt reply. The FPKMs produced by Cufflinks look normal (from an assembled transcripts file):

Seqname  Source     Feature     Start  End    Score  Strand  Frame  Attributes
chr1     Cufflinks  transcript  11960  13178  1000   .       .      gene_id CUFF.180; transcript_id CUFF.180.1; FPKM 6.5441928094; frac 1.00; conf_lo 3.594986; conf_hi 8.987465; cov 2.413218; full_read_support yes;
chr1     Cufflinks  exon        11960  13178  1000   .       .      gene_id CUFF.180; transcript_id CUFF.180.1; exon_number 1; FPKM 6.5441928094; frac 1.00; conf_lo 3.594986; conf_hi 8.987465; cov 2.413218;
chr1     Cufflinks  transcript  4536   5314   1000   +       .      gene_id CUFF.178; transcript_id CUFF.178.1; FPKM 11.0556332840; frac 1.00; conf_lo 3.645830; conf_hi 13.216134; cov 4.076844; full_read_support no;
chr1     Cufflinks  exon        4536   4605   1000   +       .      gene_id CUFF.178; transcript_id CUFF.178.1; exon_number 1; FPKM 11.0556332840; frac 1.00; conf_lo 3.645830; conf_hi 13.216134; cov 4.076844;
chr1     Cufflinks  exon        4706   5095   1000   +       .      gene_id CUFF.178; transcript_id CUFF.178.1; exon_number 2; FPKM 11.0556332840; frac 1.00; conf_lo 3.645830; conf_hi 13.216134; cov 4.076844;
chr1     Cufflinks  exon        5174   5314   1000   +       .      gene_id CUFF.178; transcript_id CUFF.178.1; exon_number 3; FPKM 11.0556332840; frac 1.00; conf_lo 3.645830; conf_hi 13.216134; cov 4.076844;

I checked the chromosome names and realized that the BAM outputs use lower case for RNAME, e.g. chr1, while my gff3 file uses an initial capital letter for seqId, e.g. Chr1. Could this be the problem? What is the fastest way to convert the capital C in my gff3 file to lower case?

Thank you very much
Yang

- Original Message -
From: Jennifer Jackson j...@bx.psu.edu
To: Yang Bi bey...@stanford.edu, galaxy-user@lists.bx.psu.edu
Sent: Monday, January 13, 2014, 10:56:39 AM
Subject: Re: [galaxy-user] all FPKMs are 0 in the tmap files produced by cuffcompare

> Hello, It looks like the data is mapping as novel - not linked with the reference annotation. [...]
Re: [galaxy-user] all FPKMs are 0 in the tmap files produced by cuffcompare
Hello Yang,

Glad the problem was isolated - the mismatched chromosome names are definitely something to be fixed. The tools in 'Text Manipulation' can help. The tool Change Case of selected columns can change the case for you. Click on the pencil icon after running the tool to reassign the datatype correctly as needed.

Take care,
Jen
Galaxy team

On 1/13/14 6:31 PM, Yang Bi wrote:
> Hi Jen: Thank you for the prompt reply. RPKMs produced by cufflink look normal (from an assembled transcript file) [...]

--
Jennifer Hillman-Jackson
http://galaxyproject.org
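For anyone who prefers to fix the names outside Galaxy, the renaming asked about in this thread (Chr1 in the gff3 file versus chr1 in the BAM) can be sketched as below. This is a minimal illustration with invented function names, and it assumes a tab-delimited GFF3 file in which only the leading "Chr" of the sequence ID differs. Unlike a whole-column case change, it leaves the rest of the ID untouched; names such as ChrC or ChrM become chrC and chrM, which may still need an explicit mapping if the BAM uses different names for those sequences.

```python
def rename_seqid(seqid):
    """Lower-case a leading 'Chr' prefix, e.g. 'Chr1' -> 'chr1'."""
    if seqid.startswith("Chr"):
        return "chr" + seqid[3:]
    return seqid


def fix_gff3_line(line):
    """Return the line with its sequence ID renamed; other content is unchanged."""
    body = line.rstrip("\n")
    ending = line[len(body):]
    if body.startswith("##sequence-region"):
        # Directive layout: ##sequence-region seqid start end
        parts = body.split()
        if len(parts) > 1:
            parts[1] = rename_seqid(parts[1])
        return " ".join(parts) + ending
    if not body or body.startswith("#"):
        return line  # blank lines, comments and other directives pass through
    columns = body.split("\t")
    columns[0] = rename_seqid(columns[0])
    return "\t".join(columns) + ending


def convert_gff3(in_path, out_path):
    """Copy a GFF3 file to out_path, renaming sequence IDs line by line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_gff3_line(line))
```

A call such as convert_gff3("TAIR10.gff3", "TAIR10.chr_lower.gff3") (file names hypothetical) writes a renamed copy that can then be uploaded to Galaxy in place of the original annotation.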