Re: [galaxy-user] Filter fastq by percentage of ambiguous (N) bases
Hello Anto,

There is no specific tool that I know of to do this based on read content, but you could use the very low quality score (2) assigned to ambiguous bases and the tool 'Filter by quality' to filter by percentage. Be aware that other bases may also have been assigned this low score, but these would very likely not be of practical use anyway. You could clip these ends first, then do the filter, discarding any reads that have very short usable sequence left. If the data is Illumina, this is likely a sign of a sequence that failed vendor quality checks, and these are no longer removed by default as of Casava 1.8+.

Creating a regular expression with the Select tool is another option, but this is probably more effort than it is worth to construct. But, your choice. A Google search will bring up syntax advice.

Ideally the first will do the job,

Jen
Galaxy team

On 7/29/13 3:17 AM, Anto Praveen Rajkumar Rajamani wrote:

Hello,

I would like to filter my fastq files (50 bp single end Illumina RNA seq reads) by a maximum threshold (10%) of ambiguous (N) bases. I can see that the CLIP tool removes all reads with one or more N bases. Is there a way to remove only the reads with five or more N bases using Galaxy?

Thank you.

Best wishes,
Anto

___
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using reply all in your mail client.
For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list: http://lists.bx.psu.edu/listinfo/galaxy-dev To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/ To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
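For readers who can run a script outside Galaxy, the filter Anto asks for can be sketched directly on read content. This is only a minimal sketch under stated assumptions, not a Galaxy tool: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the function names `read_fastq`, `n_fraction`, and `filter_by_n` are made up for illustration. It drops a read when its N fraction reaches the threshold, so with 50 bp reads and a threshold of 0.10, reads with five or more N bases are removed, as in the question.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, separator, quality). Hypothetical alias.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from an iterable of lines, assuming 4 lines per record."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header) from None
        yield header, seq, sep, qual


def n_fraction(seq: str) -> float:
    """Fraction of ambiguous (N) bases in a sequence; empty reads count as 1.0."""
    if not seq:
        return 1.0
    return seq.upper().count("N") / len(seq)


def filter_by_n(records: Iterable[Record], threshold: float = 0.10) -> Iterator[Record]:
    """Keep only records whose N fraction is strictly below the threshold."""
    for record in records:
        if n_fraction(record[1]) < threshold:
            yield record
```

To use it on a file, one would open the FASTQ, pass the file handle to `read_fastq`, wrap that in `filter_by_n`, and write the four fields of each kept record back out, one per line. If reads with exactly 10% N bases should be kept instead, the comparison would need to change from `<` to `<=`.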
Re: [galaxy-user] Filter Fastq
On Thu, May 9, 2013 at 12:27 PM, Casey,Richard richard.ca...@colostate.edu wrote:

Hi, We have two Filter FASTQ jobs running on the Galaxy public server. Both jobs have been running for more than four days. This seems like an excessive amount of runtime. Do Filter FASTQ jobs normally take this long to run?

Which FASTQ filtering tool exactly are you referring to? The one called "Filter FASTQ reads by quality score and length" in the left-hand column, tool ID fastq_filter? How big were the input files?

Peter
Re: [galaxy-user] Filter Fastq
Hi Richard,

The public Main Galaxy server has been very busy lately. To let you know, the size of a job does not determine when it will start. That depends only on when it was queued relative to other users' jobs and, in less common circumstances, on the specific type of job (one that requires a particular cluster node type, which was not the case for your job). How long a job runs (stays in the yellow running state) is related to the size of the inputs, the type of job, and the parameters.

I see that these have now failed with a cluster error. Please re-run the failed jobs one more time. If you continue to have problems, please submit one of the error datasets as a bug report, leaving all inputs/outputs undeleted. http://wiki.galaxyproject.org/Support#Reporting_tool_errors

Very sorry that this was causing confusion. And thanks Peter for the help!

Jen
Galaxy team

On 5/9/13 4:27 AM, Casey,Richard wrote:

Hi, We have two Filter FASTQ jobs running on the Galaxy public server. Both jobs have been running for more than four days. This seems like an excessive amount of runtime. Do Filter FASTQ jobs normally take this long to run?

Richard

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org