On Tue, Dec 21, 2021 at 01:55:15PM +0100, Loïc Meunier wrote:
> I'm currently using the text-formatted alignment produced with the tview
> command to discard reads that do not fully cover my region of interest.
>
> For this, I filter out the alignment lines which contain a space character,
> my assumption being that these delimitate reads. However, as I observe some
> oddities with this filtering method, could you give me more information
> about the meaning of the space characters in the tview command output?

The tview output is designed for human readability, as a pictorial-style representation of the sequence alignments. It's absolutely not the right format for parsing or filtering, and besides, the white space used may well differ between curses libraries and/or terminal types.

I'd suggest giving up on this avenue and looking at the pileup command (samtools mpileup) instead. Note that modern versions of this have a bunch of command line options for filtering out things that you may not be interested in, such as indels or read start/end markers. That can make parsing easier.

James

--
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA
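
As a rough illustration of the pileup route suggested above, here is a minimal Python sketch that keeps only the reads present at every position of a region. It is not taken from the original message. It assumes tab-separated pileup text in which read names appear as a comma-separated list in the seventh column, which is what I would expect from running mpileup with its --output-QNAME option and no other extra-column options; check your own output and adjust the hypothetical name_col parameter if the layout differs. The function name reads_covering_region is likewise made up for this example.

```python
from typing import Dict, Iterable, Optional, Set


def reads_covering_region(pileup_lines: Iterable[str], chrom: str,
                          start: int, end: int,
                          name_col: int = 6) -> Set[str]:
    """Return names of reads seen at every position of chrom:start-end.

    Coordinates are 1-based and inclusive, as in pileup output.
    name_col is the 0-based index of the read-name column (an assumption
    about the layout, see the note above the code).
    """
    names_at: Dict[int, Set[str]] = {}  # position -> read names seen there
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip other chromosomes and lines without a read-name column.
        if len(fields) <= name_col or fields[0] != chrom:
            continue
        pos = int(fields[1])
        if start <= pos <= end:
            names_at[pos] = set(fields[name_col].split(","))

    covering: Optional[Set[str]] = None
    for pos in range(start, end + 1):
        here = names_at.get(pos)
        if not here:
            # A position with no reads means nothing spans the region.
            return set()
        covering = set(here) if covering is None else covering & here
    return covering or set()


if __name__ == "__main__":
    # Tiny synthetic pileup: r1 spans 100-102, r2 only spans 100-101.
    demo = [
        "chr1\t100\tA\t2\t..\tII\tr1,r2",
        "chr1\t101\tC\t2\t..\tII\tr1,r2",
        "chr1\t102\tG\t1\t.\tI\tr1",
    ]
    print(sorted(reads_covering_region(demo, "chr1", 100, 102)))
```

In practice the input lines would come from something like `samtools mpileup -r chr1:100-102 --output-QNAME in.bam`, where the file name and region are placeholders. Two caveats: both mates of a pair share one read name, so a pair can look like a single spanning read unless the mates are separated first; and a read with a deletion or reference skip inside the region is still listed at those positions, so "covers" here means the alignment spans the position, not that a base is present.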