On Tue, 2 Aug 2022, Thomas Juettemann wrote:
> I came across a "transcript-based" VCF file, meaning a variant can be
> present multiple times, each time belonging to a different transcript.
> See "File 1" below as an example. I am finding myself in the unfortunate
> situation of having to intersect it with another file ("File 2") and
> retain all records with the same position and REF/ALT ("Desired output").
> Long shot: Is that possible?

Does "bcftools isec" (https://www.htslib.org/doc/bcftools.html#isec) do
what you want? The "Extract and write records from A shared by both A and
B using exact allele match" example in the manual page sounds like it
might:

    bcftools isec -p dir -n=2 -w1 A.vcf.gz B.vcf.gz

If that doesn't do it and you can't find anything else, then for just a
few files it might be possible to break out pysam and write something. If
you want to do lots, a C program would probably be the way forward; it
doesn't look like it would be too difficult.
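
To give an idea of the "write something" route, here is a rough sketch. It
uses plain text parsing from the Python standard library rather than pysam,
purely so that it runs without extra dependencies; pysam's VariantFile would
be the more robust choice for real data. The function names (record_key,
variant_keys, intersect) are made up for illustration, and the sketch assumes
that ALT can be compared as the raw column string, so multi-allelic records
only match when the ALT column is identical in both files.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def record_key(line):
    """Return (CHROM, POS, REF, ALT) for one VCF data line (hypothetical helper)."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    # Assumption: ALT is compared as the raw column text, not allele by allele.
    return chrom, int(pos), ref, alt


def variant_keys(path):
    """Collect the keys of every record in a VCF file into a set."""
    keys = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            keys.add(record_key(line))
    return keys


def intersect(transcript_vcf, other_vcf, out_path):
    """Copy the header and every record of transcript_vcf whose
    CHROM/POS/REF/ALT also occurs in other_vcf. Returns the record count."""
    wanted = variant_keys(other_vcf)
    kept = 0
    with open_vcf(transcript_vcf) as src, open(out_path, "w") as out:
        for line in src:
            if line.startswith("#"):
                out.write(line)  # keep the header unchanged
            elif line.strip() and record_key(line) in wanted:
                out.write(line)  # duplicates per transcript are all retained
                kept += 1
    return kept
```

Calling intersect("File1.vcf", "File2.vcf", "out.vcf") (file names are
placeholders) would keep every per-transcript copy of a variant from the
first file as long as the same position and REF/ALT appear in the second.
Holding the second file's keys in memory is fine for a few modest files; for
lots of large ones the C route is likely the better fit.
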
Rob Davies r...@sanger.ac.uk
The Sanger Institute http://www.sanger.ac.uk/
Hinxton, Cambs., Tel. +44 (1223) 834244
CB10 1SA, U.K. Fax. +44 (1223) 494919
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help