On Tue, 2 Aug 2022, Thomas Juettemann wrote:

I came across a "transcript-based" VCF file, meaning a variant can be
present multiple times, each time belonging to a different transcript. See
"File 1" below as an example. I am finding myself in the unfortunate
situation of having to intersect it with another file ("File 2") and retain all records
with the same position and REF/ALT ("Desired output").
Long shot: is that possible?

Does "bcftools isec" (https://www.htslib.org/doc/bcftools.html#isec) do what you want? The "Extract and write records from A shared by both A and B using exact allele match" example in the manual page sounds like it might:

   bcftools isec -p dir -n=2 -w1 A.vcf.gz B.vcf.gz

If it doesn't, you can't find anything else that does, and you only need to do a few of these, you could break out pysam and write something. If you need to do lots, a C program would probably be the way forward; it doesn't look like it would be too difficult.
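As a rough illustration of the "write something" route, here is a minimal sketch. It assumes that two records match when CHROM, POS, REF and ALT are identical strings. It keeps every record from A whose site and alleles also appear in B, so all of the duplicated per-transcript lines at a shared site survive. To stay runnable with the standard library alone, it parses the VCF text directly instead of using pysam (with pysam you would read records through `pysam.VariantFile` instead). No normalisation is done, so a multi-allelic ALT such as `G,T` only matches an identical ALT column. Running `bcftools norm` on both files first may be needed. The function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Set, TextIO, Tuple

# Hypothetical key for a variant, ignoring transcript annotation:
# (CHROM, POS, REF, ALT), compared as exact strings (no normalisation).
VariantKey = Tuple[str, str, str, str]


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # BGZF is valid multi-member gzip
    return open(path, "r")


def variant_key(line: str) -> VariantKey:
    """Extract (CHROM, POS, REF, ALT) from a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], fields[3], fields[4]


def collect_keys(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect the variant keys present in file B, skipping header lines."""
    return {variant_key(l) for l in lines if l.strip() and not l.startswith("#")}


def intersect_records(a_lines: Iterable[str], b_keys: Set[VariantKey]) -> Iterator[str]:
    """Yield A's header lines, then every A record whose key is also in B.

    Each A record is tested on its own, so several records at the same
    site (e.g. one per transcript) are all kept, in A's original order.
    """
    for line in a_lines:
        if line.startswith("#"):
            yield line
        elif line.strip() and variant_key(line) in b_keys:
            yield line


def run(a_path: str, b_path: str, out: TextIO) -> None:
    """Write the records of A that share CHROM/POS/REF/ALT with B to `out`."""
    with open_vcf(b_path) as b:
        keys = collect_keys(b)
    with open_vcf(a_path) as a:
        out.writelines(intersect_records(a, keys))
```

For example, `run("A.vcf.gz", "B.vcf.gz", sys.stdout)` prints A's header followed by the matching records. Holding only B's keys in a set keeps memory proportional to B while A is streamed, which suits a transcript-expanded A being the larger file.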

Rob Davies              r...@sanger.ac.uk
The Sanger Institute    http://www.sanger.ac.uk/
Hinxton, Cambs.,        Tel. +44 (1223) 834244
CB10 1SA, U.K.          Fax. +44 (1223) 494919



_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
