Hi Rob,
Thanks for looking into it. Unfortunately, isec keeps only the first record.
Digging through GitHub, it seems this is a known limitation:
https://github.com/samtools/bcftools/issues/665#issuecomment-323372893
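
In case it helps anyone who finds this thread later, here is a minimal sketch of the "write something" route suggested in the quoted message below. It is not pysam-based: to keep it dependency-free it reads the VCF as plain text with the Python standard library. It assumes that "same record" means an exact string match on CHROM, POS, REF and ALT (so multi-allelic ALT lists must be in the same order), that File 2 is small enough for its keys to fit in memory, and that only the header of File 1 is needed in the output. The function names are made up for illustration.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def site_key(line):
    """Return (CHROM, POS, REF, ALT) from a tab-separated VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, pos, ref, alt


def intersect_keep_all(path_a, path_b, out_path):
    """Write every record of A whose site and alleles also occur in B.

    Unlike an intersection that keeps one record per site, duplicated
    records in A (for example one per transcript) are all retained.
    Returns the number of data records written.
    """
    # Collect the set of sites present in B (assumption: fits in memory).
    with open_vcf(path_b) as handle_b:
        keys_b = {
            site_key(line)
            for line in handle_b
            if line.strip() and not line.startswith("#")
        }

    kept = 0
    with open_vcf(path_a) as handle_a, open(out_path, "w") as out:
        for line in handle_a:
            if line.startswith("#"):
                out.write(line)  # copy A's header unchanged
            elif line.strip() and site_key(line) in keys_b:
                out.write(line)  # keep every matching record, duplicates included
                kept += 1
    return kept
```

Calling intersect_keep_all("file1.vcf.gz", "file2.vcf.gz", "out.vcf") (placeholder file names) writes an uncompressed VCF. Records come out in File 1's order, so the result can be compressed with bgzip and indexed afterwards if File 1 was sorted.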

Best,
Thomas

On Wed, 3 Aug 2022 at 17:50, Robert Davies <r...@sanger.ac.uk> wrote:
>
> On Tue, 2 Aug 2022, Thomas Juettemann wrote:
>
> > I came across a "transcript-based" VCF file, meaning a variant can be
> > present multiple times but belonging to a different transcript. See
> > "File 1" below as an example. I am finding myself in the unfortunate
> > situation of having to intersect ("File 2") and retain all records
> > with the same position and REF/ALT ("Desired output").
> > Long shot: Is that possible?
>
> Does "bcftools isec" (https://www.htslib.org/doc/bcftools.html#isec) do
> what you want?  The "Extract and write records from A shared by both A and
> B using exact allele match" example in the manual page sounds like it
> might:
>
>     bcftools isec -p dir -n=2 -w1 A.vcf.gz B.vcf.gz
>
> If not, and you can't find anything else, and you only want to do a few of
> them, it might be possible to break out pysam and write something.  If you
> want to do lots, then a C program would probably be the way forward - it
> doesn't look like it would be too difficult.
>
> Rob Davies              r...@sanger.ac.uk
> The Sanger Institute    http://www.sanger.ac.uk/
> Hinxton, Cambs.,        Tel. +44 (1223) 834244
> CB10 1SA, U.K.          Fax. +44 (1223) 494919
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.


_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
