Hi Marielle!

I replied to your post at
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Let.27s_Do_SNPs.21
but wanted to answer quickly here as well.

I also want this data in Wikidata but, having just attended the American
Society of Human Genetics annual meeting, my suggestion is to be a little
patient with this.  A lot of people are working hard to standardize the
nomenclature for variant identification (facing all the problems you
describe above) and I don't think it will take long for it to
stabilize.  (Famous last words, but a lot of people are building tools
that depend on this happening.)  Once this is accomplished, we ought to be
able to use the standard IDs to anchor all the Wikidata items for variants.


In my opinion, this is a battle best fought over at the Human Genome
Variation Society forum (http://www.hgvs.org/mutnomen/) and then applied
within Wikidata rather than the other way around.

In the meantime, I'd encourage you to keep working on modeling all the
claims you would want to see that use variant entities as you have already
started doing.
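
To make that a bit more concrete, here is a minimal Python sketch of the
first option in your message quoted below (one item per allele or genotype,
with claims attached to it). The class and field names (VariantItem, Claim,
qualifiers) and the predicate labels are placeholders I made up for
illustration, not Wikidata properties; the values are taken from your Kell
and GWAS catalog examples.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement about a variant item, with optional qualifiers."""
    predicate: str  # hypothetical label, not a real Wikidata property
    value: str
    qualifiers: dict = field(default_factory=dict)


@dataclass
class VariantItem:
    """Hypothetical item for one allele or genotype at an rs location."""
    rs_id: str    # the SNP location, e.g. "rs8176058"
    alleles: str  # "A" for a single allele, "AG" for a genotype
    claims: list = field(default_factory=list)

    @property
    def label(self) -> str:
        return f"{self.rs_id}-{self.alleles}"


# Kell example from the quoted message: A encodes 'K', G encodes 'k'.
kell_a = VariantItem("rs8176058", "A", [Claim("encodes antigen", "K")])
kell_g = VariantItem("rs8176058", "G", [Claim("encodes antigen", "k")])

# GWAS catalog example: the quantitative effect is stored as qualifiers.
tpo = VariantItem("rs1230666", "A", [
    Claim(
        "implicated in",
        "Serum thyroid peroxidase antibody levels",
        qualifiers={"effect": "0.0269 unit increase", "ci": "0.014-0.039"},
    )
])

for item in (kell_a, kell_g, tpo):
    for claim in item.claims:
        print(item.label, "|", claim.predicate, "|", claim.value, claim.qualifiers)
```

With one item per allele or genotype, several quantitative findings about
the same genotype become several entries in the claims list, which may
sidestep the annotating-an-annotation problem you raise. Whether that maps
well onto real Wikidata items is exactly the modeling question to work out.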

My two cents.
-Ben


On Sun, Oct 26, 2014 at 1:59 PM, Marielle Volz <[email protected]>
wrote:

> This is awesome!
>
> I'd love to have all SNPs on Wikidata as well, and I started a
> discussion about this on WikiProject Molecular biology:
>
> https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Let.27s_Do_SNPs.21
>
> I think this would be amazing, because single nucleotide polymorphisms
> relate the genes to human diseases and traits, which are currently
> both on Wikidata.
>
> So for instance, we now have the gene
> https://www.wikidata.org/wiki/Q18028243 which encodes the protein
> product https://www.wikidata.org/wiki/Q1738190, and we have the SNP
> https://www.wikidata.org/wiki/Q18341737 in that gene, which is
> implicated in the disease https://www.wikidata.org/wiki/Q5712506.
>
> This way we can get a fuller picture from Wikidata of how changes in
> genes and gene products are related to the traits and diseases on
> Wikidata.
>
> There are some things I'm really not sure how to handle, however. Each
> SNP is a *location*, and in a diploid organism each location has two
> values, each one of four options (A, G, T, or C), and each combination
> of values may result in the same protein or a different one. So in the
> case of the Kell antigen system, the rs8176058 location can be either
> A or G. A nucleotide of A in this location codes for the 'K' antigen
> or protein, and G encodes the 'k' antigen. This presents difficulties
> with representing the information in a single "table" because common
> variations at the location have information that needs to be grouped
> together.
>
> In this case, it's simply the presence of an A or G that determines
> the gene product, but of course this gets more complicated: we
> might not know the "value" of A or G individually, but may
> only have "values" for each genotype (AG, AA, or GG) that need to
> be represented. And these genotypes might not always point to a
> specific gene product, but may instead point to a qualitative trait
> ("increased risk of glaucoma") or a quantitative trait ("vision was 0.2
> diopters greater on average").
>
> The two options are:
>
> create a separate WD item for each "option"- i.e. "rs8176058-A" to
> contain information about variation A at location rs8176058 (or, in
> the case when information is known about the genotype, "AG genotype on
> rs8176058")
>
> OR
>
> allow each option "A" or "AG" to be annotated with various fields. The
> complication is that each annotation may itself need to be annotated
> (and I don't think that's possible on WD) if we have multiple
> pieces of quantitative information associated with one genotype. Hard
> to say.
>
> To see how this data is represented in table form elsewhere, you can
> browse the GWAS catalog:
>
> http://www.genome.gov/page.cfm?pageid=26525384&clearquery=1#result_table
>
> Importing that might be a good start. There it looks something like this:
>
> Risk allele: rs1230666-A
> Effect: .0269 [0.014-0.039] unit increase
> Implicated in: Serum thyroid peroxidase antibody levels
> p-value: 2 x 10^-8
> reference: Medici M
> February 27, 2014
> PLoS Genet
> Identification of novel genetic Loci associated with thyroid
> peroxidase antibodies and clinical thyroid disease.
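
If the catalog does get imported, the risk allele field looks like it
splits cleanly into a location and an allele. A small Python sketch,
assuming the "rsID-allele" label shape shown above holds for the rows being
imported (the function name is hypothetical, and real catalog rows may need
more cases than this handles):

```python
import re

# Assumed label shape: "rs" + digits, a hyphen, then one or more of A/C/G/T.
RISK_ALLELE = re.compile(r"^(rs\d+)-([ACGT]+)$")


def split_risk_allele(label: str) -> tuple[str, str]:
    """Split a label such as 'rs1230666-A' into (location, allele)."""
    match = RISK_ALLELE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised risk allele label: {label!r}")
    return match.group(1), match.group(2)


print(split_risk_allele("rs1230666-A"))  # ('rs1230666', 'A')
```

Labels that do not fit the assumed shape raise an error instead of being
guessed at, so odd rows would surface during an import rather than end up
as wrong items.
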
>
> On Fri, Oct 24, 2014 at 1:24 AM, Lydia Pintscher
> <[email protected]> wrote:
> > Hey folks :)
> >
> > Blog post is now available at
> > http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub-for-linked-open-life-science-data/
> > Thanks Benjamin and Andra!
> >
> >
> > Cheers
> > Lydia
> >
> > --
> > Lydia Pintscher - http://about.me/lydia.pintscher
> > Product Manager for Wikidata
> >
> > Wikimedia Deutschland e.V.
> > Tempelhofer Ufer 23-24
> > 10963 Berlin
> > www.wikimedia.de
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> >
> > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> > unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> > Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
> >
> > _______________________________________________
> > Wikidata-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
