Thank you! I am sure that this will help the Wikidata team to make the right decision. Also, very interesting numbers.
One stupid question: given the length of these identifiers, and since they are not simple opaque identifiers but, if I understand correctly, encode semantics, could a single such identifier encode content or ideas that are potentially covered by copyright or patent law? Is there some background available on that?

On Fri, Sep 23, 2016 at 3:27 AM Egon Willighagen <egon.willigha...@gmail.com> wrote:

> Sebastian, great you found time for it! I didn't :/ (Stats are worth a
> tweet, IMHO :)
>
> Egon
>
> On Fri, Sep 23, 2016 at 12:20 PM, Sebastian Burgstaller <
> sebastian.burgstal...@gmail.com> wrote:
>
>> Hi Denny,
>> Sorry, I missed this email. I just did the calculation for InChI string
>> lengths on the 92 million PubChem compounds:
>>
>>   Percentile            99%   99.9%   100%
>>   Length (characters)   311     676   4502
>>
>> That said, there is no upper limit for the length, but 4502 is the
>> longest string in the PubChem database. The other IDs, canonical and
>> isomeric SMILES, have the same distribution shape but are slightly
>> shorter overall.
>>
>> Best,
>> Sebastian
>>
>> On Sun, Sep 18, 2016 at 9:19 PM, Denny Vrandečić <vrande...@gmail.com>
>> wrote:
>> > Can you figure out what a good limit would be for these two use cases?
>> > I.e., what would support 99%, 99.9%, and 100%?
>> >
>> > On Sun, Sep 18, 2016, 12:27 Egon Willighagen <egon.willigha...@gmail.com>
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> sorry for joining the party late...
>> >>
>> >> On Tue, Sep 13, 2016 at 11:39 AM, Sebastian Burgstaller
>> >> <sebastian.burgstal...@gmail.com> wrote:
>> >> > I think this topic might have been discussed many months ago. For
>> >> > certain data types in the chemical compound space (P233 canonical
>> >> > SMILES, P2017 isomeric SMILES and P234 InChI), a character limit
>> >> > higher than 400 would be really helpful (1500 to 2000 characters,
>> >> > though I sense that this might cause problems with SPARQL). Are
>> >> > there any plans to implement this? In general, for quality
>> >> > assurance, many string property types would profit from a fixed
>> >> > maximum string length.
>> >>
>> >> 400 characters is not a lot for chemicals... InChIs can be a lot
>> >> larger indeed. 2k would allow us to capture a lot more chemicals. BTW,
>> >> this also applies to the canonical SMILES, which also doesn't have an
>> >> upper bound. Tannic acid (Q427956) is an example (which, looking at the
>> >> InChIKey, came up when running the bot :) From working with ChEMBL as
>> >> RDF I know it has InChIs of length > 1024, which was the max length in
>> >> Virtuoso... I think it's important for biology and chemistry to
>> >> increase the limit.
>> >>
>> >> Egon
>> >>
>> >> --
>> >> E.L. Willighagen
>> >> Department of Bioinformatics - BiGCaT
>> >> Maastricht University (http://www.bigcat.unimaas.nl/)
>> >> Homepage: http://egonw.github.com/
>> >> LinkedIn: http://se.linkedin.com/in/egonw
>> >> Blog: http://chem-bla-ics.blogspot.com/
>> >> PubList: http://www.citeulike.org/user/egonw/tag/papers
>> >> ORCID: 0000-0001-7542-0286
>> >> ImpactStory: https://impactstory.org/EgonWillighagen
>> >>
>> >> _______________________________________________
>> >> Wikidata mailing list
>> >> Wikidata@lists.wikimedia.org
>> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
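For anyone who wants to derive limits like the 99% / 99.9% / 100% figures reported above from their own list of identifier strings, here is a minimal sketch. It assumes the InChI or SMILES strings are already loaded into Python; it is not the script used for the PubChem numbers, which the thread does not show. It uses the nearest-rank percentile definition (an assumption; interpolating definitions can give slightly different values), and the function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict, Iterable, Sequence, Tuple


def nearest_rank_percentile(sorted_lengths: Sequence[int], pct: float) -> int:
    """Smallest length L such that at least pct% of the lengths are <= L."""
    if not sorted_lengths:
        raise ValueError("need at least one length")
    # Parse pct exactly (e.g. 99.9 -> 999/10) so float rounding cannot shift the rank.
    p = Fraction(str(pct))
    if not 0 < p <= 100:
        raise ValueError("pct must be in (0, 100]")
    # Nearest-rank method: 1-based rank = ceil(p/100 * n).
    rank = math.ceil(p * len(sorted_lengths) / 100)
    return sorted_lengths[rank - 1]


def length_limits(
    identifiers: Iterable[str], pcts: Tuple[float, ...] = (99.0, 99.9, 100.0)
) -> Dict[float, int]:
    """Character limit needed to accommodate each requested share of identifiers."""
    # len() counts code points; InChI and SMILES are ASCII, so this also equals bytes.
    lengths = sorted(len(s) for s in identifiers)
    return {pct: nearest_rank_percentile(lengths, pct) for pct in pcts}


def coverage(identifiers: Iterable[str], limit: int) -> float:
    """Fraction of identifiers that fit within a limit of `limit` characters."""
    lengths = [len(s) for s in identifiers]
    if not lengths:
        raise ValueError("need at least one identifier")
    return sum(1 for n in lengths if n <= limit) / len(lengths)
```

For example, with `inchis` a (hypothetical) list of InChI strings, `length_limits(inchis)` returns the character limits covering 99%, 99.9% and 100% of them, and `coverage(inchis, 400)` gives the share that would fit under the current 400-character limit; the same call with 1500 or 2000 checks the proposed values.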