Sebastian, great you found time for it! I didn't :/ (Stats are worth a
tweet, IMHO :)
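
For anyone who wants to reproduce this kind of statistic on their own set of identifiers, here is a minimal Python sketch. It is not necessarily how the numbers quoted below were computed: it assumes all strings fit in memory and uses the nearest-rank percentile definition, and the function names length_percentiles and share_over_limit are made up for illustration.

```python
import math
from fractions import Fraction


def length_percentiles(strings, percentiles=(99, 99.9, 100)):
    """Return {percentile: string length} using the nearest-rank method."""
    lengths = sorted(len(s) for s in strings)
    if not lengths:
        raise ValueError("need at least one string")
    result = {}
    for p in percentiles:
        # Nearest rank: the smallest length such that at least p% of all
        # strings are no longer than it. Fraction avoids float rounding
        # for percentiles such as 99.9.
        rank = max(1, math.ceil(Fraction(str(p)) * len(lengths) / 100))
        result[p] = lengths[rank - 1]
    return result


def share_over_limit(strings, limit):
    """Return the fraction of strings longer than the given character limit."""
    strings = list(strings)
    if not strings:
        raise ValueError("need at least one string")
    return sum(len(s) > limit for s in strings) / len(strings)


if __name__ == "__main__":
    # Toy input only; real use would read InChI or SMILES strings from a dump.
    sample = ["InChI=1S/CH4/h1H4", "InChI=1S/H2O/h1H2", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"]
    print(length_percentiles(sample, (50, 100)))
    print(share_over_limit(sample, 400))
```

Running share_over_limit with 400 and then with a candidate new limit shows how many values each limit would reject.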

Egon

On Fri, Sep 23, 2016 at 12:20 PM, Sebastian Burgstaller <
sebastian.burgstal...@gmail.com> wrote:

> Hi Denny,
> Sorry, I missed this email. I just did the calculation for InChI string
> lengths on the 92 million PubChem compounds:
>   99% 99.9%  100%
>   311   676  4502
>
> That said, there is no upper limit for the length, but 4502 is the
> longest string in the PubChem database. The other IDs, canonical and
> isomeric SMILES, have the same distribution shape, but are overall
> slightly shorter.
>
> Best,
> Sebastian
>
> On Sun, Sep 18, 2016 at 9:19 PM, Denny Vrandečić <vrande...@gmail.com>
> wrote:
> > Can you figure out what a good limit would be for these two use cases?
> > I.e., what would support 99%, 99.9%, and 100%?
> >
> >
> > On Sun, Sep 18, 2016, 12:27 Egon Willighagen <egon.willigha...@gmail.com>
> > wrote:
> >>
> >> Hi all,
> >>
> >> sorry for joining the party late...
> >>
> >> On Tue, Sep 13, 2016 at 11:39 AM, Sebastian Burgstaller
> >> <sebastian.burgstal...@gmail.com> wrote:
> >> > I think this topic might have been discussed many months ago. For
> >> > certain data types in the chemical compound space (P233 canonical
> >> > SMILES, P2017 isomeric SMILES and P234 InChI) a higher character
> >> > limit than 400 would be really helpful: 1500 to 2000 chars, though I
> >> > sense that this might cause problems with SPARQL. Are there any plans
> >> > on implementing this? In general, for quality assurance, many string
> >> > property types would profit from a fixed max string length.
> >>
> >> 400 characters is not a lot for chemicals... InChIs can be a lot
> >> larger indeed. 2k would allow us to capture a lot more chemicals. BTW,
> >> this also applies to the canonical SMILES, which also doesn't have an
> >> upper bound. Tannic acid (Q427956) is an example (which, looking at the
> >> InChIKey, came up when running the bot :). From working with ChEMBL as
> >> RDF I know it has InChIs of length > 1024, which was the max length in
> >> Virtuoso... I think it's important for biology and chemistry to
> >> increase the limit.
> >>
> >> Egon
> >>
> >> --
> >> E.L. Willighagen
> >> Department of Bioinformatics - BiGCaT
> >> Maastricht University (http://www.bigcat.unimaas.nl/)
> >> Homepage: http://egonw.github.com/
> >> LinkedIn: http://se.linkedin.com/in/egonw
> >> Blog: http://chem-bla-ics.blogspot.com/
> >> PubList: http://www.citeulike.org/user/egonw/tag/papers
> >> ORCID: 0000-0001-7542-0286
> >> ImpactStory: https://impactstory.org/EgonWillighagen
> >>
> >> _______________________________________________
> >> Wikidata mailing list
> >> Wikidata@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
> >
>



-- 
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen
