Thank you! I am sure that this will help the Wikidata team to make the
right decision. Also, very interesting numbers.

One stupid question: given the length of these identifiers, and since they
are not simple opaque identifiers but rather encode semantics (if I
understand correctly), could a single such identifier encode content or
ideas that are potentially covered by copyright or patent law? Is there
some background available on that?

On Fri, Sep 23, 2016 at 3:27 AM Egon Willighagen <egon.willigha...@gmail.com>
wrote:

>
> Sebastian, great you found time for it! I didn't :/ (Stats are worth a
> tweet, IMHO :)
>
> Egon
>
> On Fri, Sep 23, 2016 at 12:20 PM, Sebastian Burgstaller <
> sebastian.burgstal...@gmail.com> wrote:
>
>> Hi Denny,
>> Sorry, I missed this email. I just did the calculation for InChI string
>> lengths on the 92 million PubChem compounds:
>>   99% 99.9%  100%
>>   311   676  4502
>>
>> That said, there is no upper limit for the length, but 4502 is the
>> longest string in the PubChem database. The other IDs, canonical and
>> isomeric SMILES, have the same distribution shape, but are overall
>> slightly shorter.
>>
>> Best,
>> Sebastian
>>
>> On Sun, Sep 18, 2016 at 9:19 PM, Denny Vrandečić <vrande...@gmail.com>
>> wrote:
>> > Can you figure out what a good limit would be for these two use cases?
>> > I.e., what would support 99%, 99.9%, and 100%?
>> >
>> >
>> > On Sun, Sep 18, 2016, 12:27 Egon Willighagen <egon.willigha...@gmail.com>
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> sorry for joining the party late...
>> >>
>> >> On Tue, Sep 13, 2016 at 11:39 AM, Sebastian Burgstaller
>> >> <sebastian.burgstal...@gmail.com> wrote:
>> >> > I think this topic might have been discussed many months ago. For
>> >> > certain data types in the chemical compound space (P233 canonical
>> >> > SMILES, P2017 isomeric SMILES and P234 InChI) a higher character
>> >> > limit than 400 would be really helpful: 1500 to 2000 chars, though I
>> >> > sense that this might cause problems with SPARQL. Are there any plans
>> >> > for implementing this? In general, for quality assurance, many string
>> >> > property types would profit from a fixed max string length.
>> >>
>> >> 400 characters is not a lot for chemicals... InChIs can indeed be a lot
>> >> larger. 2k would allow us to capture a lot more chemicals. BTW,
>> >> this also applies to the canonical SMILES, which also doesn't have an
>> >> upper bound. Tannic acid (Q427956) is an example (which came up when
>> >> looking at the InChIKey while running the bot :) From working with
>> >> ChEMBL as RDF I know it has InChIs of length > 1024, which was the max
>> >> length in Virtuoso... I think it's important for biology and chemistry
>> >> to increase the limit.
>> >>
>> >> Egon
>> >>
>> >> --
>> >> E.L. Willighagen
>> >> Department of Bioinformatics - BiGCaT
>> >> Maastricht University (http://www.bigcat.unimaas.nl/)
>> >> Homepage: http://egonw.github.com/
>> >> LinkedIn: http://se.linkedin.com/in/egonw
>> >> Blog: http://chem-bla-ics.blogspot.com/
>> >> PubList: http://www.citeulike.org/user/egonw/tag/papers
>> >> ORCID: 0000-0001-7542-0286
>> >> ImpactStory: https://impactstory.org/EgonWillighagen
>> >>
>> >> _______________________________________________
>> >> Wikidata mailing list
>> >> Wikidata@lists.wikimedia.org
>> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
>> >
>> >
>>
>
>
>
> --
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
> ORCID: 0000-0001-7542-0286
> ImpactStory: https://impactstory.org/u/egonwillighagen