At 15:02 +0000 29/10/06, S Page wrote:
>>I wanted to define an attribute Smiles:=string, where the string could contain
>>quite a few characters (its more or less 1 character per atom in a molecule, and
>>some molecules can easily have 1000+ atoms). Is there limit to how long
>>such a string could be?
>
>Yes, the Type:String code limits strings to 255. There's a warning in the
>factbox when you exceed this. It was discussed earlier,
>http://sourceforge.net/mailarchive/message.php?msg_id=36371763 ; although you
>could probably work around the MediaWiki limitations if you edit the PHP code
>and alter your database, there's a performance hit.
>
>Can you explain or link to the underlying property for "Smiles" that you're
>trying to represent, is it a chemical formula like NaCl? It's nice to
>understand what people are trying to do.
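A minimal Python sketch of the constraint being discussed, assuming a fixed 255-character cap like the one quoted above for Type:String. The names MAX_STRING_LENGTH, fits_string_attribute and digest_identifier are hypothetical, and this is not Semantic MediaWiki's PHP code. It also illustrates the hashing trade-off raised in the reply: a SHA-256 digest of a long identifier always fits under the cap, but it is one-way, so the structure cannot be recovered from it. The digest here is a generic hash, not the standard InChIKey algorithm.

```python
import hashlib

# Assumed cap, mirroring the 255-character limit quoted for Type:String.
MAX_STRING_LENGTH = 255


def fits_string_attribute(value: str, limit: int = MAX_STRING_LENGTH) -> bool:
    """Return True if value would fit in a length-capped string attribute."""
    return len(value) <= limit


def digest_identifier(value: str) -> str:
    """Hypothetical workaround: map an identifier of any length to a
    fixed-length SHA-256 hex digest (64 characters). The mapping is one-way,
    so the original string cannot be rebuilt from the digest.
    NOTE: generic hash, not the standard InChIKey algorithm."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    short_inchi = "InChI=1S/CH4/h1H4"      # standard InChI for methane
    long_value = "InChI=1S/" + "C" * 2000  # placeholder length only, not a valid InChI
    print(fits_string_attribute(short_inchi))    # True
    print(fits_string_attribute(long_value))     # False
    key = digest_identifier(long_value)
    print(len(key), fits_string_attribute(key))  # 64 True
```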
With the advent of XML serialisation of chemistry, CML (a project Peter Murray-Rust and I started in 1995, shortly after the inaugural WWW conference at CERN, and following discussions with TimBL and DR), the need for (globally) unique chemical IDs became apparent. SMILES had been the first successful attempt to serialise a molecular structure into a unique canonical string. Unfortunately, it was proprietary, and the company that owned it, although it published the original spec, changed that spec in its own implementation. Around eight "varieties" of SMILES emerged as a result.

The newer InChI (International Chemical Identifier) has emerged as a solution to this problem. Typically, a "small" molecule is defined as having fewer than 1000 atoms. The InChI generated for such molecules can be up to 2000 characters long. It could be hashed down to a much shorter string, of course, but this brings other problems, including information loss (the InChI can currently be used to restore the atoms and connectivity of the original molecule).

So SMILES, and now InChI, represent an attribute of a molecule that constitutes a unique canonical (algorithmic) identifier for it. We used InChI in an earlier semantic chemistry project (with Sesame as the triple store/logic engine), and were hoping to use Semediawiki as the authoring/input environment for, e.g., Sesame via SPARQL endpoints. The InChI in particular is key to this.

I understand the reason for limiting an attribute to 255 characters. Perhaps we should define an entirely new datatype (InChI) which expands this limit (and put up with the performance hit?). Is this a sensible approach? (We can think of a number of other chemical datatypes, for example molecular formula.)

--
Henry Rzepa. +44 (020) 7594 5774 (Voice); +44 (0870) 132 3747 (eFax); [EMAIL PROTECTED] (iChat)
http://www.ch.ic.ac.uk/rzepa/
Dept. Chemistry, Imperial College London, SW7 2AZ, UK.
(Voracious anti-spam filter in operation for received email.
If expected reply not received, please phone/fax).

_______________________________________________
Semediawiki-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/semediawiki-user
