At 15:02 +0000 29/10/06, S Page wrote:
>>I wanted to define an attribute Smiles:=string, where the string could
>>contain quite a few characters (it's more or less 1 character per atom
>>in a molecule, and some molecules can easily have 1000+ atoms).  Is
>>there a limit to how long such a string could be?
>
>Yes, the Type:String code limits strings to 255 characters.  There's a
>warning in the factbox when you exceed this.  It was discussed earlier:
>http://sourceforge.net/mailarchive/message.php?msg_id=36371763
>Although you could probably work around the MediaWiki limitations if you
>edit the PHP code and alter your database, there's a performance hit.
>
>Can you explain or link to the underlying property for "Smiles" that
>you're trying to represent?  Is it a chemical formula like NaCl?  It's
>nice to understand what people are trying to do.


With the advent of XML serialisation of chemistry, CML (a project Peter
Murray-Rust and I started in 1995, shortly after the inaugural WWW
conference at CERN, and following discussions with TimBL and DR), the
need for (globally) unique chemical IDs became apparent.  SMILES had been
the first successful attempt to serialise a molecular structure into
a unique canonical string.  Unfortunately, it was proprietary, and the
company that owned it, although they published the original spec,
changed that spec in their own implementation.  Around 8 "varieties"
of SMILES emerged as a result.  The newer InChI (International
Chemical Identifier) has emerged as a solution to this problem.
Typically, a "small" molecule is defined as having fewer than
1000 atoms.  The InChI generated for such molecules has
up to 2000 characters.  It could be hashed down to a much shorter
string of course, but this brings with it other problems, including
information loss (the InChI can currently be used to restore
the atom/connectivity of the original molecule).
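
To make the hashing trade-off concrete, here is a minimal sketch in Python. The choice of SHA-256, the function name short_key, and the synthetic long string are assumptions made purely for illustration; none of this is part of InChI or Semediawiki. It shows that a digest always fits inside a 255-character string, but that the original identifier can only be recovered if the full string is stored somewhere else as well.

```python
import hashlib

STRING_LIMIT = 255  # Type:String limit mentioned in the quoted reply


def short_key(identifier: str) -> str:
    """Return a fixed-length SHA-256 hex digest (64 characters) of an identifier."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Standard InChI for ethanol, used only as a small illustrative input.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
# Synthetic stand-in for a long identifier; this is NOT a real InChI.
long_identifier = "InChI=1S/" + "C" * 2000

# The digest is one-way, so the full string has to be kept in a separate
# lookup table if the structure is ever to be restored from the key.
lookup = {}
for ident in (ethanol, long_identifier):
    lookup[short_key(ident)] = ident
```

The key is short enough for an ordinary string attribute, but the connectivity information lives only in the lookup table, which is the information loss referred to above.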

So SMILES, and now InChI, represent an attribute of a molecule
which constitutes a unique canonical (algorithmic) identifier for it.  We use the
InChI in an earlier semantic chemical project (using Sesame as the
triple store/logic engine), and were hoping to use Semediawiki as
the authoring/input environment for e.g. Sesame via SPARQL endpoints.
The InChI in particular is key to this.

I understand the reason for limiting an attribute to 255 characters.
Perhaps we should define an entirely new datatype (InChI) which
raises this limit (and put up with the performance hit)?  Is
this a sensible approach?  (We can think of a number of other chemical
datatypes, for example molecular formula.)
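
As a rough illustration of what such a datatype would have to decide, here is a small Python sketch. The function classify_inchi and its return labels are hypothetical names invented for this example, and the check is only a prefix and length test, not real InChI validation.

```python
STRING_LIMIT = 255  # limit reported for Type:String in the quoted reply


def classify_inchi(value: str) -> str:
    """Classify a candidate value for a hypothetical InChI datatype."""
    if not value.startswith("InChI="):
        # Not recognisable as an InChI at all (e.g. a SMILES string).
        return "not-inchi"
    if len(value) <= STRING_LIMIT:
        # Short enough to be stored as an ordinary string attribute.
        return "fits-string"
    # Too long for the existing string type; needs the larger datatype.
    return "needs-long-type"
```

Only values in the last category would incur the performance hit, so small molecules could still be handled by the existing string storage.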
-- 

Henry Rzepa.
+44 (020) 7594 5774 (Voice); +44 (0870) 132 3747 (eFax); [EMAIL PROTECTED] (iChat)
http://www.ch.ic.ac.uk/rzepa/ Dept. Chemistry, Imperial College London, SW7 2AZ, UK.

(Voracious anti-spam filter in operation for received email.
If expected reply not received, please phone/fax).


_______________________________________________
Semediawiki-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/semediawiki-user
