> If you're going to rely on positions within arrays, why not just do it the
> simple way?
> { "smiles": ["CCO"],
> "2D": [1,1,2,2,3,3],
> "3D": [1,1,1,2,2,2,3,3,3]
> }
SMILES are a great representation of molecules (especially with
SMARTS/SMIRKS pattern matching), and, in cases where they can be used, I
think they're the best thing out there. However, they don't cover
everything. I work with metal-organic frameworks (MOFs), which are large
crystals that require more extensibility than SMILES offers (I still use
_-separated SMILES of the MOF constituents to hash the CIF / JSON files,
however). Also, my point in that previous email was that referencing by
index is bad, not good. It's less direct than explicitly referencing items,
which makes the format more difficult for new users to understand and more
prone to user error.
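
A minimal sketch of the _-separated SMILES hash mentioned above. The function
name, the two constituent SMILES, the choice of SHA-1, and the sorting step
are all assumptions made for illustration, not a description of the actual
scheme in use.

```python
import hashlib

def mof_key(constituent_smiles):
    """Join constituent SMILES with '_' and hash the joined string."""
    # Sorting is an assumption, so the key does not depend on input order.
    joined = "_".join(sorted(constituent_smiles))
    return joined, hashlib.sha1(joined.encode("utf-8")).hexdigest()

# Hypothetical constituents: a copper node and a terephthalate linker.
joined, digest = mof_key(["[O-]C(=O)c1ccc(cc1)C([O-])=O", "[Cu]"])
print(joined)   # human-readable key
print(digest)   # fixed-length key, e.g. for a file name or database id
```

The joined string stays readable, while the digest gives a fixed-length
identifier for the same set of constituents.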
> A SMILES contains exactly the same information as the atom/bond lists in a
> much more compact form. If you want to avoid the aromaticity problem, just
> use Kekule form, which makes it virtually identical to any other connection
> table format, but in about 10x to 20x fewer bytes. SMILES are very easy to
> parse, and there are dozens of parsers around.
What I truly like about SMILES is that it's human-readable and hashable,
which I see as the real goal. The shorter length is just a corollary of
that. Prove me wrong, but I think people make too big a deal about the size
of molecule formats. I just bought a 2 TB hard disk drive for $70. With
MongoDB and its JSON serialization, I estimated that I can put 200 million
verbose JSON MOF structures on that drive. I only have a few thousand, so I
have some room to spare.
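
As a rough check on that estimate, the figures above imply a budget of about
10 kB per structure. This assumes decimal terabytes and ignores database
overhead and indexes, both of which are simplifying assumptions.

```python
DISK_BYTES = 2 * 10**12        # 2 TB, assuming decimal terabytes
DISK_COST_USD = 70
N_STRUCTURES = 200 * 10**6     # the 200 million estimate above

bytes_per_structure = DISK_BYTES // N_STRUCTURES
usd_per_structure = DISK_COST_USD / N_STRUCTURES

print(bytes_per_structure)     # 10000 bytes, i.e. about 10 kB each
print(usd_per_structure)       # storage cost per structure in dollars
```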
> This discussion has focussed on the syntax of JSON, but completely
> overlooks the real problem with ALL chemical file formats: how do you
> handle all of the cases where a simple connection-table ("ball and stick")
> doesn't capture reality? Things like aromaticity, tautomers,
> organo-metallic bonds, boron-hydrogen cages, distributed bonds (ferrocenes
> and the like) ... these are the problems.
The point of JSON (and XML) is that they are *extensible*; that's why JSON
has exploded in the developer community. If you need handles for
aromaticity and metallic bonding, just add new properties to the JSON/XML.
Because of the extensibility, adding new properties will not break existing
code, as long as that code ignores the properties it doesn't recognize.
That's the advantage over all of the older table formats, which weren't
built to be extensible. And you see the repercussions in scientific code
all the time. (I was recently handed a project where someone used heavy
metals in molfiles to encode rotational data. That kind of hack is exactly
what JSON/XML fixes.)
There's also the advantage that many languages don't need a third-party
library to parse a JSON file. Or, if you do, it's *heavily* supported (e.g.
Gson for Java).
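
To illustrate both points, here is a minimal sketch using only Python's
standard-library json module. The schema (explicit atom ids, "from"/"to"
bonds) and the added "aromatic_rings" and "metal_bonds" properties are
hypothetical, chosen only for this example.

```python
import json  # part of the Python standard library

def read_connection_table(text):
    """A reader written before any extensions existed.

    It only looks at the "atoms" and "bonds" keys and ignores the rest.
    """
    data = json.loads(text)
    atoms = {a["id"]: a["element"] for a in data["atoms"]}
    bonds = [(b["from"], b["to"], b["order"]) for b in data["bonds"]]
    return atoms, bonds

original = """
{
  "atoms": [{"id": "C1", "element": "C"},
            {"id": "C2", "element": "C"},
            {"id": "O1", "element": "O"}],
  "bonds": [{"from": "C1", "to": "C2", "order": 1},
            {"from": "C2", "to": "O1", "order": 1}]
}
"""

# The same molecule with extra, hypothetical properties added later.
extended_data = json.loads(original)
extended_data["aromatic_rings"] = []
extended_data["metal_bonds"] = []
extended = json.dumps(extended_data)

# The old reader returns the same result for both documents.
assert read_connection_table(original) == read_connection_table(extended)
```

The old reader keeps working because it looks items up by name; a reader
that rejects unknown keys would not have this property.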
Geoff: outside of some fairly minor issues, XML translates easily to JSON.
Could the chemical XML specification just be translated to JSON?
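
A rough sketch of what such a translation could look like, using Python's
standard library. The XML fragment is a small CML-style example written for
illustration, and element_to_dict is a hypothetical helper, not part of any
existing converter.

```python
import json
import xml.etree.ElementTree as ET

CML_SNIPPET = """
<molecule id="ethanol">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
  </bondArray>
</molecule>
"""

def element_to_dict(elem):
    """Map attributes to keys and child elements to lists keyed by tag."""
    node = dict(elem.attrib)
    for child in elem:
        # One of the minor issues: this assumes no attribute shares a name
        # with a child tag, and every attribute value stays a string.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node

converted = element_to_dict(ET.fromstring(CML_SNIPPET.strip()))
print(json.dumps(converted, indent=2))
```

Repeated child elements become lists here, which is one of the choices any
XML-to-JSON mapping has to make.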
On Fri, Jun 7, 2013 at 11:32 AM, Craig James <cja...@emolecules.com> wrote:
> Regarding using JSON as a new file format...
>
> This discussion has focussed on the syntax of JSON, but completely
> overlooks the real problem with ALL chemical file formats: how do you
> handle all of the cases where a simple connection-table ("ball and stick")
> doesn't capture reality? Things like aromaticity, tautomers,
> organo-metallic bonds, boron-hydrogen cages, distributed bonds (ferrocenes
> and the like) ... these are the problems.
>
> If we could solve these problems, it wouldn't much matter which file
> format we picked ... they'd all be equivalent and sufficient. Without
> solving these problems, a new file format doesn't really matter very much.
> All it does is make another parser with yet-another-interpretation of these
> hard problems.
>
> If JSON is a need, I suggest that you embed an existing chemical format
> (see my previous note that uses SMILES) into a JSON object.
>
> Craig
>
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> OpenBabel-discuss mailing list
> OpenBabel-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>
>