Re: [Fis] Chemical information: a field of fuzzy contours ?

2011-09-17 Thread joe.bren...@bluewin.ch





Dear Michel and FIS Colleagues,


This will be an interesting discussion, since the core nature and role of 
information will be involved. Here is just one first point: to me, as a
chemist, chemical information is only secondarily an object capable of being 
formalized, archived, etc. A formula has meaning for me in terms
of the potential reactions the molecule to which it refers can undergo, what it 
looked like when crystallized for the first time and so on.



Cheminformatics seems not to deal with such aspects of chemical information as 
part of a process of doing chemistry. Can this be captured by 
another system?


Best wishes,


Joseph



Original message
From: petitjean.chi...@gmail.com
Date: 16.09.2011 09:44
To: fis@listas.unizar.es
Subject: [Fis] Chemical information: a field of fuzzy contours ?

Chemical information: a field of fuzzy contours ?
-

Before turning to chemistry, I would recall some facts that I noticed
on the FIS forum:
although many people consider that a unifying definition of
information science is possible (to be constructed),
a number of other people consider that there are many concepts of
information which are not necessarily
the facets of a unique concept, so that it could be better to speak
about information scienceS,
and not about information science.
I can read on http://en.wikipedia.org/wiki/Information_science
"Information science is an interdisciplinary science primarily
concerned with the
analysis, collection, classification, manipulation, storage, retrieval
and dissemination of information."
and a few lines above:
"Information Science consists of having the knowledge and
understanding on how to collect, classify, manipulate, store, retrieve
and disseminate any type of information."
Clearly, collecting, storing, and retrieving information lead us to think
that we must deal with databases.
The question "where is information?" is neglected, although answering
it is enlightening:
no doubt much information is stored in data banks.
There are strong connections of Information Science(s) with Data
Mining (DM) and Knowledge Discovery in Databases (KDD).

Is the situation clearer in chemistry ?

Undoubtedly there is a field of chemical information.

The ACS (American Chemical Society) has a Division of Chemical
Information (CINF),
named as such in 1975, but which in fact goes back to 1943
(http://www.acscinf.org/).
CINF is active and organizes various meetings which can be retrieved on the web.
Visit also http://www.libsci.sc.edu/bob/chemnet/chchron.htm, an
informative website.

The ACS publishes the Journal of Chemical Information and Modeling
renamed so in 2005
after having been named Journal of Chemical Information and Computer
Sciences from 1975 to 2004,
itself being the continuation of the Journal of Chemical
Documentation from 1961 to 1974.
In fact, it is the same journal (one volume per year), which turned to
chemical information the same year that CINF received its current name.

Interestingly, still in 1975, the main cheminformatics lab in France
(in fact the only one in France at that time) was renamed.
The old name was LCOP (Laboratoire de Chimie Organique Physique),
and the new name was ITODYS, still in use,
meaning until 2001: Institut de TOpologie et de DYnamique des
Systemes. This name, which can be understood in English due
to the close similarity between the French and the English words, was
partly due to the existence of a distance in molecular graphs
(this distance is the smallest number of chemical bonds separating two
atoms), and as is known, a distance induces a topology:
the name clearly acknowledged the cheminformatics aspects of the research
performed in the lab.
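
The graph distance mentioned above can be illustrated by a minimal sketch in
Python. It assumes a molecule stored as a plain adjacency dictionary; the atom
labels, the ETHANOL example and the function name bond_distance are
hypothetical and not taken from any cheminformatics package. A breadth-first
search returns the smallest number of bonds between two atoms.

```python
from collections import deque

# Hypothetical toy molecule: ethanol, showing the two carbons, the oxygen
# and the hydroxyl hydrogen. Each key is an atom, each value lists the
# atoms bonded to it.
ETHANOL = {
    "C1": ["C2"],
    "C2": ["C1", "O"],
    "O": ["C2", "H"],
    "H": ["O"],
}


def bond_distance(graph, start, goal):
    """Smallest number of bonds between two atoms, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        for neighbour in graph[atom]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


print(bond_distance(ETHANOL, "C1", "H"))
```

In a connected molecular graph this count is zero only for identical atoms, is
symmetric, and obeys the triangle inequality, which is what makes it a distance
in the mathematical sense.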

Chemical Information Science, which is sometimes named Chemical Informatics
(http://www.indiana.edu/~cheminfo/acs800/soced_wash.html)
can be reasonably considered to be a part of the Cheminformatics field.
This latter is defined on Wikipedia
(http://en.wikipedia.org/wiki/Cheminformatics):
"Chemoinformatics is the mixing of those information resources to
transform data into information and
information into knowledge for the intended purpose of making better
decisions faster in the area of
drug lead identification and optimization."
This definition, dating from 1998, clearly acknowledges the extraction
of information from data,
but it is restrictive since it discards all pioneering works about
computerization of chemical databases,
including structural formulas coding and structural motifs retrieval,
which historically cannot be denied
to be the core of the cheminformatics field.

Now let me write a few more lines about the story of cheminformatics in France,
which is a bit funny but sheds light on the debate about the definition of the
field of chemical information.
The French pioneer was Jacques-Emile Dubois (1920-2005), founder of
the LCOP and of the ITODYS,
who published his first cheminformatics paper in 1966. One of his main
ideas was to use 

Re: [Fis] Chemical information: a field of fuzzy contours ?

2011-09-17 Thread Michel Petitjean
Dear Joe, dear FISErs,

An organic chemist is able to predict a number of properties from the
structural formula, including much about the reactivity of the compound.
But as you know, doing that properly is extremely difficult in a
number of cases, because the rules governing reactivity are much more
complicated than the ones which are taught at universities, and the
number of rules expands rapidly each year. In fact, an experienced
organic chemist has in his head such an extraordinarily rich collection
of rules and such enormous knowledge that even many chemists who are
not organic chemists cannot imagine the extent of this knowledge.
It is clear that the "doing chemistry" process derives from these
rules (these rules are chemical information), not only from the
formulas.

Since the 70's, some cheminformaticians have tried to store that in
databases: reaction databases plus databases of reactivity rules for
computer-assisted synthesis or retrosynthesis, etc., and then built
programs intended to output proposals supposed to help the chemist.
As far as I know, the brain of the organic chemist is still by far
more efficient than the best software produced so far.
So, I may say that the information available in the brain of the
organic chemist is extremely difficult to store on a computer, and it is
even very difficult to teach, apart from the very beginning.

There are examples other than reactivity. A huge number of QSAR studies
were done in order to predict various physico-chemical properties of simple
chemical compounds, e.g., predicting from the structural formulas the
boiling temperatures of monofunctional compounds such as alcohols,
ketones, etc. at 20 C under 1 atm. But even in these apparently simple
cases, the chemical information we need to do that with an acceptable
accuracy is difficult to extract: the conclusions of such QSAR studies
cannot be applied to every alcohol or ketone (still assumed to be
monofunctional compounds), and it is even difficult to know the extent
of validity of the published empirical rules, which in practice are often
summarized by some regression coefficients.
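
To make the point about regression coefficients concrete, here is a minimal
sketch in Python, not taken from any published QSAR study. It fits a straight
line to the boiling points of the first six linear 1-alkanols as a function of
the number of carbon atoms. The boiling points are approximate, rounded
literature values at 1 atm, and the names BOILING_POINTS, fit_line and predict
are hypothetical.

```python
# Approximate normal boiling points (deg C, 1 atm) of linear 1-alkanols,
# keyed by number of carbon atoms (methanol .. 1-hexanol).
# Rounded literature values, for illustration only.
BOILING_POINTS = {1: 64.7, 2: 78.4, 3: 97.2, 4: 117.7, 5: 138.0, 6: 157.0}


def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x."""
    xs = list(points)
    ys = [points[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(BOILING_POINTS)


def predict(n_carbons):
    """Boiling point predicted by the fitted line for a given carbon count."""
    return intercept + slope * n_carbons


print(f"slope = {slope:.2f} deg C per carbon, intercept = {intercept:.2f} deg C")
print(f"predicted for 4 carbons: {predict(4):.1f} deg C")
```

The two coefficients reproduce the linear series well, but they only know
about the carbon count: for a branched four-carbon alcohol such as
tert-butanol (boiling point about 82 C) the same line is off by more than
30 C, which illustrates how narrow the domain of validity of such a rule can be.
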

The example of spectroscopic databases is also of interest. How can one
simulate the spectra (infrared, NMR, mass spectra, etc.) of chemical
compounds? Starting from the structural formula, it is really hard to
simulate, e.g., a low-resolution mass spectrum. Most of the time, it was
attempted to extract rules from spectroscopic databases, then to try to
predict the spectrum of a compound absent from the database, or
conversely, to retrieve the structural formula of a compound from its
spectrum (or spectra). Many such programs have been developed since the
70's (one of the oldest ones is STIRS), but really the chemical information
needed to do that properly is very difficult to extract.

To conclude, I retain your example of crystallization: for sure, when
we are able to retrieve from the structural formula H-O-H that water
under 1 atm should crystallize at 0 C, then we will be ready
to predict more about the crystallization of chemicals.

Best regards,

Michel.



___
fis mailing list
fis@listas.unizar.es
https://webmail.unizar.es/cgi-bin/mailman/listinfo/fis


Re: [Fis] Chemical information: a field of fuzzy contours ?

2011-09-17 Thread Stanley N Salthe
Michel -- Organic chemistry was known to be the most difficult course at
Columbia University.  But I got interested in it, worked very hard
constantly, and I achieved an 'A'.  But what you say here indicates several
orders of magnitude more difficulty than what I played with in university.
For me this raises a question about the 'realms of nature', as in the
subsumptive hierarchy: {physical realm {chemical realm {biological realm}}}.
Do you think one should place an 'organic realm' between chemical and
biological?  Or, otherwise, do you think it possible that there might be
organic realms out in the universe not entrained into biology?

STAN
