Dear Henrique,
I am sorry for the poorly written email I sent before. What I
meant is simply that if you use the coordinates from an .xyz file
as "features", the machine will learn at which coordinates certain
atoms occur, so you can only make predictions about coordinates.
However, if I understood correctly, the "features" that determine
the coupling J are distance, angle, and number of electrons. These
properties can certainly be derived from the XYZ file with simple
geometric calculations, and the number of electrons depends on the
type of atom. So, instead of using the XYZ file directly as input
for scikit-learn, I was suggesting that you calculate the angles,
distances, and numbers of electrons in advance (with other
software or directly in Python) and use the newly calculated
matrix as input for scikit-learn. In this case the machine will
learn how J(AB) varies as a function of angle, distance, and
number of electrons.
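As a rough sketch of the "simple geometric calculations" mentioned above, the following uses only the Python standard library to turn .xyz-style text into one feature row. The three-atom geometry, the `read_xyz`, `distance` and `angle_deg` helpers, and the `UNPAIRED` lookup table (which assumes Cu(II) with one unpaired electron) are all hypothetical and only meant for illustration.

```python
import math

# Hypothetical three-atom fragment in .xyz layout: element, x, y, z.
# The geometry is invented purely for illustration.
XYZ_TEXT = """3
toy Cu-O-Cu bridge (illustrative values only)
Cu 1.0 0.0 0.0
O  0.0 0.0 0.0
Cu 0.0 1.0 0.0
"""

# Assumed lookup of unpaired electrons per element (here Cu(II), d9).
UNPAIRED = {"Cu": 1, "O": 0}


def read_xyz(text):
    """Return a list of (element, (x, y, z)) tuples from .xyz-formatted text."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # first line: atom count
    atoms = []
    for line in lines[2:2 + n_atoms]:  # second line is a comment, skip it
        element, x, y, z = line.split()
        atoms.append((element, (float(x), float(y), float(z))))
    return atoms


def distance(a, b):
    """Euclidean distance between two points."""
    return math.dist(a, b)


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


atoms = read_xyz(XYZ_TEXT)
(_, cu_a), (_, o), (_, cu_b) = atoms

# One feature row: distance, angle through the bridging atom, unpaired electrons.
row = [distance(cu_a, cu_b), angle_deg(cu_a, o, cu_b), UNPAIRED["Cu"]]
print(row)
```

Rows built this way for every pair of interest can be stacked into the matrix that is passed to scikit-learn.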
For example:
distance   angle   n el.
1          90      1
1          90      1
2          90      1
....       ...     ...
If you are using supervised learning, you will have to add a 4th
column (in reality a separate column vector) with your J(AB), on
which you can train your model and then predict the unknown samples.
For example:
distance   angle   n el.   J(AB)
1          90      1       1
1          90      1       1
2          90      1       0.5
....       ...     ...     ...
Now if you train the model on the second matrix and then try
to predict the first one, you should expect a result like:
1
1
0.5
Of course, in this case the "features" of the two matrices are
identical, hence the example is completely unrealistic. However, I
hope it helps to clarify what I was explaining in the previous email.
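A minimal sketch of this toy example in scikit-learn might look as follows. The choice of `DecisionTreeRegressor` is only an assumption made for illustration (any regressor could be tried); the numbers are the ones from the tables above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Feature matrix from the table: distance, angle, number of electrons.
X_train = np.array([[1.0, 90.0, 1.0],
                    [1.0, 90.0, 1.0],
                    [2.0, 90.0, 1.0]])

# Known couplings J(AB), kept as a separate target vector (not a 4th column).
y_train = np.array([1.0, 1.0, 0.5])

# The regressor is an arbitrary choice for this sketch.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# "Unknown" samples: here simply the same features, as in the toy example.
X_new = X_train.copy()
J_pred = model.predict(X_new)
print(J_pred)
```

Because the samples to predict are identical to the training samples, the model just reproduces the J values it was trained on, which is why the example is unrealistic.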
If you want, you can contact me directly at this email. I hope
you also got additional hints from Robert, who seems to be
even more knowledgeable than I am.
Sincerely
Tommaso
2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior
<henrique...@gmail.com>:
Dear Tommaso, thank you for your kind reply.
I know I have a lot to study before actually starting any
code and that's why any suggestion is so valuable.
So, you're suggesting that a simplification of the system
using only the paramagnetic centers can be a good approach?
(I'm not sure if I understood it correctly).
My main idea was, at first, to try to represent the systems as
realistically as possible (using coordinates). I know that
the software will not know what a bond is or what an
intermolecular interaction is but, let's say, after including
thousands of examples in the training, I was expecting that (as
an example) finding a C at 0.000 and an H at 1.000 should start
to "make sense" because it leads to an experimental trend.
And I totally agree that my way of representing the system is
not the best.
Thank you so much for all the help.
On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo
<tommaso.costanz...@gmail.com> wrote:
Dear Henrique,
I agree with Robert on the use of a supervised algorithm,
and I would also suggest trying a semi-supervised one
if you have trouble labeling your data.
Moreover, as a chemist I think that the input you are
planning to use is not in the best form for machine
learning, because you are trying to predict the coupling
J(AB) but in the feature space you have only coordinates (XYZ).
What I suggest is to generate the pairs of atoms
externally and then use a matrix of the form (Mx3), where
M is the number of atom pairs for which you want to predict
J and 3 is the number of features of each pair (distance,
angle, unpaired electrons). For a supervised approach you
will need a training set where J is known, so your training
data will be of the form Mx4, where the fourth column
is the known J.
I hope this is clear; if not, I will be happy to help more.
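One possible way to build such an Mx3 matrix, sketched here with only the standard library, is to loop over all pairs of paramagnetic centers in each molecule and emit one row per pair. Everything below is hypothetical: the `pair_rows` and `angle_deg` helpers, the coordinates, the single shared bridging atom, and the choice to sum the unpaired electrons of the two centers are assumptions made for illustration.

```python
from itertools import combinations
import math


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    cos_t = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def pair_rows(centers, bridge):
    """Return one [distance, angle, unpaired electrons] row per pair of centers.

    centers: list of ((x, y, z), n_unpaired) for each paramagnetic center.
    bridge:  (x, y, z) of a shared bridging atom (a simplifying assumption).
    Summing the unpaired electrons of both centers is also an assumption.
    """
    rows = []
    for (pos_a, n_a), (pos_b, n_b) in combinations(centers, 2):
        rows.append([math.dist(pos_a, pos_b),
                     angle_deg(pos_a, bridge, pos_b),
                     n_a + n_b])
    return rows


# Two invented molecules with different numbers of centers.
dimer = [((2.0, 0.0, 0.0), 1), ((0.0, 2.0, 0.0), 1)]
trimer = [((2.0, 0.0, 0.0), 1), ((0.0, 2.0, 0.0), 1), ((-2.0, 0.0, 0.0), 2)]
bridge = (0.0, 0.0, 0.0)

# Each molecule contributes as many rows as it has pairs; the width stays 3.
X = pair_rows(dimer, bridge) + pair_rows(trimer, bridge)
print(len(X), len(X[0]))
```

Since every row describes one pair, molecules of different sizes simply contribute different numbers of rows. For a supervised approach, a list of known J values with one entry per row would serve as the target vector.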
Sincerely
Tommaso
2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior
<henrique...@gmail.com>:
Dear Robert, thank you. Yes, I'd like to talk about
some specifics on the project.
Thank you again.
On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater
<rdsla...@gmail.com> wrote:
You definitely can use some of the tools in
scikit-learn for supervised machine learning.
The real trick will be how well your training
data represents your future
predictions. All of the various regression
algorithms would be of some value and you may
even consider an ensemble to help generalize.
There will be some important questions to
answer--what kind of loss function do you want to
look at? I assumed regression (continuous
response), but it could also be
classification--paramagnetic, diamagnetic,
ferromagnetic, etc.
Another task to think about might be dimension
reduction.
There is no guarantee you will get fantastic
results--every problem is unique and much will
depend on exactly what you want out of the
solution--it may be that we get '10%' accuracy at
best--for some systems that is quite good, for
others it is horrible.
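To make the points about ensembles, loss functions and expected accuracy a bit more concrete, here is a hedged sketch of how one might estimate the error of an ensemble with cross-validation. The data are synthetic placeholders, NOT real magnetic couplings; the two base models, the mean absolute error metric and the 5 folds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic placeholder data: columns are distance, angle, unpaired electrons.
# The target is an arbitrary smooth function plus noise, used only so that
# the example runs; it has no physical meaning.
n_samples = 200
X = np.column_stack([
    rng.uniform(2.0, 6.0, n_samples),
    rng.uniform(80.0, 180.0, n_samples),
    rng.integers(1, 6, n_samples),
])
y = np.cos(np.radians(X[:, 1])) / X[:, 0] + 0.01 * rng.normal(size=n_samples)

# A simple ensemble that averages the predictions of two different regressors.
ensemble = VotingRegressor([
    ("ridge", Ridge(alpha=1.0)),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
])

# The scoring string selects the error measure; scikit-learn reports it
# negated, so flip the sign to get a mean absolute error per fold.
scores = cross_val_score(ensemble, X, y, cv=5, scoring="neg_mean_absolute_error")
mae_per_fold = -scores
print(mae_per_fold.mean())
```

Whether the resulting error is acceptable depends on the spread of J in the systems of interest, which is the point made above about '10%' being good for some systems and horrible for others.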
If you'd like to talk specifics, feel free to
contact me at this email. I have a background in
magnetism (PhD in magnetic multilayers--I was in
physics, but as you are probably aware chemistry
and physics blend in this area) and have a fairly
good knowledge of scikit-learn and machine
learning.
On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S.
Junior <henrique...@gmail.com> wrote:
I'm a chemist with some rudimentary
programming skills (getting started with
python) and in the middle of the year I'll be
starting a Ph.D. project that uses computers
to describe magnetism in molecular systems.
Most of the time I get my results after
several simulations and experiments, so, I
know that one of the hardest tasks in
molecular magnetism is to predict the nature
of magnetic interactions. That's why I'll try
to tackle this problem with Machine Learning
(because such interactions depend,
basically, on distances, angles and number of
unpaired electrons). The idea is to feed the
computer with a large training set (with
number of unpaired electrons, XYZ coordinates
of each molecule and experimental magnetic
couplings) and see if it can predict the
magnetic couplings (J(AB)) of new systems:
(see example in the attached image)
Can Scikit-Learn handle the task, knowing
that the matrix used to represent atomic
coordinates will probably have a different
number of atoms (because some molecules have
more atoms than others)? Or is this a job
better suited for other software or another approach?
--
*Henrique C. S. Junior*
Industrial Chemist - UFRRJ
M. Sc. Inorganic Chemistry - UFRRJ
Data Processing Center - PMP
Visit Mundo Químico
<http://mundoquimico.com.br>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
--
Please do NOT send Microsoft Office Attachments:
http://www.gnu.org/philosophy/no-word-attachments.html