Re: [scikit-learn] Using Scikit-Learn to predict magnetism in chemical systems

Bill Ross Mon, 27 Mar 2017 23:03:07 -0700

Image processing deals with xy coordinates by (as I understand) training with multiple permutations of the raw data, in the form of translations and rotations in the 2d space. If training with 3d data, there would be that much more translating and rotating to do, in order to divorce the learning from the incidentals.
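
To make the idea concrete, here is a minimal sketch of that kind of augmentation for 3d coordinates, assuming NumPy. The helper names (random_rotation, augment) and the three-atom geometry are made up for illustration. The check at the end shows what the augmentation is meant to teach: interatomic distances do not change under rotation and translation.

```python
import numpy as np


def random_rotation(rng):
    """Return a random 3x3 proper rotation matrix (determinant +1)."""
    # QR decomposition of a Gaussian matrix yields an orthogonal matrix q.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Fix the column signs so the factorization is unique.
    q = q * np.sign(np.diag(r))
    # Flip one axis if q is a reflection rather than a rotation.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def augment(coords, n_copies, seed=0):
    """Return n_copies randomly rotated and translated copies of coords."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        shift = rng.uniform(-5.0, 5.0, size=3)
        copies.append(coords @ random_rotation(rng).T + shift)
    return copies


def pairwise_distances(x):
    """Return the matrix of distances between all rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


# Hypothetical three-atom geometry (arbitrary units), for illustration only.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
augmented = augment(coords, n_copies=4)

# The coordinates differ in every copy, but the distances are unchanged.
same_distances = all(
    np.allclose(pairwise_distances(c), pairwise_distances(coords))
    for c in augmented
)
print(same_distances)
```

Features that are already invariant, such as the distances and angles discussed further down the thread, would not need this augmentation at all.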


Bill



On 3/27/17 4:35 PM, Tommaso Costanzo wrote:

Dear Henrique,

I am sorry for the poor email I wrote before. What I was saying is simply that if you use the coordinates from an .xyz file as "features", then the machine will learn at which coordinates certain atoms occur, so you can only make predictions about coordinates. However, if I understood correctly, the "features" representing the coupling J are distance, angle, and number of electrons. These properties can definitely be derived from the XYZ file by simple geometric calculations, and the number of electrons will depend on the type of atom. So, what I was trying to say is that instead of using the XYZ file as input for scikit-learn, I suggest calculating the angles, distances, and numbers of electrons in advance (with other software or directly in Python) and using the newly calculated matrix as input for scikit-learn. In this case the machine will learn how J(AB) varies as a function of angle, distance, and number of electrons.

For example

distance    angle    n el.
1           90       1
1           90       1
2           90       1
...         ...      ...
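
A row of such a matrix could be computed from XYZ coordinates roughly as follows. This is only a sketch using the Python standard library. The choice of a bridging atom as the vertex of the angle, the coordinates, and the UNPAIRED_ELECTRONS lookup are assumptions made for illustration, not something defined in this thread.

```python
import math

# Hypothetical lookup of unpaired electrons per center type (assumption:
# the real value depends on the oxidation state and spin state).
UNPAIRED_ELECTRONS = {"Cu": 1}


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(a, b)


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    va = [p - q for p, q in zip(a, vertex)]
    vb = [p - q for p, q in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(va, vb))
    cos_theta = dot / (math.hypot(*va) * math.hypot(*vb))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Made-up geometry: two centers A and B connected through a bridging atom.
center_a = (1.0, 0.0, 0.0)
bridge = (0.0, 0.0, 0.0)
center_b = (0.0, 1.0, 0.0)

row = [
    distance(center_a, center_b),           # distance between the centers
    angle_deg(center_a, bridge, center_b),  # angle at the bridging atom
    UNPAIRED_ELECTRONS["Cu"],               # number of unpaired electrons
]
print(row)
```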

If you are using supervised learning, you will have to add a 4th column (in reality a separate column vector) with your J(AB), on which you can train your model and then predict the unknown samples.


For example
distance    angle    n el.    J(AB)
1           90       1        1
1           90       1        1
2           90       1        0.5
...         ...      ...      ...

Now if you train the model on the second matrix and then try to predict the first one, you should expect a result like:


1
1
0.5
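
In scikit-learn this toy case could look like the sketch below. DecisionTreeRegressor is an arbitrary choice made here so that the training values are reproduced exactly. Any other regressor could be substituted.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Columns: distance, angle, number of electrons (the toy values from above).
X_train = np.array([[1, 90, 1], [1, 90, 1], [2, 90, 1]], dtype=float)
# Separate target vector with the known J(AB) values.
y_train = np.array([1.0, 1.0, 0.5])

model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# "Unknown" samples: here identical to the training features.
X_new = X_train.copy()
predicted = model.predict(X_new)
print(predicted)
```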

Of course, in this case the "features" are exactly the same as in the training set, hence the example is completely unrealistic. However, I hope it helps to explain what I meant in the previous email. If you want, you can contact me directly at this email, and I hope that you got additional hints from Robert, who seems to be even more knowledgeable than me.


Sincerely
Tommaso

2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior <henrique...@gmail.com <mailto:henrique...@gmail.com>>:


    Dear Tommaso, thank you for your kind reply.
    I know I have a lot to study before actually starting any code and
    that's why any suggestion is so valuable.
    So, you're suggesting that a simplification of the system using
    only the paramagnetic centers can be a good approach? (I'm not
    sure if I understood it correctly).
    My main idea was, at first, to try to represent the systems as
    realistically as possible (using coordinates). I know that the
    software will not know what a bond is or what an intermolecular
    interaction is but, let's say, after including 1000s of examples
    in the training, I was expecting that (as an example) finding a C
    at 0.000 and an H at 1.000 should start to "make sense" because it
    leads to an experimental trend. And I totally agree that my way of
    representing the system is not the best.

    Thank you so much for all the help.

    On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo
    <tommaso.costanz...@gmail.com
    <mailto:tommaso.costanz...@gmail.com>> wrote:

        Dear Henrique,


        I agree with Robert on the use of a supervised algorithm and I
        would also suggest that you try a semi-supervised one if you
        have trouble labeling your data.


        Moreover, as a chemist I think that the input you are thinking
        of using is not in the best form for machine learning
        because you are trying to predict the coupling J(AB) but in the
        feature space you have only coordinates (XYZ). What I suggest
        is to generate the pairs of atoms externally and then use a
        matrix of the form (Mx3), where M is the number of atom pairs
        for which you want to predict J and 3 is the number of features
        for each pair (distance, angle, unpaired electrons). For a
        supervised approach you will need a training set where J is
        known, so your training data will be of the form Mx4 and the
        fourth column will be the J you know.
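
As a rough sketch of that layout, assuming NumPy: scikit-learn estimators take the features and the target separately, so an Mx4 training table would be split into an Mx3 feature matrix and a length-M vector of known J values. The numbers are placeholders, not measured data.

```python
import numpy as np

# Training table of shape (M, 4): distance, angle, unpaired electrons, known J.
# Placeholder values for illustration only.
training = np.array([
    [1.0, 90.0, 1.0, 1.0],
    [1.0, 90.0, 1.0, 1.0],
    [2.0, 90.0, 1.0, 0.5],
])

X = training[:, :3]  # (M, 3) feature matrix
y = training[:, 3]   # (M,) vector of known J values

print(X.shape, y.shape)
```

Any scikit-learn regressor can then be trained with fit(X, y).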

        Hope that this is clear, if not I will be happy to help more


        Sincerely

        Tommaso


        2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior
        <henrique...@gmail.com <mailto:henrique...@gmail.com>>:

            Dear Robert, thank you. Yes, I'd like to talk about some
            specifics on the project.
            Thank you again.

            On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater
            <rdsla...@gmail.com <mailto:rdsla...@gmail.com>> wrote:

                You definitely can use some of the tools in
                scikit-learn for supervised machine learning.  The real
                trick will be how well your training set represents
                the systems you will later want to predict.  All of the
                various regression algorithms would be of some value
                and you may even consider an ensemble to help
                generalize.  There will be some important questions to
                answer--what kind of loss function do you want to look
                at?  I assumed regression (continuous response) but it
                could also be classification--paramagnetic, diamagnetic,
                ferromagnetic, etc...
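
A minimal sketch of what an ensemble regressor evaluated under two different error measures could look like in scikit-learn. RandomForestRegressor is just one possible ensemble. The data here is synthetic and has no physical meaning. It only stands in for a real table of distances, angles, unpaired electrons and measured couplings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data, NOT real chemistry.
n_samples = 200
X = np.column_stack([
    rng.uniform(1.0, 5.0, n_samples),     # distance
    rng.uniform(80.0, 180.0, n_samples),  # angle in degrees
    rng.integers(1, 6, n_samples),        # unpaired electrons
])
# Made-up target so the example runs; the formula is arbitrary.
y = np.cos(np.radians(X[:, 1])) / X[:, 0] + 0.05 * rng.normal(size=n_samples)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# scikit-learn reports errors as negated scores so that higher is better.
scores_mae = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
scores_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")

print("MAE per fold:", -scores_mae)
print("MSE per fold:", -scores_mse)
```

Mean absolute error treats all deviations linearly, while mean squared error punishes large misses more heavily, so the choice depends on how costly a badly wrong prediction is.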

                Another task to think about might be dimension reduction.
                There is no guarantee you will get fantastic
                results--every problem is unique and much will depend
                on exactly what you want out of the solution--it may
                be that we get '10%' accuracy at best--for some
                systems that is quite good, others it is horrible.

                If you'd like to talk specifics, feel free to contact
                me at this email. I have a background in magnetism
                (PhD in magnetic multilayers--I was in physics, but as
                you are probably aware chemistry and physics blend in
                this area) and have a fairly good knowledge of
                scikit-learn and machine learning.



                On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S.
                Junior <henrique...@gmail.com
                <mailto:henrique...@gmail.com>> wrote:

                    I'm a chemist with some rudimentary programming
                    skills (getting started with python) and in the
                    middle of the year I'll be starting a Ph.D.
                    project that uses computers to describe magnetism
                    in molecular systems.

                    Most of the time I get my results after several
                    simulations and experiments, so, I know that one
                    of the hardest tasks in molecular magnetism is to
                    predict the nature of magnetic interactions.
                    That's why I'll try to tackle this problem with
                    Machine Learning (because such interactions
                    depend, basically, on distances, angles and
                    number of unpaired electrons). The idea is to feed
                    the computer with a large training set (with
                    number of unpaired electrons, XYZ coordinates of
                    each molecule and experimental magnetic couplings)
                    and see if it can predict the magnetic couplings
                    (J(AB)) of new systems:

                    (see example in the attached image)

                    Can Scikit-Learn handle the task, knowing that the
                    matrix used to represent atomic coordinates will
                    probably have a different number of atoms (because
                    some molecules have more atoms than others)? Or is
                    this a job better suited for another
                    software/approach?
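
One way around the varying atom count, in the spirit of the pair-based features suggested in the replies, is to describe each pair of magnetic centers by a fixed number of values. Every row then has the same width no matter how large the molecule is. The sketch below uses only the standard library. The pair_rows helper, the labels, and the coordinates are hypothetical.

```python
from itertools import combinations
import math


def pair_rows(centers):
    """Return one fixed-width row per pair of magnetic centers.

    centers: list of (label, (x, y, z), unpaired_electrons) tuples.
    """
    rows = []
    for (la, pa, na), (lb, pb, nb) in combinations(centers, 2):
        # Row layout: pair label, distance, unpaired electrons on each center.
        rows.append((f"{la}-{lb}", math.dist(pa, pb), na, nb))
    return rows


# Two hypothetical molecules with different numbers of magnetic centers.
small = [("A", (0.0, 0.0, 0.0), 1), ("B", (3.0, 0.0, 0.0), 1)]
large = small + [("C", (0.0, 4.0, 0.0), 2)]

# Rows from both molecules can be stacked into a single table.
rows = pair_rows(small) + pair_rows(large)
for r in rows:
    print(r)
```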

--*Henrique C. S. Junior*

                    Industrial Chemist - UFRRJ
                    M. Sc. Inorganic Chemistry - UFRRJ
                    Data Processing Center - PMP
                    Visite o Mundo Químico <http://mundoquimico.com.br>

                    _______________________________________________
                    scikit-learn mailing list
                    scikit-learn@python.org
                    <mailto:scikit-learn@python.org>
                    https://mail.python.org/mailman/listinfo/scikit-learn
                    <https://mail.python.org/mailman/listinfo/scikit-learn>




--Please do NOT send Microsoft Office Attachments:

        http://www.gnu.org/philosophy/no-word-attachments.html
        <http://www.gnu.org/philosophy/no-word-attachments.html>





