I think I saw it in the Deep Learning book: http://www.deeplearningbook.org/

Bill

On 3/28/17 9:48 AM, Henrique C. S. Junior wrote:
@Tommaso, this is something like Internal Coordinates[1], right?
@Bill, thanks for the hint, I'll definitely take a look at this.

[1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry)

On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross <r...@cgl.ucsf.edu> wrote:

    Image processing deals with xy coordinates by (as I understand)
    training on multiple transformed copies of the raw data, in the
    form of translations and rotations in 2D space. If training with
    3D data, there would be that much more translating and rotating
    to do, in order to divorce the learning from the incidentals.
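
A minimal sketch of this kind of augmentation for 3D coordinates, assuming NumPy is available. The three-atom geometry, the translation range, and the helper names are made up for illustration; they do not come from the thread.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Return a random 3x3 proper rotation matrix (determinant +1)."""
    # QR decomposition of a Gaussian matrix yields an orthogonal matrix q.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    # Normalise column signs, then flip one column if q is a reflection.
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def augment(coords, n_copies, seed=0):
    """Return n_copies randomly rotated and translated copies of coords (N x 3)."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        rotation = random_rotation_matrix(rng)
        shift = rng.uniform(-5.0, 5.0, size=3)  # arbitrary translation range
        copies.append(coords @ rotation.T + shift)
    return copies

# Hypothetical three-atom geometry (arbitrary units), for illustration only.
molecule = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
augmented = augment(molecule, n_copies=4)
```

Every copy has different raw coordinates but the same interatomic distances, which is exactly the "incidental" variation a model trained on raw coordinates would have to learn to ignore.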

    Bill


    On 3/27/17 4:35 PM, Tommaso Costanzo wrote:
    Dear Henrique,
    I am sorry for the poor email I wrote before. What I was saying
    is simply that if you use the coordinates from an .xyz file as
    "features", then the machine will learn at which coordinates
    certain atoms occur, so you can only make predictions about
    coordinates. However, if I understood correctly, the "features"
    that determine the coupling J are distance, angle, and number of
    electrons. These properties can certainly be derived from the XYZ
    file with simple geometric calculations, and the number of
    electrons depends on the type of atom. So, instead of using the
    XYZ file as input for scikit-learn, I was suggesting that you
    calculate the angles, distances, and numbers of electrons in
    advance (with other software or directly in Python) and use the
    newly calculated matrix as input for scikit-learn. In this case
    the machine will learn how J(AB) varies as a function of angle,
    distance, and number of electrons.
    For example:

    distance    angle    n el.
    1           90       1
    1           90       1
    2           90       1
    ...         ...      ...
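
A rough sketch of how one such row could be computed from Cartesian coordinates with NumPy. Which three atoms define the angle is an assumption here (the angle at a bridging atom between the two magnetic centers); the coordinates and the electron count are hypothetical.

```python
import numpy as np

def distance(a, b):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def angle_deg(a, bridge, b):
    """Angle a-bridge-b in degrees, measured at the bridging atom."""
    v1 = np.asarray(a) - np.asarray(bridge)
    v2 = np.asarray(b) - np.asarray(bridge)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clipping guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Hypothetical coordinates (arbitrary units), not taken from a real structure.
center_a = [1.0, 0.0, 0.0]
bridge = [0.0, 0.0, 0.0]
center_b = [0.0, 1.0, 0.0]
n_unpaired = 1  # assumed value; in practice it follows from the atom type

# One row of the feature matrix: distance, angle, number of electrons.
row = [distance(center_a, center_b),
       angle_deg(center_a, bridge, center_b),
       n_unpaired]
```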

    If you are using supervised learning, you will have to add a 4th
    column (in reality a separate column vector) with your J(AB), on
    which you can train your model and then predict the unknown samples.

    For example:
    distance    angle    n el.    J(AB)
    1           90       1        1
    1           90       1        1
    2           90       1        0.5
    ...         ...      ...      ...

    Now if you train the model on the second matrix and then try to
    predict the first one, you should expect a result like:

    1
    1
    0.5

    Of course in this case the "features" are exactly the same as in
    the training data, hence the example is completely unrealistic.
    However, I hope that it helps to explain what I meant in the
    previous email.
    If you want, you can contact me directly at this email, and I hope
    that you got additional hints from Robert, who seems to be even
    more knowledgeable than me.
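
The toy example above can be written as a short scikit-learn sketch. LinearRegression is only one possible choice of estimator, picked here because it is the simplest; the thread does not prescribe a model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features from the toy table: distance, angle, number of electrons.
X_train = np.array([[1.0, 90.0, 1.0],
                    [1.0, 90.0, 1.0],
                    [2.0, 90.0, 1.0]])
# Known couplings J(AB), kept as a separate vector.
y_train = np.array([1.0, 1.0, 0.5])

model = LinearRegression()
model.fit(X_train, y_train)

# "Unknown" samples; here simply the same rows without the J column.
X_new = X_train.copy()
predictions = model.predict(X_new)
```

Because the new rows are identical to the training rows, the predictions reproduce the training values of J (up to floating-point rounding), which is why the example says nothing about how well a model would generalize.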

    Sincerely
    Tommaso



    2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior
    <henrique...@gmail.com>:

        Dear Tommaso, thank you for your kind reply.
        I know I have a lot to study before actually starting any
        code and that's why any suggestion is so valuable.
        So, you're suggesting that a simplification of the system
        using only the paramagnetic centers can be a good approach?
        (I'm not sure if I understood it correctly).
        My main idea was, at first, to try to represent the systems
        as realistically as possible (using coordinates). I know that
        the software will not know what a bond is or what an
        intermolecular interaction is but, let's say, after including
        thousands of examples in the training, I was expecting that
        (as an example) finding a C at 0.000 and an H at 1.000 should
        start to "make sense" because it leads to an experimental
        trend. And I totally agree that my way of representing the
        system is not the best.

        Thank you so much for all the help.

        On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo
        <tommaso.costanz...@gmail.com> wrote:

            Dear Henrique,


            I agree with Robert on the use of a supervised algorithm,
            and I would also suggest trying a semi-supervised one if
            you have trouble labeling your data.


            Moreover, as a chemist I think that the input you are
            planning to use is not in the best form for machine
            learning, because you are trying to predict the coupling
            J(AB) but in the feature space you have only coordinates
            (XYZ). What I suggest is to generate the pairs of atoms
            externally and then use a matrix of the form (Mx3), where
            M is the number of atom pairs for which you want to
            predict J and 3 is the number of features of each pair
            (distance, angle, unpaired electrons). For a supervised
            approach you will need a training set where J is known,
            so your training data will be of the form Mx4, where the
            fourth column is the known J.
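
As a small illustration of these shapes, assuming the Mx4 training table is held in a NumPy array (the variable names are hypothetical, and the numbers are toy values): scikit-learn estimators take the Mx3 feature matrix and the J column as two separate arguments.

```python
import numpy as np

# Hypothetical M x 4 training table: distance, angle, unpaired electrons, J.
data = np.array([[1.0, 90.0, 1.0, 1.0],
                 [1.0, 90.0, 1.0, 1.0],
                 [2.0, 90.0, 1.0, 0.5]])

X = data[:, :3]  # M x 3 feature matrix
y = data[:, 3]   # length-M vector of known J values
```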

            I hope this is clear; if not, I will be happy to help more.


            Sincerely

            Tommaso


            2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior
            <henrique...@gmail.com>:

                Dear Robert, thank you. Yes, I'd like to talk about
                some specifics on the project.
                Thank you again.

                On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater
                <rdsla...@gmail.com> wrote:

                    You definitely can use some of the tools in
                    scikit-learn for supervised machine learning. The
                    real trick will be how well your training set
                    represents your future predictions. All of the
                    various regression algorithms would be of some
                    value, and you may even consider an ensemble to
                    help generalize. There will be some important
                    questions to answer: what kind of loss function
                    do you want to look at? I assumed regression
                    (continuous response), but it could also be
                    classification (paramagnetic, diamagnetic,
                    ferromagnetic, etc.).
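
A hedged sketch of what comparing regressors under a chosen loss could look like. The data below are synthetic and the target formula is invented purely so the code runs; it is not a physical model of J. Ridge and RandomForestRegressor are arbitrary picks for a linear model and an ensemble, and mean absolute error stands in for whatever loss fits the problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 60
X = np.column_stack([
    rng.uniform(1.0, 4.0, n_samples),     # distance (made-up range)
    rng.uniform(80.0, 180.0, n_samples),  # angle in degrees (made-up range)
    rng.integers(1, 6, n_samples),        # unpaired electrons (made-up range)
])
# Invented target with noise, for illustration only.
y = 1.0 / X[:, 0] + 0.01 * X[:, 1] + rng.normal(0.0, 0.05, n_samples)

models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # scikit-learn reports the negated error so that higher is better;
    # flip the sign back to get a mean absolute error per model.
    cv_scores = cross_val_score(model, X, y, cv=5,
                                scoring="neg_mean_absolute_error")
    scores[name] = -cv_scores.mean()
```

Swapping the `scoring` string (for example to "neg_mean_squared_error") changes the loss being compared without touching the rest of the loop.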

                    Another task to think about might be dimension
                    reduction.
                    There is no guarantee you will get fantastic
                    results. Every problem is unique, and much will
                    depend on exactly what you want out of the
                    solution. It may be that we get '10%' accuracy at
                    best; for some systems that is quite good, for
                    others it is horrible.
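
For the dimension-reduction point, one common option in scikit-learn is PCA; this is an assumption about what might be tried, not something specified above. The sketch uses random numbers in place of real descriptors, and the choice of 3 components is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # synthetic stand-in for 10 descriptors

# Standardise each column, then project onto the first 3 principal components.
reducer = make_pipeline(StandardScaler(), PCA(n_components=3))
X_reduced = reducer.fit_transform(X)

# Fraction of the total variance captured by each retained component.
explained = reducer.named_steps["pca"].explained_variance_ratio_
```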

                    If you'd like to talk specifics, feel free to
                    contact me at this email. I have a background in
                    magnetism (PhD in magnetic multilayers; I was in
                    physics, but as you are probably aware chemistry
                    and physics blend in this area) and have a fairly
                    good knowledge of scikit-learn and machine
                    learning.



                    On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S.
                    Junior <henrique...@gmail.com> wrote:

                        I'm a chemist with some rudimentary
                        programming skills (getting started with
                        python) and in the middle of the year I'll be
                        starting a Ph.D. project that uses computers
                        to describe magnetism in molecular systems.

                        Most of the time I get my results only after
                        several simulations and experiments, so I
                        know that one of the hardest tasks in
                        molecular magnetism is to predict the nature
                        of magnetic interactions. That's why I'll try
                        to tackle this problem with Machine Learning
                        (because such interactions depend, basically,
                        on distances, angles and number of unpaired
                        electrons). The idea is to feed the computer
                        a large training set (with the number of
                        unpaired electrons, the XYZ coordinates of
                        each molecule and the experimental magnetic
                        couplings) and see if it can predict the
                        magnetic couplings (J(AB)) of new systems:

                        (see example in the attached image)

                        Can scikit-learn handle the task, knowing
                        that the matrices used to represent atomic
                        coordinates will probably have different
                        numbers of atoms (because some molecules have
                        more atoms than others)? Or is this a job
                        better suited to other software or another
                        approach?
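
To make the size issue concrete: scikit-learn estimators expect a 2D array with one row per sample and the same number of columns in every row, so molecules with different atom counts need to be mapped to a fixed-length description first. The sketch below uses a histogram of pairwise distances purely as an illustration; the descriptor, the bin settings, and the coordinates are assumptions, not something proposed in this thread.

```python
import numpy as np

def distance_histogram(coords, bins=8, max_distance=8.0):
    """Map an N x 3 coordinate array to a fixed-length vector of pair counts."""
    coords = np.asarray(coords, dtype=float)
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep each atom pair once (upper triangle, excluding the diagonal).
    pair_dists = dists[np.triu_indices(len(coords), k=1)]
    hist, _ = np.histogram(pair_dists, bins=bins, range=(0.0, max_distance))
    return hist

# Two hypothetical "molecules" with 3 and 5 atoms (arbitrary units).
small = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
large = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [2, 0, 0]]

# Both rows have the same length, so they stack into one feature matrix.
X = np.vstack([distance_histogram(small), distance_histogram(large)])
```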


                        -- *Henrique C. S. Junior*
                        Industrial Chemist - UFRRJ
                        M. Sc. Inorganic Chemistry - UFRRJ
                        Data Processing Center - PMP
                        Visite o Mundo Químico
                        <http://mundoquimico.com.br>

                        _______________________________________________
                        scikit-learn mailing list
                        scikit-learn@python.org
                        <mailto:scikit-learn@python.org>
                        https://mail.python.org/mailman/listinfo/scikit-learn
                        <https://mail.python.org/mailman/listinfo/scikit-learn>



-- Please do NOT send Microsoft Office Attachments:
            http://www.gnu.org/philosophy/no-word-attachments.html
            <http://www.gnu.org/philosophy/no-word-attachments.html>
