@Tommaso, this is something like Internal Coordinates[1], right? @Bill, thanks for the hint, I'll definitely take a look at this.
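
To make the internal-coordinates idea concrete for myself, here is a minimal sketch (NumPy only; the coordinates, the rotation angle and the helper name `pairwise_distances` are made up for illustration). It checks that interatomic distances do not change when a molecule is rotated and translated, even though the raw XYZ values do:

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (N, N) matrix of distances between N points."""
    coords = np.asarray(coords, dtype=float)
    # Broadcasting builds every difference vector r_i - r_j at once.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Hypothetical four-atom geometry (arbitrary numbers, not a real molecule).
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# An arbitrary rigid motion: rotate 40 degrees about z, then translate.
t = np.radians(40.0)
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
shift = np.array([3.0, -2.0, 5.0])
moved = coords @ R.T + shift

D0 = pairwise_distances(coords)
D1 = pairwise_distances(moved)

print("coordinates unchanged:", np.allclose(coords, moved))
print("distances unchanged:  ", np.allclose(D0, D1))
```

Angles built from the same difference vectors behave the same way, so features of this kind should not need the translation and rotation augmentation Bill describes.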
[1] - https://en.wikipedia.org/wiki/Z-matrix_(chemistry)

On Tue, Mar 28, 2017 at 2:12 AM, Bill Ross <r...@cgl.ucsf.edu> wrote:

> Image processing deals with xy coordinates by (as I understand) training
> with multiple permutations of the raw data, in the form of translations and
> rotations in the 2d space. If training with 3d data, there would be that
> much more translating and rotating to do, in order to divorce the learning
> from the incidentals.
>
> Bill
>
> On 3/27/17 4:35 PM, Tommaso Costanzo wrote:
>
> Dear Henrique,
> I am sorry for the poor email I wrote before. What I was saying is simply
> that if you use the coordinates from an .xyz file as "features", then
> machine learning will learn at which coordinates certain atoms occur, so
> you can only make predictions about coordinates.
> However, if I understood correctly, the "features" that determine the
> coupling J are distance, angle, and electron number. These properties can
> certainly be derived from the XYZ file with simple geometric calculations,
> and the number of electrons depends on the type of atom.
> So, instead of using the XYZ file as input for scikit-learn, I was
> suggesting to calculate the angles, distances, and numbers of electrons
> in advance (with other software or directly in Python) and to use the newly
> calculated matrix as input for scikit-learn. In this case the machine will
> learn how J(AB) varies as a function of angle, distance, and number of
> electrons.
> For example
>
> distance   angle   n el.
> 1          90      1
> 1          90      1
> 2          90      1
> ....       ...     ...
>
> If you are using supervised learning you will have to add a 4th column
> (in reality a separate column vector) with your J(AB), on which you can
> train your model and then predict the unknown samples.
>
> For example
>
> distance   angle   n el.   J(AB)
> 1          90      1       1
> 1          90      1       1
> 2          90      1       0.5
> ....       ...     ...     ...
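
The supervised setup in the two tables above could look roughly like this in scikit-learn. This is only a sketch: the numbers are the toy values from the tables, and the choice of `DecisionTreeRegressor` is my assumption, since any scikit-learn regressor exposes the same `fit`/`predict` interface.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Columns: distance, angle (degrees), number of unpaired electrons.
# Toy values copied from the tables above, not real measurements.
X_train = np.array([[1.0, 90.0, 1.0],
                    [1.0, 90.0, 1.0],
                    [2.0, 90.0, 1.0]])

# J(AB) is kept as a separate target vector, not as a fourth feature column.
y_train = np.array([1.0, 1.0, 0.5])

# Assumed model choice for illustration only.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# The "first matrix": same three feature columns, J unknown.
X_new = X_train.copy()
y_pred = model.predict(X_new)
print(y_pred)
```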
>
> Now if you train the model on the second matrix and then try to predict
> the first one, you should expect a result like:
>
> 1
> 1
> 0.5
>
> Of course in this case the "features" are perfectly equal, hence the
> example is completely unrealistic. However, I hope that it helps to
> explain what I meant in the previous email.
> If you want you can contact me directly at this email, and I hope that you
> got additional hints from Robert, who seems to be even more
> knowledgeable than I am.
>
> Sincerely
> Tommaso
>
> 2017-03-27 18:44 GMT-04:00 Henrique C. S. Junior <henrique...@gmail.com>:
>
>> Dear Tommaso, thank you for your kind reply.
>> I know I have a lot to study before actually starting any code, and that's
>> why any suggestion is so valuable.
>> So, you're suggesting that simplifying the system to only the
>> paramagnetic centers can be a good approach? (I'm not sure if I understood
>> it correctly.)
>> My main idea was, at first, to try to represent the systems as realistically
>> as possible (using coordinates). I know that the software will not know
>> what a bond is or what an intermolecular interaction is but, let's say,
>> after including thousands of examples in the training, I was expecting that
>> (as an example) finding a C at 0.000 and an H at 1.000 should start to "make
>> sense" because it leads to an experimental trend. And I totally agree that
>> my way of representing the system is not the best.
>>
>> Thank you so much for all the help.
>>
>> On Mon, Mar 27, 2017 at 4:15 PM, Tommaso Costanzo <tommaso.costanz...@gmail.com> wrote:
>>
>>> Dear Henrique,
>>>
>>> I agree with Robert on the use of a supervised algorithm, and I would
>>> also suggest you try a semi-supervised one if you have trouble
>>> labeling your data.
>>>
>>> Moreover, as a chemist I think that the input you are planning to use is
>>> not in the best form for machine learning, because you are trying to
>>> predict the coupling J(AB) but in the feature space you have only
>>> coordinates (XYZ). What I suggest is to generate the pairs of atoms
>>> externally and then use a matrix of the form (Mx3), where M is the number
>>> of atom pairs for which you want to predict J and 3 is the number of
>>> features per pair (distance, angle, unpaired electrons). For a supervised
>>> approach you will need a training set where J is known, so your training
>>> data will be of the form Mx4, and the fourth column will be the known J.
>>>
>>> I hope this is clear; if not, I will be happy to help more.
>>>
>>> Sincerely
>>>
>>> Tommaso
>>>
>>> 2017-03-27 13:46 GMT-04:00 Henrique C. S. Junior <henrique...@gmail.com>:
>>>
>>>> Dear Robert, thank you. Yes, I'd like to talk about some specifics of
>>>> the project.
>>>> Thank you again.
>>>>
>>>> On Mon, Mar 27, 2017 at 2:25 PM, Robert Slater <rdsla...@gmail.com> wrote:
>>>>
>>>>> You definitely can use some of the tools in scikit-learn for
>>>>> supervised machine learning. The real trick will be how representative
>>>>> your training data is of the systems you will later want to predict.
>>>>> All of the various regression algorithms would be of some value, and
>>>>> you may even consider an ensemble to help generalize. There will be some
>>>>> important questions to answer: what kind of loss function do you want to
>>>>> look at? I assumed regression (continuous response), but it could also
>>>>> be classification: paramagnetic, diamagnetic, ferromagnetic, etc.
>>>>>
>>>>> Another task to think about might be dimension reduction.
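
Robert's two suggestions, an ensemble regressor and a dimension-reduction step, can be chained in a single scikit-learn pipeline. A rough sketch on synthetic data follows; the number of features, the made-up relation between features and J, and the choice of PCA plus a random forest are all assumptions for illustration, not a recommendation for this problem.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a descriptor matrix: 200 atom pairs, 6 features.
X = rng.normal(size=(200, 6))
# Made-up target that depends on two of the features plus noise.
y = np.exp(-X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = make_pipeline(
    StandardScaler(),        # PCA is sensitive to feature scale
    PCA(n_components=3),     # dimension reduction
    RandomForestRegressor(n_estimators=100, random_state=0),  # ensemble
)

# The scoring string chooses the error measure used for evaluation;
# scikit-learn reports the mean absolute error negated, so higher is better.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print(scores.mean())
```

Swapping the `scoring` argument is one way to explore the loss-function question raised above without touching the rest of the pipeline.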
>>>>> There is no guarantee you will get fantastic results. Every problem is
>>>>> unique and much will depend on exactly what you want out of the
>>>>> solution. It may be that we get '10%' accuracy at best; for some systems
>>>>> that is quite good, for others it is horrible.
>>>>>
>>>>> If you'd like to talk specifics, feel free to contact me at this
>>>>> email. I have a background in magnetism (PhD in magnetic multilayers; I
>>>>> was in physics, but as you are probably aware chemistry and physics
>>>>> blend in this area) and have a fairly good knowledge of scikit-learn and
>>>>> machine learning.
>>>>>
>>>>> On Mon, Mar 27, 2017 at 10:50 AM, Henrique C. S. Junior <henrique...@gmail.com> wrote:
>>>>>
>>>>>> I'm a chemist with some rudimentary programming skills (getting
>>>>>> started with Python) and in the middle of the year I'll be starting a
>>>>>> Ph.D. project that uses computers to describe magnetism in molecular
>>>>>> systems.
>>>>>>
>>>>>> Most of the time I get my results after several simulations and
>>>>>> experiments, so I know that one of the hardest tasks in molecular
>>>>>> magnetism is to predict the nature of magnetic interactions. That's why
>>>>>> I'll try to tackle this problem with machine learning (because such
>>>>>> interactions depend, basically, on distances, angles and the number of
>>>>>> unpaired electrons). The idea is to feed the computer a large training
>>>>>> set (with the number of unpaired electrons, the XYZ coordinates of each
>>>>>> molecule and the experimental magnetic couplings) and see if it can
>>>>>> predict the magnetic couplings (J(AB)) of new systems:
>>>>>> (see example in the attached image)
>>>>>>
>>>>>> Can scikit-learn handle the task, knowing that the matrices used to
>>>>>> represent atomic coordinates will probably have different numbers of
>>>>>> atoms (because some molecules have more atoms than others)? Or is this
>>>>>> a job better suited to other software or another approach?
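
On the question of molecules with different numbers of atoms: if each row describes one pair of magnetic centers instead of a whole molecule, the matrix has the same number of columns whatever the molecule size. A minimal sketch, assuming the indices of the two centers and of a bridging atom are known in advance (the function `pair_row`, the coordinates and the electron counts are all hypothetical):

```python
import numpy as np


def pair_row(coords, i, j, bridge, n_unpaired):
    """Feature row for centers i and j: distance, i-bridge-j angle, electrons."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[i] - coords[j])
    u = coords[i] - coords[bridge]
    v = coords[j] - coords[bridge]
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding slightly outside [-1, 1].
    angle = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    return [d, angle, n_unpaired]


# Two made-up molecules with different atom counts.
mol_small = [[0, 0, 0], [1.5, 0, 0], [3, 0, 0]]                      # 3 atoms
mol_large = [[0, 0, 0], [1, 1, 0], [2, 0, 0], [5, 5, 5], [6, 5, 5]]  # 5 atoms

# One row per pair of centers; both rows have exactly three columns.
X = np.array([
    pair_row(mol_small, i=0, j=2, bridge=1, n_unpaired=1),
    pair_row(mol_large, i=0, j=2, bridge=1, n_unpaired=2),
])
print(X.shape)
```

The price of this design is that choosing which atoms count as centers and bridges has to happen before scikit-learn sees the data.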
>>>>>>
>>>>>> --
>>>>>> *Henrique C. S. Junior*
>>>>>> Industrial Chemist - UFRRJ
>>>>>> M. Sc. Inorganic Chemistry - UFRRJ
>>>>>> Data Processing Center - PMP
>>>>>> Visite o Mundo Químico <http://mundoquimico.com.br>
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn@python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>> --
>>> Please do NOT send Microsoft Office Attachments:
>>> http://www.gnu.org/philosophy/no-word-attachments.html

--
*Henrique C. S. Junior*
Industrial Chemist - UFRRJ
M. Sc. Inorganic Chemistry - UFRRJ
Data Processing Center - PMP
Visite o Mundo Químico <http://mundoquimico.com.br>