Dear Jake, Thank you for your inputs. Had a look at cykdtree. Core implementation of the algorithm is in C/C++ modifying which is currently beyond my skill. Will try to contact their team if they entertain special requests. I should be able fork and modify the sklearn algorithm in cython once my current project is complete. Currently going ahead with brute-force method. For now, this thread may be considered closed. Thanks once again!
Regards, Rohin. On Tue, Aug 1, 2017 at 11:29 PM, Jacob Vanderplas <jake...@cs.washington.edu > wrote: > On Tue, Aug 1, 2017 at 10:50 AM, Rohin Kumar <yrohinku...@gmail.com> > wrote: > >> I started with KD-tree but after realising it doesn't support custom >> metrics (I don't know the reason for this - would be nice feature) >> > > The scikit-learn KD-tree doesn't support custom metrics because it > utilizes relatively strong assumptions about the form of the metric when > constructing the tree. The Ball Tree makes fewer assumptions, which is why > it can support arbitrary metrics. It would in principal be possible to > create a KD Tree that supports custom *axis-aligned* metrics, but again I > think that would be too specialized for inclusion in scikit-learn. > > One project you might check out is cykdtree: https://pypi.python. > org/pypi/cykdtree > I'm not certain whether it supports the queries you need, but I would bet > the team behind that would be willing to work toward these sorts of > specialized queries if they don't already exist. > > Jake > > > > >> I shifted to BallTree and was looking for a 2 metric based >> categorisation. After looking around, the best I could find at most were >> brute-force methods written in python (had my own version too) or better >> optimised ones in C or FORTRAN. The closest one was halotools which again >> works with euclidean metric. For now, I will try to get my work done with 2 >> different BallTrees iteratively in bins. If I find a better option will try >> to post an update. >> >> Regards, >> Rohin. >> >> >> On Tue, Aug 1, 2017 at 10:55 PM, Jacob Vanderplas < >> jake...@cs.washington.edu> wrote: >> >>> Hi Rohin, >>> Ah, I see. I don't think a BallTree is the right data structure for an >>> anisotropic N-point query, because it fundamentally assumes spherical >>> symmetry of the metric. You may be able to do something like this with a >>> specialized KD-tree, but scikit-learn doesn't support this, and I don't >>> imagine that it ever will given the very specialized nature of the >>> application. >>> >>> I'm certain someone has written efficient code for this operation in the >>> astronomy community, but I don't know of any good Python package to >>> recommend for this – I'd suggest googling for keywords and seeing where >>> that gets you. >>> >>> Thanks, >>> Jake >>> >>> Jake VanderPlas >>> Senior Data Science Fellow >>> Director of Open Software >>> University of Washington eScience Institute >>> >>> On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar <yrohinku...@gmail.com> >>> wrote: >>> >>>> Since you seem to be from Astrophysics/Cosmology background (I am >>>> assuming you are jakevdp - the creator of astroML - if you are - I am >>>> lucky!), I can explain my application scenario. I am trying to calculate >>>> the anisotropic two-point correlation function something like done in >>>> rp_pi_tpcf >>>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.rp_pi_tpcf.html#halotools.mock_observables.rp_pi_tpcf> >>>> or s_mu_tpcf >>>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.s_mu_tpcf.html#halotools.mock_observables.s_mu_tpcf> >>>> using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation >>>> >>>> In halotools (http://halotools.readthedocs. >>>> io/en/latest/function_usage/mock_observables_functions.html) it is >>>> implemented using rectangular grids. I could calculate 2pcf with custom >>>> metrics using one variable with BallTree as done in astroML. I intend to >>>> find the anisotropic counter part. >>>> >>>> Thanks & Regards, >>>> Rohin >>>> >>>> >>>> On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar <yrohinku...@gmail.com> >>>> wrote: >>>> >>>>> Dear Jake, >>>>> >>>>> Thanks for your response. I meant to group/count pairs in boxes (using >>>>> two arrays simultaneously-hence needing 2 metrics) instead of one distance >>>>> array as the binning parameter. I don't know if the algorithm supports >>>>> such >>>>> a thing. For now, I am proceeding with your suggestion of two ball trees >>>>> at >>>>> huge computational cost. I hope I am able to frame my question properly. >>>>> >>>>> Thanks & Regards, >>>>> Rohin. >>>>> >>>>> >>>>> >>>>> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas < >>>>> jake...@cs.washington.edu> wrote: >>>>> >>>>>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar <yrohinku...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> *update* >>>>>>> >>>>>>> May be it doesn't have to be done at the tree creation level. It >>>>>>> could be using loops and creating two different balltrees. Something >>>>>>> like >>>>>>> >>>>>>> tree1=BallTree(X,metric='metric1') #for x-z plane >>>>>>> tree2=BallTree(X,metric='metric2') #for y-z plane >>>>>>> >>>>>>> And then calculate correlation functions in a loop to get >>>>>>> tpcf(X,r1,r2) using tree1.two_point_correlation(X,r1) and >>>>>>> tree2.two_point_correlation(X,r2) >>>>>>> >>>>>> >>>>>> Hi Rohin, >>>>>> It's not exactly clear to me what you wish the tree to do with the >>>>>> two different metrics, but in any case the ball tree only supports one >>>>>> metric at a time. If you can construct your desired result from two ball >>>>>> trees each with its own metric, then that's probably the best way to >>>>>> proceed, >>>>>> Jake >>>>>> >>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> scikit-learn mailing list >>>>>>> scikit-learn@python.org >>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> scikit-learn mailing list >>>>>> scikit-learn@python.org >>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>> >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn@python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn@python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn@python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> > > _______________________________________________ > scikit-learn mailing list > scikit-learn@python.org > https://mail.python.org/mailman/listinfo/scikit-learn > >
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn