There may also be some implicit optimization going on. Without looking at
the (non pure-python) source of the Minkowski
 metric, a few tests hint at the fact that p=2 and p=1 are probably treated
differently from the rest, because some mathematical tricks can be applied
that avoid calculating the full distance vector. So maybe some of the speed
difference is already explainable by that, but I am not sure which is the
most prevalent here, since I am unfamiliar with the internals of BallTree
and how many of these evaluations it actually does.

Michael


In [38]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(2)

In [39]: x = np.random.randn(100, 200)

In [40]: y = np.random.randn(200, 200)

In [41]: %timeit d.pairwise(x, y)
10 loops, best of 3: 66.8 ms per loop

In [42]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(2.1)

In [43]: %timeit d.pairwise(x, y)
1 loops, best of 3: 352 ms per loop

In [44]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(1.9)

In [45]: %timeit d.pairwise(x, y)
1 loops, best of 3: 353 ms per loop

In [46]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(1)

In [47]: %timeit d.pairwise(x, y)
10 loops, best of 3: 60.4 ms per loop

In [48]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(1.1)

In [49]: %timeit d.pairwise(x, y)
1 loops, best of 3: 357 ms per loop

In [50]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(10)

In [51]: %timeit d.pairwise(x, y)
1 loops, best of 3: 386 ms per loop

In [52]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(100)

In [53]: %timeit d.pairwise(x, y)
1 loops, best of 3: 403 ms per loop

In [54]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(1000)

In [55]: %timeit d.pairwise(x, y)
1 loops, best of 3: 472 ms per loop

In [56]:





On Thu, May 14, 2015 at 3:14 PM, Jacob Vanderplas <jake...@cs.washington.edu
> wrote:

> Hi Nafise,
> In general, yes the execution time will depend on the metric, and in a way
> that can be very difficult to predict.
>
> One reason is that because the euclidean metric is the most common, that
> case is slightly optimized. But that won't lead to a factor of ten.
>
> The Ball Tree works by dividing the metric space into spherical nodes, and
> depending on which metric you use, there can be more or less overlap
> between the resulting nodes. As you increase the p value of a minkowski
> metric, there will generally be more overlap between nodes because in some
> sense the amount of space covered by a ball of radius R increases. Any node
> overlap means that you're less likely to throw away branches when querying
> the tree, meaning that more distance computations need to be done. All of
> this is highly dependent on the precise dimension and structure of the data
> you're querying: e.g. if you have uniformly-distributed data in high
> dimensions, there will be a larger slowdown than if you have
> highly-structured low-dimensional data.
>
> Without knowing more details about the data you're looking at, I think
> that's probably what's going on. Note that if you're interested in fast
> queries for a minkowski metric in particular, I'd try the KD Tree: the node
> overlap issue shouldn't come up here.
>    Jake
>
>  Jake VanderPlas
>  Director of Research – Physical Sciences
>  eScience Institute, University of Washington
>  http://www.vanderplas.com
>
> On Thu, May 14, 2015 at 12:48 AM, nafise mehdipoor <mehdipour...@yahoo.com
> > wrote:
>
>> Hi everyone,
>> If the complexity of the balltree implementation depends on the metric?
>> When I run my code to cluster my points and use 'minkowski' metric instead
>> of 'euclidean', the time jumps from 0.8 s to 8 s!!
>> Thank you.
>>
>>
>>
>> ------------------------------------------------------------------------------
>> One dashboard for servers and applications across Physical-Virtual-Cloud
>> Widest out-of-the-box monitoring support with 50+ applications
>> Performance metrics, stats and reports that give you Actionable Insights
>> Deep dive visibility with transaction tracing using APM Insight.
>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to