Am I correct to assume that the only algorithm that will work with a custom
distance metric is "brute"? DTW with 1-NN is running pretty slowly with
just 10,000 observations.

I'm new to Python, so perhaps I could write the distance metric function
more efficiently?
import numpy as np
import mlpy


# Compute the dynamic time warp distance between two arrays that each
# contain multiple time series (each series represents a different data
# attribute, e.g. Gmail 30DA).
def dtw_2d(u, v, num_ts=80):
    """
    Compute the Dynamic Time Warp distance between two flattened
    arrays that each encode an m x n array.

    m: number of time series attributes
    n: number of observations in each time series attribute
    num_ts: number of elements in each time series attribute (equal to n)

    Due to sklearn constraints, the input to the k-nearest-neighbours
    estimator is converted from a 3D array to a 2D array, so each sample
    arrives here as a 1D row. This function reshapes each row back to
    (m, n) before computing DTW distances.
    """
    # Number of time series attributes packed into u and v.
    # Integer division: reshape() requires integer dimensions.
    u_obs = np.shape(u)[0] // num_ts
    v_obs = np.shape(v)[0] // num_ts

    # Reshape u & v to (attributes, num_ts)
    new_u = u.reshape(u_obs, num_ts)
    new_v = v.reshape(v_obs, num_ts)

    # Compute the DTW distance for each pair of matching attributes
    dtw_distance = []
    for a, b in zip(new_u, new_v):
        dtw_distance.append(mlpy.dtw.dtw_std(a, b))

    # Return the average of all distances
    # ToDo: Improve this aggregation metric
    return np.average(dtw_distance)
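
For reference, here is a rough sketch of the reshape round trip the function relies on. It assumes a plain NumPy DTW as a stand-in for mlpy.dtw.dtw_std, so it runs without mlpy. The helper names dtw_distance and dtw_2d_sketch and the toy array sizes are made up for illustration, and the local cost (absolute difference) may not match what mlpy computes.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with absolute-difference local cost.

    Hypothetical stand-in for mlpy.dtw.dtw_std; not mlpy's implementation.
    """
    n, m = len(a), len(b)
    # cost[i, j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # step in both
    return cost[n, m]


def dtw_2d_sketch(u, v, num_ts=80):
    """Average per-attribute DTW between two flattened samples."""
    # -1 lets NumPy infer the number of attributes; reshape raises a
    # ValueError if the row length is not a multiple of num_ts.
    u_attrs = np.asarray(u, dtype=float).reshape(-1, num_ts)
    v_attrs = np.asarray(v, dtype=float).reshape(-1, num_ts)
    return np.mean([dtw_distance(a, b) for a, b in zip(u_attrs, v_attrs)])


# Toy data (made-up sizes): 4 samples, 2 attributes, 5 time steps each.
rng = np.random.default_rng(0)
X3d = rng.normal(size=(4, 2, 5))

# Flatten to the (n_samples, n_features) layout that sklearn expects.
X2d = X3d.reshape(len(X3d), -1)

d_same = dtw_2d_sketch(X2d[0], X2d[0], num_ts=5)
d_diff = dtw_2d_sketch(X2d[0], X2d[1], num_ts=5)
```

A callable like this can be passed as the metric argument of the nearest-neighbours estimator. It is then called once per pair of rows, so the Python-level loops are likely where most of the running time goes.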
On Fri Jan 10 2014 at 9:42:01 AM, Gael Varoquaux <
gael.varoqu...@normalesup.org> wrote:
> Fully agreed with Lars.
>
> On Fri, Jan 10, 2014 at 02:44:40PM +0100, Lars Buitinck wrote:
> > 2014/1/10 Robert Layton <robertlay...@gmail.com>:
> > > I wonder if that check could be removed -- as long as the input is
> > > fancy-indexable, the code should otherwise not have an issue (until it
> > > hits the distance metric, in which case you have that covered).
>
> > -1. Since high-d data is usually a mistake and NumPy offers easy
> > reshaping for the advanced use cases, I think we should leave the code
> > as is. It fits the existing convention that an array has shape
> > (n_samples, n_features) and raises a very clear exception. Passing
> > higher-d data on would raise an exception deep down in the k-NN code,
> > making debugging of easy mistakes harder.
>
> > ------------------------------------------------------------------------------
> > CenturyLink Cloud: The Leader in Enterprise Cloud Services.
> > Learn Why More Businesses Are Choosing CenturyLink Cloud For
> > Critical Workloads, Development Environments & Everything In Between.
> > Get a Quote or Start a Free Trial Today.
> > http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> Laboratoire de Neuro-Imagerie Assistee par Ordinateur
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux