So that implies that it's run once for every leaf node. You could do
something similar with sklearn's tree structure, but top to bottom. It's
unlikely to go faster than the current implementation, though, which only
does as many comparisons as necessary, and which is compiled and statically
typed, so it should not benefit from vectorised numpy operations.
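
For illustration, here is a rough sketch of what such a top-to-bottom pass
could look like in numpy. It assumes the fitted tree exposes the
children_left, children_right, feature and threshold arrays on tree_ (recent
scikit-learn releases do), and predict_leaves_top_down is just a name made up
for this example. It moves all samples down one level per pass, and I would
not expect it to beat the compiled predict.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def predict_leaves_top_down(tree, X):
    """Return the leaf index reached by each row of X (hypothetical helper)."""
    # sklearn trees compare features as float32, so cast to match.
    X = np.asarray(X, dtype=np.float32)
    # Every sample starts at the root, which is node 0.
    node = np.zeros(X.shape[0], dtype=np.intp)
    while True:
        # Leaves are marked with a left child of -1.
        active = np.flatnonzero(tree.children_left[node] != -1)
        if active.size == 0:
            return node
        cur = node[active]
        # One vectorised comparison per level for all samples still descending.
        go_left = X[active, tree.feature[cur]] <= tree.threshold[cur]
        node[active] = np.where(
            go_left, tree.children_left[cur], tree.children_right[cur]
        )


# Synthetic stand-in data, only to exercise the sketch.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

leaves = predict_leaves_top_down(clf.tree_, X)
# tree_.value has shape (n_nodes, n_outputs, n_classes); take the majority class.
pred = clf.classes_[np.argmax(clf.tree_.value[leaves, 0, :], axis=1)]
```

The number of passes equals the depth of the tree, so for shallow trees the
Python overhead is small, but each pass still touches every sample that has
not yet reached a leaf.
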
I find 1s for a single decision tree prediction a bit unbelievable. How big
are your trees (e.g. report thisEstimator.tree_.value.shape)?
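
Something along these lines would give the numbers I am after. It is only a
sketch on synthetic stand-in data: booster, report and the chosen settings
(max_depth=3, n_estimators=10) are placeholders, so substitute your own
fitted learner and data.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real problem has 20 features and 133895 samples.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
booster = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3), n_estimators=10, random_state=0
).fit(X, y)

report = []
for est in booster.estimators_:
    # Time a single tree's predict call on the full data.
    tic = time.perf_counter()
    est.predict(X)
    elapsed = time.perf_counter() - tic
    report.append(
        (est.tree_.node_count, est.tree_.max_depth, est.tree_.value.shape, elapsed)
    )

for i, (n_nodes, depth, value_shape, elapsed) in enumerate(report):
    print(
        f"tree {i}: nodes={n_nodes} depth={depth} "
        f"value.shape={value_shape} predict={elapsed:.4f}s"
    )
```

The first entry of value.shape is the number of nodes, which together with
the depth tells us whether the trees are unusually large.
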
On Wed, Jul 24, 2013 at 11:45 PM, Arslan, Ali <ali_ars...@brown.edu> wrote:
> Hi Joel
>
> XData in the matlab code is n_features X n_samples.
>
> On Wed, Jul 24, 2013 at 2:04 AM, Joel Nothman <
> jnoth...@student.usyd.edu.au> wrote:
>
>> Hi Ali,
>>
>> Can you describe the shapes/contents of those structures?
>>
>
> tree_node.parent returns another tree object, if available.
> tree_node.dim is a scalar (in this case it's 13)
> tree_node.right_constrain and left_constrain are float values that may or
> may not exist for each node.
> I don't know much about trees yet so unfortunately I won't be able to
> explain further.
>
>
>>
>> Am I right in thinking that this evaluates the entire tree for every
>> sample, rather than just the path from root to a single leaf? I can see
>> that as bringing speed gains if the process is vectorised over samples...?
>>
>
> XData (which is the transposed version of "dada" in my code) is n_features
> X n_samples; specifically, it's 20 X 133895.
> I think you're right that the tree is evaluated for each sample
> (tree_node.dim seems to determine which feature is tested). I guess the
> recursion over "tree_node.parent" is an indicator of this.
>
> Any ideas how to implement a similar procedure with the trees in scikit?
> Thanks,
> A
>
>
>
>>
>> - Joel
>>
>>
>> On Wed, Jul 24, 2013 at 12:40 PM, Arslan, Ali <ali_ars...@brown.edu> wrote:
>>
>>> Hi,
>>> I've been running adaboost with DecisionTreeClassifier for a
>>> multiclass detection problem (comprising multiple one-vs-all problems).
>>> The prediction method I'm using is like this:
>>>
>>> for ii, thisLab in enumerate(allLearners):
>>>     res = np.zeros([dada.shape[0]], dtype='float16')
>>>     for jj, thisLearner in enumerate(thisLab):
>>>         my_weights = thisLearner.estimator_weights_
>>>         #tic = time.time()
>>>         for hh, thisEstimator in enumerate(thisLearner):
>>>             res = res+thisEstimator.predict(DATA)*my_weights[hh]
>>>
>>> I don't know how straightforward this looks but basically I'm iterating
>>> over labels (or classes), then different estimators in the adaboost to
>>> collect their prediction into one result array (after scaling the results
>>> with each individual tree's weight).
>>>
>>> The innermost part of the loop is taking a bit too long (~1 sec)
>>> considering it's run about 2600 times for my data.
>>>
>>> I was looking for faster/alternative ways of making a prediction and
>>> I've encountered this toolbox for matlab:
>>>
>>> http://graphics.cs.msu.ru/en/science/research/machinelearning/adaboosttoolbox
>>>
>>> This toolbox's prediction method seems pretty succinct and it runs very
>>> fast (0.0015 sec). The function is something like this:
>>>
>>>
>>> function y = calc_output(tree_node, XData)
>>> y = XData(tree_node.dim, :) * 0 + 1;
>>>
>>> for i = 1 : length(tree_node.parent)
>>>     % recursively split based on its parents' constrain
>>>     y = y .* calc_output(tree_node.parent, XData);
>>> end
>>>
>>> if( length(tree_node.right_constrain) > 0)
>>>     y = y .* ((XData(tree_node.dim, :) < tree_node.right_constrain));
>>> end
>>> if( length(tree_node.left_constrain) > 0)
>>>     y = y .* ((XData(tree_node.dim, :) > tree_node.left_constrain));
>>> end
>>>
>>>
>>>
>>> I tried to find the analogues of these structures (i.e. tree_node.dim,
>>> tree_node.parent, tree_node.right_constrain) in the "tree object" in
>>> python but I failed to find them.
>>>
>>> I was wondering if it's possible to speed up the prediction like this
>>> matlab example?
>>> Thanks!
>>>
>>> --
>>> Ali B Arslan, M.Sc.
>>> Cognitive, Linguistic and Psychological Sciences
>>> Brown University
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> See everything from the browser to the database with AppDynamics
>>> Get end-to-end visibility with application monitoring from AppDynamics
>>> Isolate bottlenecks and diagnose root cause in seconds.
>>> Start your free trial of AppDynamics Pro today!
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>
>
> --
> Ali B Arslan, M.Sc.
> Cognitive, Linguistic and Psychological Sciences
> Brown University
>
>