Hi Alex.
I am not super familiar with the internals of the trees, but I think it
might be possible to implement this based on the scikit-learn trees
without any patches. There are "splitter" and "criterion" classes that
handle the feature processing, and it might be possible to define your
own to implement feature sampling.
That said, I am not sure this is a good idea. The trees are heavily
optimized for working with precomputed features and for sorting the
feature values. For on-demand features, a bucketing strategy is much
more common, as no feature is seen twice.
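To make the bucketing idea a bit more concrete, here is a minimal
pure-Python sketch. It is not scikit-learn code and is not taken from
any of the libraries mentioned in this thread. The names
best_bucketed_split, choose_split and make_feature are hypothetical, and
the fixed-width buckets and the Gini criterion are just assumptions I
picked for illustration. Each candidate feature is generated on demand,
evaluated once on the samples of the node, histogrammed, and then
discarded unless it wins.

```python
import random
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count mapping holding `total` samples."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_bucketed_split(values, labels, n_buckets=8):
    """Pick a threshold for one feature by histogramming instead of sorting.

    Returns (weighted_impurity, threshold), or None if all values are equal.
    Samples in buckets below the threshold go left, the rest go right.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return None
    width = (hi - lo) / n_buckets
    # Single pass over the samples: per-bucket class counts.
    buckets = [Counter() for _ in range(n_buckets)]
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_buckets - 1)
        buckets[idx][y] += 1
    total = len(values)
    left, right = Counter(), Counter(labels)
    n_left = 0
    best = None
    # Scan the bucket boundaries, moving one bucket at a time to the left.
    for b in range(n_buckets - 1):
        left.update(buckets[b])
        right.subtract(buckets[b])
        n_left += sum(buckets[b].values())
        n_right = total - n_left
        if n_left == 0 or n_right == 0:
            continue
        score = (n_left * gini(left, n_left)
                 + n_right * gini(right, n_right)) / total
        threshold = lo + (b + 1) * width
        if best is None or score < best[0]:
            best = (score, threshold)
    return best


def choose_split(samples, labels, make_feature, n_candidates=10, rng=None):
    """Try freshly generated features and keep the best bucketed split.

    `make_feature(rng)` is a hypothetical factory that returns a callable
    mapping one sample to a float, e.g. a random pixel-difference test.
    Returns (weighted_impurity, threshold, feature) or None.
    """
    rng = rng or random.Random(0)
    best = None
    for _ in range(n_candidates):
        feature = make_feature(rng)  # generated on demand, used only here
        values = [feature(s) for s in samples]
        result = best_bucketed_split(values, labels)
        if result is not None and (best is None or result[0] < best[0]):
            best = (result[0], result[1], feature)
    return best
```

The point of the sketch is only that the split search needs one pass
over the node's samples per candidate feature and no sort, which is why
bucketing fits features that are thrown away after a single use.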
If your goal is vision applications, there is another gotcha: the
input and output formats are quite different from what is used in
scikit-learn.
If you want to use random forests for vision applications, I would
really recommend looking into the link to curfil that I posted earlier,
or at the work done at Microsoft Cambridge. I think there is also an
implementation in the Point Cloud Library.
To summarize, in principle I think you might be able to reuse the
scikit-learn code to create trees with on-demand features.
I think this is not a great idea, though, and in particular if you want
to do computer vision applications, I'd highly recommend looking into
other existing implementations.
Best,
Andy
On 11/20/2014 09:54 AM, Alexander Rüsch wrote:
Hey Andreas,
thanks a lot for your quick reply. The GIL-released functions are a
bit difficult to handle, so I guess we should provide the most common
functions.
Maybe it is possible to make an add-on to enable the on-demand
functionality?
One idea is to make a spin-off of the latest stable version and
implement the on-demand functionality with a set of the most common
functions to choose from. That way, the user only needs to implement
GIL-released functions if they really want to try something new.
Or is it possible to make a patch to add the functions? This would
probably be the most practical way to give easy access to new
functionality. The need to generate a new patch for every new release
version of sklearn is a disadvantage that should be mentioned.
As you can see I'm searching for a way to use the scikit-learn library
as a strong basis and add my functionality. Because I am just
interested in RDFs I wonder if there will be trouble when I just copy
the "tree" section of sklearn to add my new DecisionTreeClassifier and
import the rest of the scikit-learn library with respect to Cython?
This way I get a small library on its own.
Or is there another safe way to create such an add-on for scikit-learn?
Best,
Alex
PS: I hope this email ends up in the right thread. If not, I am
replying to this message:
<http://sourceforge.net/p/scikit-learn/mailman/message/33052675/>.
------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general