Hi Alex.

I am not super familiar with the internals of the trees, but I think it might be possible to implement this on top of the scikit-learn trees without any patches. There are "splitter" and "criterion" classes that handle the feature processing, and it might be possible to define your own to implement feature sampling.

That said, I am not sure this is a good idea. The trees are heavily optimized for working with a precomputed feature matrix and for sorting the feature values. For on-demand features, a bucketing strategy is much more common, since no feature is seen twice.
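
To make the contrast concrete, here is a rough sketch of what a bucketing split search could look like. It is plain Python and not scikit-learn code; the names `best_bucket_split`, `feature_fn` and `n_buckets`, the equal-width buckets and the Gini criterion are all assumptions made for illustration. The point is only that each feature value is computed once, dropped into a bucket, and never sorted.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count mapping holding `total` samples."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_bucket_split(samples, labels, feature_fn, n_buckets=8):
    """Pick a threshold for one on-demand feature using equal-width buckets.

    `feature_fn` is a hypothetical callable that computes the feature for
    one sample. Returns (threshold, weighted_gini), or (None, None) if the
    feature is constant on these samples.
    """
    # Evaluate the feature exactly once per sample.
    values = [feature_fn(s) for s in samples]
    lo, hi = min(values), max(values)
    if lo == hi:
        return None, None  # constant feature, nothing to split on

    width = (hi - lo) / n_buckets
    # One class histogram per bucket, filled in a single pass (no sorting).
    hists = [Counter() for _ in range(n_buckets)]
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), n_buckets - 1)
        hists[b][y] += 1

    n = len(values)
    left, right = Counter(), Counter(labels)
    n_left = 0
    best_thr, best_score = None, float("inf")
    # Candidate thresholds are the inner bucket edges only.
    for b in range(n_buckets - 1):
        left.update(hists[b])
        right.subtract(hists[b])
        n_left += sum(hists[b].values())
        n_right = n - n_left
        if n_left == 0 or n_right == 0:
            continue
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best_score:
            best_thr, best_score = lo + (b + 1) * width, score
    return best_thr, best_score


# Toy usage: the "feature" is just the sample itself converted to float.
thr, score = best_bucket_split(range(8), [0, 0, 0, 0, 1, 1, 1, 1], float)
```

The cost per candidate feature is one pass over the samples plus one pass over the buckets, which is why this fits features that are generated, used once and thrown away. The price is that thresholds are restricted to bucket edges.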

If your goal is vision applications, there is another gotcha: the input and output formats are quite different from what is used in scikit-learn. If you want to use random forests for vision applications, I would really recommend looking into the link to curfil that I posted earlier, or at the work done at Microsoft Cambridge. I think there is also an implementation in the Point Cloud Library.


To summarize, in principle I think you might be able to reuse the scikit-learn code to create trees with on-demand features. I think this is not a great idea, though, and in particular if you want to do computer vision applications, I'd highly recommend looking into other existing implementations.


Best,
Andy


On 11/20/2014 09:54 AM, Alexander Rüsch wrote:
Hey Andreas,

thanks a lot for your quick reply. The GIL-released functions are a bit difficult to handle, so I guess we should provide the most common functions. Maybe it is possible to make an add-on to enable the on-demand functionality?

One idea is to make a spin-off of the latest stable version and implement the on-demand functionality with a set of the most common functions to choose from. That way, the user only needs to implement GIL-released functions if they really want to try new things.

Or is it possible to make a patch to add the functions? This would probably be the most practical way to give easy access to new functionality. The need to generate a new patch for every new release version of sklearn is a disadvantage that should be mentioned.

As you can see, I'm searching for a way to use the scikit-learn library as a strong basis and add my functionality. Because I am only interested in RDFs, I wonder whether there will be trouble with respect to Cython if I just copy the "tree" section of sklearn, add my new DecisionTreeClassifier, and import the rest from the scikit-learn library. This way I get a small library of its own.

Or is there another safe way to create such an add-on for scikit-learn?


Best,
Alex

PS: I hope this email ends up in the right thread. In case it does not, I am replying to this message: <http://sourceforge.net/p/scikit-learn/mailman/message/33052675/>.


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=157005751&iu=/4140/ostg.clktrk


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
