On Thu, Sep 27, 2012 at 2:45 PM, Lars Buitinck <[email protected]> wrote:
> 2012/9/27 Doug Coleman <[email protected]>:
>> 1) scikit's libsvm checkin is currently version 300. The last release
>> was in April and the version is 312. Are there plans to use the newer
>> version? The svm_node struct changed, so it's not as trivial as
>> dropping in the files.
>
> What are the major benefits?
>
>> 2) libsvm comes with a ctypes binding. What if scikit contributed a
>> cython binding that was in the libsvm project? Then scikit could just
>> use the libsvm cython module for implementing fit(), predict(), etc.
>
> That would mean that any changes to this Cython binding would have to
> be contributed upstream, as you already suggested.
>
Right, but just so we're clear, there are different levels of upstream.
If sklearn maintains a modified version of libsvm, then "contributing
upstream" is simply a matter of committing to this modified branch.
There is a further-upstream branch (the author's official version) that
none of us controls, which has its own release cycle, but which in
principle may change significantly, and change for the better in
directions that we will want to include.

>> 4) I opened another issue on my github of some compiler warnings from
>> the clang++ compiler. It turns out that there are a lot of calls to
>> malloc where the return pointer is unchecked. So basically the library
>> can crash at any time. Someone already offered to make a nontrivial
>> patch using std::vector and new to fix it. How do we want to proceed?
>> What's the strategy to merge the change with libsvm and the scikit
>> project?
>>
>> https://github.com/erg/libsvm/issues/1
>
> That was me :)
>
> I'm quite swamped ATM, but I did intend to refactor our LibSVM
> bindings sometime in the near future (preferably before the next
> release). I intend to decouple all the prediction code from LibSVM and
> rewrite it in Python/Cython, just like I did for our Liblinear
> binding.
>
> I'm not sure what to do with the training code yet, but after looking
> at it again, I'm more and more inclined to go with Mathieu's
> suggestion of maintaining our own version. The second thing on my list
> would be to check how much of the code can go away.

Why do you want to rewrite the predict code, which seems to be working
already? (Doesn't this further divergence from the libsvm code base
just increase the sklearn maintenance burden?) The key question is how
heavily patched svm.cpp already is. If it were completely rewritten,
then trying to work with the original project would be silly, but I
don't think it is.
It seems like there are a few things:

(1) the use of PREFIX and the _DENSE_REP ifdef, and the extra
double-include file that drives that mechanism

(2) changing the upper_bound in solution_info to a buffer of length 2
instead of two separate variables

(3) what looks like algorithmic changes around line 1600 that I don't
understand

I could certainly be wrong, but these things still look maintainable as
a patch set. Why do you want to break further away from the libsvm
trunk, rather than refactor things to be, if anything, *more*
compatible with it?

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
