Hey guys!

I am currently trying to do multilabel classification using text features
(e.g., TF-IDF).

The number of labels varies from sample to sample: one sample can have
just one label, another can have 10.

For now, I simply built my y vector as a list of tuples, for example:
(19, 8, 7, 5)
(8, 22, 23, 6, 18, 3)
(22,)
...
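
In recent scikit-learn releases, as far as I know, a list of tuples like
this is no longer accepted directly as a multilabel target; labels are
expected as a binary indicator matrix instead. A minimal sketch of that
conversion with MultiLabelBinarizer, using the three example rows above:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Label sets in the same list-of-tuples form as the example rows above.
y_tuples = [(19, 8, 7, 5), (8, 22, 23, 6, 18, 3), (22,)]

# MultiLabelBinarizer turns each label set into one 0/1 row, with one
# column per distinct label (columns ordered by sorted label value).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_tuples)

print(mlb.classes_)  # the 9 distinct labels, sorted
print(Y.shape)       # (3, 9): 3 samples x 9 labels

# inverse_transform recovers the label sets (as sorted tuples).
print(mlb.inverse_transform(Y))
```

Samples with one label and samples with ten labels end up as rows of the
same width; only the number of 1s per row differs.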

As a first step, I decided to use LinearSVC. When I train the classifier
on about 10,000 samples, everything works fine and the predictions look
reasonable.
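
For reference, a sketch of one common way to wire these pieces together:
TfidfVectorizer features, a MultiLabelBinarizer target, and
OneVsRestClassifier wrapping LinearSVC so that one binary classifier is
trained per label. The documents and label sets below are hypothetical,
and this is not necessarily the exact code used here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy corpus and label sets, for illustration only.
docs = [
    "cheap flights to paris",
    "paris hotel deals",
    "python machine learning tutorial",
    "learning python for data science",
]
y_tuples = [(1, 2), (2,), (3,), (3, 4)]

# TF-IDF features are returned as a scipy.sparse matrix;
# LinearSVC accepts sparse input directly.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Binary indicator target: one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_tuples)

# One-vs-rest: fits one independent LinearSVC per label column.
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, Y)

# Predict label sets for a new document and map them back to label ids.
pred = clf.predict(vectorizer.transform(["python tutorial"]))
print(mlb.inverse_transform(pred))
```

The feature matrix stays sparse throughout; converting it to a dense array
(e.g. with .toarray()) for ~300,000 documents would typically need far more
memory than the sparse form.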

But as soon as I use all my samples (~300,000), python.exe crashes on
Windows. When I tried it on my Linux server, I got a segmentation fault.

Does anyone know why this happens? Am I doing something wrong?

I have some more questions regarding multilabel classification, but 
let's stick to this first ;)

Many regards,
Philipp

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general