> * Sam Steingold <f...@tah.bet> [2012-02-10 10:01:54 -0500]:
>
> When I tried to run svm on the same data frame, memory usage as reported
> by top(1) doubled to 4GB almost right away and the function never
> returned (has been running for ~15 hours now). ^C does not stop it.
> This is most unusual, libsvm has always seemed very fast.

looks like it _is_ libsvm:

#0  0x00007ffff2aedc64 in Solver::select_working_set (this=0x7fffffff97f0,
    out_i=@0x7fffffff95a0, out_j=@0x7fffffff95b0) at svm.cpp:852
#1  0x00007ffff2aef91d in Solver::Solve (this=0x7fffffff97f0, l=285724, Q=...,
    p_=<optimized out>, y_=<optimized out>, alpha_=0x6023fb60, Cp=1, Cn=1,
    eps=<optimized out>, si=0x7fffffff9980, shrinking=1) at svm.cpp:573
#2  0x00007ffff2af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fffffff9980,
    alpha=0x6023fb60, param=<optimized out>, prob=0x7fffffff9c30)
    at svm.cpp:1444
#3  svm_train_one (prob=0x7fffffff9c30, param=<optimized out>, Cp=1, Cn=1)
    at svm.cpp:1641
#4  0x00007ffff2af4a8e in svm_train (prob=<optimized out>,
    param=0x7fffffff9d40) at svm.cpp:2179
#5  0x00007ffff2aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0,
    c=<optimized out>, y=<optimized out>, rowindex=<optimized out>,
    colindex=<optimized out>, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0,
    degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0,
    nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330,
    cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50,
    shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0,
    probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10,
    nr=0x1524dc40, index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420,
    rho=0x170083e8, coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678,
    probB=0xcbb7eb8, cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8,
    error=0x10358b30) at Rsvm.c:275
#6  0x00007ffff792cefc in ?? () from /usr/lib/R/lib/libR.so
#7  0x00007ffff795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#8  0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#9  0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#11 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#12 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#13 0x00007ffff79ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so
#14 0x00007ffff79ada47 in ?? () from /usr/lib/R/lib/libR.so
#15 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#16 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#17 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#18 0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#19 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#20 0x00007ffff795db9b in ?? () from /usr/lib/R/lib/libR.so
#21 0x00007ffff795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so
#22 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#23 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#24 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#25 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#26 0x00007ffff7998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#27 0x00007ffff79982e0 in ?? () from /usr/lib/R/lib/libR.so
#28 0x00007ffff7998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#29 0x000000000040078b in main ()
#30 0x00007ffff72d930d in __libc_start_main ()
    from /lib/x86_64-linux-gnu/libc.so.6
#31 0x00000000004007bd in _start ()

#0  0x00007ffff2aeeb67 in Kernel::dot (px=0x48eeb220, py=0x4b21890)
    at svm.cpp:295
#1  0x00007ffff2af7a25 in Kernel::kernel_rbf (this=<optimized out>,
    i=<optimized out>, j=<optimized out>) at svm.cpp:239
#2  0x00007ffff2af782c in SVC_Q::get_Q (this=0x7fffffff9870, i=187701,
    len=208039) at svm.cpp:1271
#3  0x00007ffff2aef9ab in Solver::Solve (this=0x7fffffff97f0, l=285724, Q=...,
    p_=<optimized out>, y_=<optimized out>, alpha_=0x6023fb60, Cp=1, Cn=1,
    eps=<optimized out>, si=0x7fffffff9980, shrinking=1) at svm.cpp:591
#4  0x00007ffff2af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fffffff9980,
    alpha=0x6023fb60, param=<optimized out>, prob=0x7fffffff9c30)
    at svm.cpp:1444
#5  svm_train_one (prob=0x7fffffff9c30, param=<optimized out>, Cp=1, Cn=1)
    at svm.cpp:1641
#6  0x00007ffff2af4a8e in svm_train (prob=<optimized out>,
    param=0x7fffffff9d40) at svm.cpp:2179
#7  0x00007ffff2aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0,
    c=<optimized out>, y=<optimized out>, rowindex=<optimized out>,
    colindex=<optimized out>, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0,
    degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0,
    nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330,
    cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50,
    shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0,
    probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10,
    nr=0x1524dc40, index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420,
    rho=0x170083e8, coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678,
    probB=0xcbb7eb8, cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8,
    error=0x10358b30) at Rsvm.c:275
#8  0x00007ffff792cefc in ?? () from /usr/lib/R/lib/libR.so
#9  0x00007ffff795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#11 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#12 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#13 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#14 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#15 0x00007ffff79ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so
#16 0x00007ffff79ada47 in ?? () from /usr/lib/R/lib/libR.so
#17 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#18 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#19 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#20 0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#21 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#22 0x00007ffff795db9b in ?? () from /usr/lib/R/lib/libR.so
#23 0x00007ffff795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so
#24 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#25 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#26 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#27 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#28 0x00007ffff7998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#29 0x00007ffff79982e0 in ?? () from /usr/lib/R/lib/libR.so
#30 0x00007ffff7998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#31 0x000000000040078b in main ()
#32 0x00007ffff72d930d in __libc_start_main ()
    from /lib/x86_64-linux-gnu/libc.so.6
#33 0x00000000004007bd in _start ()

> This is R version 2.13.1 (2011-07-08) (as distributed with ubuntu).
>
>> * Sam Steingold <f...@tah.bet> [2012-02-09 21:43:30 -0500]:
>>
>> I did this:
>> nb <- naiveBayes(users, platform)
>> pl <- predict(nb,users)
>> nrow(users) ==> 314781
>> ncol(users) ==> 109
>>
>> 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
>> (tens of minutes). why?
>>
>> 2. the predict results were completely off the mark (quite the opposite
>> of the expected overfitting). suffice it to show the tables:
>>
>> pl:
>>
>>    android blackberry       ipad     iphone         lg      linux        mac
>>          3          5         11         14     312723          5         11
>>     mobile      nokia    samsung    symbian    unknown    windows
>>       1864         17         16        112          0          0
>>
>> platform:
>>    android blackberry       ipad     iphone         lg      linux        mac
>>      18013       1221       2647       1328          4       2936      34336
>>     mobile      nokia    samsung    symbian    unknown    windows
>>         18         88         39        103       2660     251388
>>
>> i.e., nb classified nearly everything as "lg" while in the actual data
>> "lg" is virtually nonexistent.
>>
>> 3. when I print "nb", I see "A-priori probabilities" (which are what I
>> expected) and "Conditional probabilities" which are confusing because
>> there are only two of them, e.g.:
>>
>>   android    0.048464998 0.43946764
>>   blackberry 0.001638002 0.04045564
>>   ipad       0.322251606 1.84940588
>>   iphone     0.030873494 0.23250250
>>   lg         0.000000000 0.00000000
>>   linux      0.023501362 0.34698919
>>   mac        0.082653774 1.22535027
>>   mobile     0.000000000 0.00000000
>>   nokia      0.000000000 0.00000000
>>   samsung    0.000000000 0.00000000
>>   symbian    0.000000000 0.00000000
>>   unknown    0.003759398 0.08219078
>>   windows    0.021158528 0.32916970
>>
>> the predictors are integers.
>> is the first column for the 0 predictors and the second for all non-0?
>> Is there a way to ask naiveBayes to differentiate between non-0 values?
>>
>> thanks!

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://pmw.org.il http://iris.org.il
http://ffii.org http://truepeace.org http://memri.org http://www.memritv.org
If a train station is a place where a train stops, what's a workstation?
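
A note on question 3 above, offered as a sketch rather than a confirmed diagnosis. For numeric predictors, the e1071 documentation describes each naiveBayes table as the per-class mean and standard deviation of that predictor, which would make the two columns "mean" and "sd" rather than "0" versus "non-0". The base-R snippet below uses made-up toy data (the vectors `x` and `y` are hypothetical stand-ins, not the `users` data) to show how such a table arises and what a zero standard deviation does to a Gaussian density. Whether this accounts for the "lg" predictions depends on how the installed e1071 version treats sd = 0, which is not verified here.

```r
# Hypothetical toy data: one integer predictor and a class label.
# Class "lg" has only constant (all-zero) values, like the 0/0 rows in the post.
x <- c(0, 1, 3, 5, 0, 2, 0, 0, 0)
y <- factor(c("android", "android", "android", "android",
              "ipad", "ipad", "ipad", "lg", "lg"))

# Per-class mean and standard deviation: the two columns a Gaussian
# naive Bayes model keeps for a numeric predictor.
groups <- split(x, y)
tab <- cbind(mean = sapply(groups, mean), sd = sapply(groups, sd))
print(tab)

# Class-conditional Gaussian density of a new observation x = 0.
dens <- setNames(dnorm(0, mean = tab[, "mean"], sd = tab[, "sd"]),
                 rownames(tab))
print(dens)

# With sd = 0, dnorm() is a point mass: Inf at the mean, 0 elsewhere.
# An unguarded product of densities is then dominated by that class.
winner <- names(which.max(dens))
print(winner)
```

If something like this is happening, every zero-variance class would tie at an infinite density for rows where the predictor is 0, and a first-maximum rule would pick the earliest factor level among them; "lg" sorts before "mobile", "nokia", "samsung" and "symbian". Treating the integer predictors as factors instead of numerics is one way to get a per-value table, at the cost of sparse cells.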