Hi Marco,
Yes, this is the right place to ask! Thanks for your questions. The good news is that these really aren't error messages; they are more informational in nature, and things are working as expected. SVDPACKC has always been a source of questions and confusion, so we have put some additional informational messages and checks into discriminate.pl, in an effort to de-mystify what SVDPACKC is doing. Let me explain below.

On Fri, 8 Sep 2006, Marco Baroni wrote:

> Hi there.
>
> Hope this is the right list to ask... (a Google search did not return
> anything too helpful on the problem below, or at least nothing that I could
> understand to be helpful to solve my problem...)
>
> I just installed the latest release of SenseClusters (0.95), curious about
> the new lsa features.
>
> Being that I installed on a new machine (Pentium 4 3.2 GHz, 2GB RAM,
> Linux Ubuntu Server 6.06 distribution), I also re-installed the newest
> available versions of all helper programs from scratch.
>
> Following the SC installation instructions, I compiled SVDPACK's las2 using
> gcc-3.3, and I modified las2.c and las2.h, the latter as follows (by cut
> and paste from the instructions):
>
> #define LMTNW   900300001   /* max. size of working area allowed  */
> #define NMAX    30000       /* bound on ncol, order of A          */
> #define NZMAX   9000000     /* bound on number of nonzeros in a   */
>
> I ran the suggested SVDPACKC test, which generated the expected output
> files (although I have no idea of how to interpret the text file, so I
> don't know if it generated the "right" output files).

If SVDPACKC is not installed properly, or there is some other problem, the simple test that we suggest of copying belladit to matrix and running las2 will fail horribly, either hanging or causing a core dump. So, if it runs and generates output files, everything is a-ok.
> However, when I ran Testing/testall.sh, I've got the following warnings:
>
> In Directory svd
> In Directory mat2harbo
> Test A1 for mat2harbo.pl
> Running mat2harbo.pl --numform 20i4 test-A1.mat
> Test Ok
> Test A10 for mat2harbo.pl
> Running mat2harbo.pl --param --k 4 --numform 8f10.6 test-A10.mat
> NOTE(mat2harbo.pl):
>         The size of your default work area (LMTNW) in las2.h should be
>         greater than 127 (cols=11, iter=6).
> Test Ok

These notes are purely for your information, and are meant to give you some indication of how much memory you need to allocate for SVDPACKC. If you don't have enough memory allocated, las2 (svd) can hang, which is a problem of course. But the message above is not meant to suggest that you do not have enough memory allocated; it is simply telling you how much memory you should have allocated. If you have allocated enough (and indeed you have) then all is well. I can see where the message could make you worry that you don't have enough memory allocated, so I will make sure to rewrite it in the next release, to make things more clear. More below...

> Running mat2harbo.pl --param --rf 2 --k 3 --numform 8f10.6 test-A10.mat
> NOTE(mat2harbo.pl):
>         The size of your default work area (LMTNW) in las2.h should be
>         greater than 184 (cols=11, iter=9).
> Test Ok
> Running mat2harbo.pl --param --rf 3 --k 7 --numform 8f10.6 test-A10.mat
> NOTE(mat2harbo.pl):
>         The size of your default work area (LMTNW) in las2.h should be
>         greater than 232 (cols=11, iter=11).
> Test Ok
> Running mat2harbo.pl --param --iter 3 --rf 1 --k 4 --numform 8f10.6
> test-A10.mat
> NOTE(mat2harbo.pl):
>         The size of your default work area (LMTNW) in las2.h should be
>         greater than 232 (cols=11, iter=11).
> Test Ok
> ...
>
> ... and similar complaints when I tried to run the demo word-wrapper.sh
> script:
>
> [orenishii ~/sw/SenseClusters-v0.95/Demos] baroni$ more wrapper.err
> Preprocessing the input data ...
> Computing Unigram Counts ...
> Finding Feature Regex/s ...
> Building 1st Order Context Vectors ...
> WARNING(discriminate.pl):
>         SVD could not be performed on SVDINPUT <art.n.o1_presvd>
>         because svd with reduction factor = 300 and scaling
>         factor = 10 would reduce the resultant number of
>         features to = 8, computed via (min(300, 83/10)).
>         The minimum number of features required for representing
>         the contexts is 10

In this case, discriminate.pl is deciding not to carry out SVD because of the small number of dimensions that would remain. There are 83 columns in the original data, and the reduction factor is the minimum of 300 or 10% of the original number of columns, which gives you 8. In discriminate.pl we have set the minimum number of dimensions to 10, since once you get below that value SVDPACKC can sometimes hang. There are actually good reasons for that, and it doesn't always happen, so it's not a bug in SVDPACKC; it's just that one has to be careful about going below 10 dimensions, so in discriminate.pl we have opted to make 10 the lower bound, to avoid these complexities.

If one really wants to go below 10 dimensions, it is possible. One could modify discriminate.pl to allow this, or put together a script outside of discriminate.pl that calls the relevant programs. However, as the number of dimensions approaches zero, one must be careful to avoid situations that will cause SVDPACKC to hang. The main problem is that the SVDPACKC process does not actually let you pick "exactly" the number of dimensions you want - if you tell it you would like 100, it might find 98 or 102. That's ok if you are dealing with 100 dimensions, but if you ask for 3 dimensions, it might find 1 or even 0 dimensions, and that's where you end up in trouble.
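Incidentally, if you ever want to sanity-check one of those LMTNW figures yourself: all four numbers quoted in the notes above are consistent with computing 6*cols + 4*iter + 1 + iter*iter. That formula is my reconstruction from the quoted messages, not code copied out of las2.c, so treat it as a sketch:

```python
def required_lmtnw(cols: int, iterations: int) -> int:
    # Work-area size that the mat2harbo.pl notes ask for. This formula is
    # reconstructed from the quoted messages, not copied from las2.c.
    return 6 * cols + 4 * iterations + 1 + iterations * iterations

# The four figures quoted in the notes above:
print(required_lmtnw(11, 6))    # 127
print(required_lmtnw(11, 9))    # 184
print(required_lmtnw(11, 11))   # 232
print(required_lmtnw(128, 39))  # 2446
```

As long as the LMTNW you #define in las2.h is comfortably above that value (and 900300001 certainly is), you are fine.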
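To make the arithmetic behind the WARNING concrete, here is a small sketch of the check as I've described it. The names (features_after_svd, MIN_FEATURES, and so on) are illustrative only; the actual logic lives in discriminate.pl:

```python
MIN_FEATURES = 10  # discriminate.pl will not go below this many dimensions

def features_after_svd(cols, reduction_factor=300, scaling_factor=10):
    # Features remaining after SVD: the minimum of the reduction factor
    # and cols/scaling factor, as in the quoted warning.
    return min(reduction_factor, cols // scaling_factor)

k = features_after_svd(83)  # min(300, 83/10) -> 8
if k < MIN_FEATURES:
    print(f"skipping SVD: only {k} features would remain, "
          f"but at least {MIN_FEATURES} are required")
```

With 83 columns that gives 8 features, which falls below the lower bound of 10, so SVD is skipped.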
Anyway, the message here is that you have hit a safety check in discriminate.pl that is not letting it go below 10 dimensions. This is normal, and discriminate.pl will continue and cluster your data; it just won't perform LSA in this particular case.

I hope this all makes sense. In short, you are doing everything right, and it looks to me like things are working just fine. Do let us know if you have any additional questions.

Cordially,
Ted

> Clustering in Vector Space ...
> Finding Number of Clusters with Cluster Stopping...
> Preprocessing the input data ...
> Computing Unigram Counts ...
> Finding Feature Regex/s ...
> Building 1st Order Context Vectors ...
> Performing SVD ...
> NOTE(mat2harbo.pl):
>         The size of your default work area (LMTNW) in las2.h should be
>         greater than 2446 (cols=128, iter=39).
>
> Any advice on what I'm doing wrong?
>
> Thanks a lot!
>
> Regards,
>
> Marco

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
