Hi Marco,

Yes, this is the right place to ask! Thanks for your questions. The good
news is that these aren't really error messages; they are informational
in nature, and things are working as expected.

SVDPACKC has always been a source of questions and confusion, so we have
added some additional informational messages and checks to
discriminate.pl, in an effort to demystify what SVDPACKC is doing. Let
me explain below.

On Fri, 8 Sep 2006, Marco Baroni wrote:

> Hi there.
> 
> Hope this is the right list to ask... (a Google search did not return
> anything too helpful on the problem below, or at least nothing that I could
> understand to be helpful to solve my problem...)
> 
> I just installed the latest release of SenseClusters (0.95), curious about
> the new lsa features.
> 
> Being that I installed on a new machine (Pentium 4 3.2 GHz, 2GB RAM,
> Linux Ubuntu Server 6.06 distribution), I also re-installed the newest
> available versions of all helper programs from scratch.
> 
> Following the SC installation instructions, I compiled SVDPACK's las2 using
> gcc-3.3, and I modified las2.c and las2.h, the latter as follows (by cut
> and paste from the instructions):
> 
> #define LMTNW   900300001 /* max. size of working area allowed  */
> #define NMAX    30000     /* bound on ncol, order of A          */
> #define NZMAX   9000000   /* bound on number of nonzeros in a   */
> 
> I ran the suggested  SVDPACKC test, which generated the expected output 
> files (although I have no idea of how to interpret the text file, so I 
> don't know if it generated the "right" output files).

If SVDPACKC is not installed properly, or there is some other problem, the
simple test that we suggest (copying belladit to matrix and running las2)
will fail badly, either hanging or causing a core dump. So, if it runs
and generates output files, everything is OK.

> However, when I ran Testing/testall.sh, I've got the following warnings:
> 
> In Directory svd
> In Directory mat2harbo
> Test A1 for mat2harbo.pl
> Running mat2harbo.pl --numform 20i4 test-A1.mat
> Test Ok
> Test A10 for mat2harbo.pl
> Running mat2harbo.pl --param --k 4 --numform 8f10.6 test-A10.mat
> NOTE(mat2harbo.pl):
>                  The size of your default work area (LMTNW) in las2.h
> should be
>                  greater than 127 (cols=11, iter=6).
> Test Ok

These notes are purely for your information, and are meant to give you
some indication of how much memory you need to allocate for SVDPACKC. If
you don't have enough memory allocated, las2 (svd) can hang, which is a
problem of course. But the message above is not suggesting that you have
too little memory allocated; it is simply telling you how much memory you
should have allocated. If you have allocated enough (and indeed you
have, since your LMTNW is far larger than any of the reported bounds)
then all is well.

I can see how the message could make you worry that you don't have
enough memory allocated, so I will rewrite it in the next release to
make things clearer.

More below...

> Running mat2harbo.pl --param --rf 2 --k 3 --numform 8f10.6 test-A10.mat
> NOTE(mat2harbo.pl):
>                  The size of your default work area (LMTNW) in las2.h
> should be
>                  greater than 184 (cols=11, iter=9).
> Test Ok
> Running mat2harbo.pl --param --rf 3 --k 7 --numform 8f10.6 test-A10.mat
> NOTE(mat2harbo.pl):
>                  The size of your default work area (LMTNW) in las2.h
> should be
>                  greater than 232 (cols=11, iter=11).
> Test Ok
> Running mat2harbo.pl --param --iter 3 --rf 1 --k 4 --numform 8f10.6
> test-A10.mat
> NOTE(mat2harbo.pl):
>                  The size of your default work area (LMTNW) in las2.h
> should be
>                  greater than 232 (cols=11, iter=11).
> Test Ok
> ...
> 
> ... and similar complaints  when I tried to run the demo word-wrapper.sh
> script:
> 
> [orenishii ~/sw/SenseClusters-v0.95/Demos] baroni$ more wrapper.err
> Preprocessing the input data ...
> Computing Unigram Counts ...
> Finding Feature Regex/s ...
> Building 1st Order Context Vectors ...
> WARNING(discriminate.pl):
>          SVD could not be performed on SVDINPUT <art.n.o1_presvd>
>          because svd with reduction factor = 300 and scaling
>          factor = 10 would reduce the resultant number of
>          features to = 8, computed via (min(300, 83/10)).
>          The minimum number of features required for representing
>          the contexts is 10

In this case, discriminate.pl is deciding not to carry out SVD because of
the small number of dimensions that would remain. There are 83 columns in
the original data, and the resulting dimensionality is the minimum of the
reduction factor (300) and the number of columns divided by the scaling
factor (83/10), which gives 8. In discriminate.pl we have set the minimum
number of dimensions to 10, since below that value SVDPACKC can sometimes
hang. There are actually good reasons for that, and it doesn't always
happen, so it's not a bug in SVDPACKC; it's just that one has to be
careful going below 10 dimensions, so in discriminate.pl we have opted to
make 10 the lower bound, to avoid these complexities.

If one really wants to go below 10 dimensions, it is possible. One could
modify discriminate.pl to allow this, or put together a script outside of
discriminate.pl that calls the relevant programs. However, as the number
of dimensions approaches zero one must be careful to avoid situations
that will cause SVDPACKC to hang. The main problem is that SVDPACKC
does not actually let you pick "exactly" the number of dimensions you
want - if you tell it you would like 100, it might find 98 or 102. That's
fine when you are dealing with 100 dimensions, but if you ask for 3
dimensions, it might find 1 or even 0 dimensions, and that's where you
end up in trouble.

Anyway, the message here is that you have hit a safety check in
discriminate.pl that is not letting it go below 10 dimensions. This is
normal, and discriminate.pl will continue and cluster your data; it just
won't perform LSA in this particular case.

I hope this all makes sense.

In short you are doing everything right, and it looks to me like things 
are working just fine. 

Do let us know if you have any additional questions. 

Cordially,
Ted

> Clustering in Vector Space ...
> Finding Number of Clusters with Cluster Stopping...
> Preprocessing the input data ...
> Computing Unigram Counts ...
> Finding Feature Regex/s ...
> Building 1st Order Context Vectors ...
> Performing SVD ...
> NOTE(mat2harbo.pl):
>                  The size of your default work area (LMTNW) in las2.h
> should be
>                  greater than 2446 (cols=128, iter=39).
> 
> 
> Any advice on what's I'm doing wrong?
> 
> Thanks a lot!
> 
> Regards,
> 
> Marco

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse


_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
