Hi,
I am trying to run SSVD on Amazon EMR, but I am getting a
LeaseExpiredException during the execution of the ABt job. I posted about
my problem to AWS forum
(here<http://forums.aws.amazon.com/thread.jspa?threadID=126294&tstart=0>)
as I first thought that it could be a problem with EMR. However, the reply I
got indicates that it's a problem with the SSVD implementation. I have
successfully used SSVD before to decompose many other datasets with
different parameter settings. Is it possible to get that exception for
only one dataset?
The dataset I am using here is PubMed abstracts, 8.2M x 141k. The SSVD params
I am using are: rank = 100, oversampling = 15, power iterations = 2, and
ABt blocksize = 10000. The dataset is partitioned into 36 blocks on a
10-node EMR cluster.
My jar just runs the code below with the arguments: /user/data/pubmed.dat
/user/pubmed/ssvd100/tmp 8200000 141043 /user/pubmed/ssvd100/out 10000 100
15 16 2
Configuration conf = new Configuration();
DistributedRowMatrix A = new DistributedRowMatrix(new Path(args[0]),
    new Path(args[1]), Integer.parseInt(args[2]),
    Integer.parseInt(args[3]));
A.setConf(conf);
SSVDSolver ssvdSolver = new SSVDSolver(conf,
    new Path[] { A.getRowPath() }, new Path(args[4]),
    Integer.parseInt(args[5]), Integer.parseInt(args[6]),
    Integer.parseInt(args[7]), Integer.parseInt(args[8]));
ssvdSolver.setQ(Integer.parseInt(args[9]));
ssvdSolver.setComputeV(true);
ssvdSolver.run();
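To make the positional arguments easier to check, here is a minimal
standalone sketch (no Mahout dependency) that labels them in the order the
code above reads them. The class name SsvdArgs and all labels are
illustrative only, taken from the description in this post rather than from
Mahout; args[8] is not described above, so it is left as "arg8".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SsvdArgs {
    // Labels follow the description in this post; they are NOT Mahout identifiers.
    // "arg8" is the value 16, which the post does not describe.
    static final String[] LABELS = {
        "inputPath", "tmpPath", "numRows", "numCols", "outputPath",
        "blockSize", "rank", "oversampling", "arg8", "powerIterations"
    };

    // Pair each positional argument with its label, preserving order.
    static Map<String, String> label(String[] args) {
        if (args.length != LABELS.length) {
            throw new IllegalArgumentException(
                "expected " + LABELS.length + " arguments, got " + args.length);
        }
        Map<String, String> named = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            named.put(LABELS[i], args[i]);
        }
        return named;
    }

    // Number of columns in the random projection: rank + oversampling.
    static int projectionWidth(Map<String, String> named) {
        return Integer.parseInt(named.get("rank"))
             + Integer.parseInt(named.get("oversampling"));
    }

    public static void main(String[] args) {
        String[] example = {
            "/user/data/pubmed.dat", "/user/pubmed/ssvd100/tmp", "8200000",
            "141043", "/user/pubmed/ssvd100/out", "10000", "100", "15", "16", "2"
        };
        Map<String, String> named = label(example);
        named.forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("rank + oversampling = " + projectionWidth(named));
    }
}
```

Running it with the arguments listed above prints one "label = value" line
per argument, which makes it easy to confirm that each value ends up in the
position intended.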
thanks,
--ahmed