Thanks for your reply. I am using Mahout 0.7 and calling SSVDSolver.run() with the code I listed in my previous email (please let me know if anything is unclear). The run() method of SSVDSolver does not take a tempPath; I set a tempPath only for the DistributedRowMatrix I am decomposing. The stack trace is below, after the restated call.
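To make this concrete, here is the same call with the command-line values from my previous email substituted for args[]; the import lines simply name the Mahout 0.7 and Hadoop classes involved, and the comments mark which argument is the only temp path I set:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.mahout.math.hadoop.DistributedRowMatrix;
    import org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver;

    Configuration conf = new Configuration();

    // The second constructor argument is the only temp path I set
    // (it belongs to the DistributedRowMatrix, not to SSVDSolver).
    DistributedRowMatrix A = new DistributedRowMatrix(
        new Path("/user/data/pubmed.dat"),      // row-wise input matrix
        new Path("/user/pubmed/ssvd100/tmp"),   // DistributedRowMatrix temp path
        8200000, 141043);                       // numRows, numCols
    A.setConf(conf);

    // SSVDSolver only takes an output path; ABt-job-1 (where the exception
    // occurs) is created under this output path.
    SSVDSolver ssvdSolver = new SSVDSolver(conf,
        new Path[] { A.getRowPath() },
        new Path("/user/pubmed/ssvd100/out"),   // output path
        10000,                                  // ABt block size
        100,                                    // rank k
        15,                                     // oversampling p
        16);                                    // reduce tasks
    ssvdSolver.setQ(2);                         // power iterations
    ssvdSolver.setComputeV(true);
    ssvdSolver.run();

Here is the stack trace: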
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/pubmed/ssvd100/out/ABt-job-1/_temporary/_attempt_201306041019_0003_r_000002_2/part-r-00002.gz File does not exist. Holder DFSClient_attempt_201306041019_0003_r_000002_2 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1642)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1633)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1688)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1676)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:720)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:573)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
        at org.apache.hadoop.ipc.Client.call(Client.java:1067)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy2.complete(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:83)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy2.complete(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:4020)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3935)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1434)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:303)
        at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:328)
        at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1473)
        at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:280)
        at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:263)

On Tue, Jun 4, 2013 at 11:10 PM, Suneel Marthi <[email protected]> wrote:

> 1. It would be helpful if u could post the actual stacktrace for this
> exception.
> 2. Could u post the command u r using to execute ssvd? Are you working off
> of trunk? If not Mahout version?
> 3. Are u specifying a tempPath when running ssvd? SSVD is series of jobs.
> It could be that the one of the jobs is deleting the output that is being
> accessed by another. This exception could be thrown by HDFS when a job
> tries accessing files that don't exist anymore.
>
> ________________________________
> From: Ahmed Elgohary <[email protected]>
> To: [email protected]
> Sent: Tuesday, June 4, 2013 11:00 PM
> Subject: LeaseExpiredException in ABt job of SSVD
>
>
> Hi,
>
> I am trying to run ssvd on Amazon EMR but I am getting a
> LeaseExpiredException during the execution of the ABt job. I posted about
> my problem to the AWS forum
> (here<http://forums.aws.amazon.com/thread.jspa?threadID=126294&tstart=0>)
> as I thought at first that it could be a problem with EMR. Now, the reply I
> got indicates that it's a problem with the ssvd implementation. I
> successfully used ssvd before for decomposing many other datasets with
> different parameter settings. Is it possible that for only one dataset I
> get that exception?
>
> The dataset I am using here is pubmed abstracts, 8.2m x 141k. The ssvd params
> I am using are: rank = 100, oversampling = 15, power iterations = 2, and
> ABt blocksize = 10000. The dataset is partitioned into 36 blocks on a
> 10-node EMR cluster.
>
> My jar just runs the code below with the arguments: /user/data/pubmed.dat
> /user/pubmed/ssvd100/tmp 8200000 141043 /user/pubmed/ssvd100/out 10000 100
> 15 16 2
>
>     Configuration conf = new Configuration();
>     DistributedRowMatrix A = new DistributedRowMatrix(new Path(args[0]),
>         new Path(args[1]), Integer.parseInt(args[2]), Integer.parseInt(args[3]));
>     A.setConf(conf);
>
>     SSVDSolver ssvdSolver = new SSVDSolver(conf,
>         new Path[] { A.getRowPath() }, new Path(args[4]),
>         Integer.parseInt(args[5]), Integer.parseInt(args[6]),
>         Integer.parseInt(args[7]), Integer.parseInt(args[8]));
>     ssvdSolver.setQ(Integer.parseInt(args[9]));
>     ssvdSolver.setComputeV(true);
>     ssvdSolver.run();
>
> thanks,
>
> --ahmed
>
