Hi,
As far as I know, if I transpose a matrix twice, I should get back the original matrix.
I tried to do this with DistributedRowMatrix (trunk version). My sample matrix
has 14 rows and 31 columns.
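For reference, a minimal in-memory sketch of the property being relied on, with no Hadoop or Mahout involved (the class `DoubleTransposeCheck` is purely illustrative). It builds a 14 x 31 array, transposes it twice and compares the result with the original:

```java
import java.util.Arrays;

// Illustrative in-memory check (no Hadoop/Mahout): transposing twice
// should give back the original matrix, here with the 14 x 31 sample shape.
public class DoubleTransposeCheck {

    // Returns a new cols x rows matrix with out[j][i] == m[i][j].
    static double[][] transpose(double[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        double[][] out = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                out[j][i] = m[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int numRows = 14;
        int numCols = 31;
        double[][] a = new double[numRows][numCols];
        for (int i = 0; i < numRows; i++) {
            for (int j = 0; j < numCols; j++) {
                a[i][j] = i * numCols + j; // a distinct value per cell
            }
        }
        double[][] t1 = transpose(a);  // 31 x 14
        double[][] t2 = transpose(t1); // back to 14 x 31
        System.out.println(t1.length + " x " + t1[0].length);
        System.out.println(Arrays.deepEquals(a, t2));
    }
}
```

Running it prints `31 x 14` and then `true`.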
I got the following exception:
org.apache.mahout.math.IndexException: Index 31 is outside allowable range of [0,31]
    at org.apache.mahout.math.AbstractVector.set(AbstractVector.java:324)
    at org.apache.mahout.math.SequentialAccessSparseVector.<init>(SequentialAccessSparseVector.java:69)
    at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:144)
    at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:1)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1293)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:153)
    at com.fredhopper.MatrixTransposeJob.run(MatrixTransposeJob.java:46)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.fredhopper.MatrixTransposeJob.main(MatrixTransposeJob.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
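Reading the trace: the failure is in the transpose reducer, which tried to set index 31 on a vector and was refused, so that vector presumably has cardinality 31 with valid 0-based indices 0 through 30. If the reducer sizes each output vector by the row count of the matrix being transposed (an assumption here, not checked against the Mahout source), a size of 31 would point at the second transpose, where the 31 x 14 intermediate matrix is transposed. One unconfirmed hypothesis is that some index in the data is 31, for example if columns were numbered 1 to 31 instead of 0 to 30. The sketch below is a simplified, hypothetical in-memory model of a map/reduce transpose (not Mahout's `TransposeJob`; `TransposeModel`, `Cell`, `transpose` and `sample` are made-up names) in which only row indices are bounds-checked:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a map/reduce transpose. NOT Mahout's code:
// it assumes each output row is a vector sized by the row count of the matrix
// being transposed, with a 0-based bounds check on row indices only.
public class TransposeModel {

    // One stored entry (row, col, value) of a sparse matrix.
    static final class Cell {
        final int row;
        final int col;
        final double value;

        Cell(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    // Transposes a matrix declared to have numRows rows.
    static List<Cell> transpose(List<Cell> cells, int numRows) {
        // "Map" + shuffle: group cells by their column index.
        Map<Integer, List<Cell>> byColumn = new TreeMap<>();
        for (Cell c : cells) {
            byColumn.computeIfAbsent(c.col, k -> new ArrayList<>()).add(c);
        }
        // "Reduce": column k becomes output row k, a vector of size numRows.
        List<Cell> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Cell>> group : byColumn.entrySet()) {
            double[] newRow = new double[numRows];
            for (Cell c : group.getValue()) {
                if (c.row < 0 || c.row >= numRows) { // valid: 0..numRows-1
                    throw new IndexOutOfBoundsException("Index " + c.row
                        + " is outside allowable range of [0," + numRows + ")");
                }
                newRow[c.row] = c.value;
            }
            for (int i = 0; i < numRows; i++) {
                if (newRow[i] != 0.0) {
                    out.add(new Cell(group.getKey(), i, newRow[i]));
                }
            }
        }
        return out;
    }

    // 14 x 31 sample; colOffset = 1 simulates 1-based column indices.
    static List<Cell> sample(int colOffset) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < 14; r++) {
            for (int c = 0; c < 31; c++) {
                cells.add(new Cell(r, c + colOffset, r * 31 + c + 1));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // 0-based columns: both passes succeed.
        List<Cell> t2 = transpose(transpose(sample(0), 14), 31);
        System.out.println("0-based: " + t2.size() + " entries after two transposes");

        // 1-based columns (1..31): the first pass succeeds because columns are
        // not checked, but it produces a row 31, which the second pass rejects.
        try {
            transpose(transpose(sample(1), 14), 31);
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("1-based: " + ex.getMessage());
        }
    }
}
```

In this model, `main` prints `0-based: 434 entries after two transposes` and then `1-based: Index 31 is outside allowable range of [0,31)`. Whether the real input actually contains a column index of 31 would have to be checked against the data itself.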
Has anyone had the same issue, or does anyone know how to solve it?
Thanks,
Laszlo
I ran:
hadoop jar matrix-transpose.jar \
com.fredhopper.MatrixTransposeJob \
-i input/ \
-o output/ \
--numRows 14 \
--numCols 31
My code is:
package com.fredhopper;

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

public class MatrixTransposeJob extends AbstractJob {

  @SuppressWarnings("deprecation")
  @Override
  public int run(String[] args) throws IOException,
      ClassNotFoundException, InterruptedException {
    // Command-line options: -i, -o, --numRows, --numCols
    addInputOption();
    addOutputOption();
    addOption("numRows", "nr", "Number of rows of the input matrix");
    addOption("numCols", "nc", "Number of columns of the input matrix");

    Configuration originalConfig = getConf();
    Map<String,String> parsedArgs = parseArguments(args);
    if (parsedArgs == null) {
      return -1;
    }

    Path inputPath = getInputPath();
    Path outputPath = getOutputPath();
    int numRows = Integer.parseInt(parsedArgs.get("--numRows"));
    int numCols = Integer.parseInt(parsedArgs.get("--numCols"));

    // Wrap the input as a numRows x numCols distributed matrix
    DistributedRowMatrix matrix = new DistributedRowMatrix(inputPath,
        outputPath, numRows, numCols);
    JobConf conf = new JobConf(originalConfig);
    matrix.configure(conf);

    // t1 should be numCols x numRows; t2 should equal the original matrix
    DistributedRowMatrix t1 = matrix.transpose();
    DistributedRowMatrix t2 = t1.transpose();
    return 0;
  }

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new Configuration(), new MatrixTransposeJob(), args);
  }
}