Hi Barrett,

The parallelALS job creates a vectorized version of the user ratings that you can use as input for recommendfactorized: in its temp directory, it creates a subdirectory "userRatings" which holds this data.
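For reference, the whole pipeline looks roughly like this (paths and parameter values below are placeholders, and the flags follow Mahout's ALS example scripts — adjust them to your setup):

```shell
# split the rating CSV into training and probe sets
mahout splitDataset --input /data/ratings.csv --output /data/split \
    --trainingPercentage 0.9 --probePercentage 0.1 --tempDir /tmp/split

# factorize the training set; writes U and M, and a vectorized copy
# of the ratings to <tempDir>/userRatings
mahout parallelALS --input /data/split/trainingSet --output /data/als \
    --tempDir /tmp/als --numFeatures 20 --numIterations 10 --lambda 0.065

# compute recommendations; the input is the vectorized ratings from
# parallelALS's temp directory, not the raw CSV and not the probe set
mahout recommendfactorized --input /tmp/als/userRatings \
    --userFeatures /data/als/U --itemFeatures /data/als/M \
    --numRecommendations 10 --maxRating 1 --output /data/recommendations
```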
Could you open a Jira ticket for this issue? The job should definitely be easier to use.

Best,
Sebastian

On 29.10.2013 04:15, j.barrett Strausser wrote:
> I have what I'm sure are some well-trod questions.
>
> My steps:
> 0. Put my csv into HDFS. See below for the format of the csv.
> 1. Called splitDataset on the csv I put.
> 2. Called parallelALS on the training set created by splitDataset.
> 3. Called evaluateFactorization on the probe set and the U and M matrices.
>
> All these steps seem to have worked.
>
> 4. Failing step - I attempted to call recommendfactorized. I thought I
> should be able to use the probe set that was created earlier. Apparently,
> though, this step needs the input to be a sequence file.
>
> I created a sequence file by calling seqdirectory on the csv I initially
> put. This looks to have created a sequence file, but when I try to pass
> that to recommendfactorized I get:
>
> java.lang.RuntimeException: java.lang.ClassCastException:
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
>
> My questions:
>
> 0. Should I create a sequence file initially and use this instead of the
> csv file? Why do the first four steps not require a sequence file?
> 1. If I need a sequence file, can I use seqdirectory, or will I need to
> create my own class? If so, what would the key and value be for the
> schema below? I am assuming I should use <LongWritable, VectorWritable>.
>
> Schema of data in the input csv:
>
> 7714746,1958013364,1
> 7759383,1958013364,1
> 15766851,1958013364,1
> 1703030,1958013364,1
> 12541240,1958013364,1
> 1570642,1958013364,1
> 173980,1958013364,1
> 5443266,1958013364,1
> 14417672,1958013364,1
> 923188,1958013364,1
>
> Thanks,
> barrett
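Regarding the ClassCastException: seqdirectory keys its output by file name, so it produces <Text, Text> pairs, whereas recommendfactorized expects <IntWritable, VectorWritable> — hence Text cannot be cast to IntWritable. If you're unsure what a sequence file contains, seqdumper prints its key and value classes (path below is a placeholder):

```shell
mahout seqdumper --input /tmp/als/userRatings
```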
