I think it was user error on my part. I eventually figured out the things you pointed out.
I'm not sure what the Jira ticket would report besides the fact that I'm not very bright! Another question: if I receive information about a (user, product) pair in an online/realtime fashion, would it be a matter of injecting that pair into the userRatings sequence file? If so, do the keys in userRatings match the keys in the U matrix? I assume they do. If that is the case, how can one go back to the original user ID from the key values? Basically, I'd like to avoid recomputing U/M as I receive streaming info and just use the existing factorizations, while avoiding recommending products that have already been interacted with, even if they were not in the original input set.

-barrett

On Thu, Oct 31, 2013 at 6:40 AM, Sebastian Schelter <[email protected]> wrote:

> Hi Barrett,
>
> The parallelALS job creates a vectorized version of the user ratings
> that you can use as input for recommendfactorized (in its temp
> directory, it should create a directory "userRatings" which has this
> data).
>
> Could you open a Jira ticket for this issue? The job should definitely
> be easier to use.
>
> Best,
> Sebastian
>
> On 29.10.2013 04:15, j.barrett Strausser wrote:
> > I have what I'm sure are some well-trod questions.
> >
> > My steps:
> >
> > 0. Put my csv into HDFS. See below for the format of the csv.
> > 1. Called splitDataset on the put csv.
> > 2. Called parallelALS on the training set created by splitDataset.
> > 3. Called evaluateFactorization on the probe set and the U and M
> >    matrices.
> >
> > All these steps seem to have worked.
> >
> > 4. Failing step - I attempted to call recommendfactorized. I thought
> >    I should be able to use the probe set that was created earlier.
> >    Apparently, though, this step needs the input to be a sequence
> >    file.
> >
> > I created a sequence file by calling seqdirectory on the csv I
> > initially put. This looks to have created a sequence file, but when
> > I try to pass that back to evaluateFactorization, I get:
> >
> >   java.lang.RuntimeException: java.lang.ClassCastException:
> >   org.apache.hadoop.io.Text cannot be cast to
> >   org.apache.hadoop.io.IntWritable
> >
> > My questions:
> >
> > 0. Should I create a sequence file initially and use this instead of
> >    the csv file? Why do the first four steps not require a sequence
> >    file?
> > 1. If I need a sequence file, am I to use seqdirectory, or will I
> >    need to create my own class? If so, what would the key and value
> >    be for the schema below? I am assuming I should use
> >    <LongWritable, VectorWritable>.
> >
> > Schema of data in input csv:
> >
> > 7714746,1958013364,1
> > 7759383,1958013364,1
> > 15766851,1958013364,1
> > 1703030,1958013364,1
> > 12541240,1958013364,1
> > 1570642,1958013364,1
> > 173980,1958013364,1
> > 5443266,1958013364,1
> > 14417672,1958013364,1
> > 923188,1958013364,1
> >
> > Thanks,
> > barrett
> >
> > --
> > https://github.com/bearrito
> > @deepbearrito
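P.S. The ClassCastException above is consistent with the job expecting integer user-ID keys (IntWritable) while seqdirectory emits Text keys (filename → file contents). For concreteness, here is a rough sketch of the per-user vectorization step applied to the csv schema above. This is NOT Mahout code - it uses plain Java collections instead of Hadoop's IntWritable/VectorWritable, and `vectorize` is an invented name - just an illustration of grouping (user,item,rating) rows into one rating vector per user key:

```java
import java.util.*;

public class VectorizeRatings {

    // Parse "userID,itemID,rating" CSV lines into one rating map per user,
    // keyed by the integer user ID. This mirrors (by assumption) the
    // <IntWritable, VectorWritable> layout of the userRatings output.
    static Map<Integer, Map<Long, Double>> vectorize(List<String> csvLines) {
        Map<Integer, Map<Long, Double>> userRatings = new HashMap<>();
        for (String line : csvLines) {
            String[] parts = line.split(",");
            int userID = Integer.parseInt(parts[0].trim());
            long itemID = Long.parseLong(parts[1].trim());
            double rating = Double.parseDouble(parts[2].trim());
            userRatings.computeIfAbsent(userID, k -> new HashMap<>())
                       .put(itemID, rating);
        }
        return userRatings;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
            "7714746,1958013364,1",
            "7714746,1958013365,1",
            "7759383,1958013364,1");
        Map<Integer, Map<Long, Double>> ratings = vectorize(csv);
        // User 7714746 rated two items, so the vector has two entries.
        System.out.println(ratings.get(7714746).size()); // prints 2
    }
}
```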
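On the streaming question: one way to reuse an existing factorization without recomputing U/M is to score candidate items against the user's stored feature vector and skip anything in an up-to-date "seen" set, which can include interactions that arrived after the factorization. A minimal plain-Java sketch of that idea, under the assumption that recommendation scores are dot products of the user row of U with the item rows of M (`recommend` and its signature are hypothetical, not Mahout API):

```java
import java.util.*;

public class RecommendFiltered {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Score every item feature vector against the user's feature vector
    // and return the top-n item IDs, skipping anything already in 'seen'
    // (including interactions streamed in after U/M were computed).
    static List<Long> recommend(double[] userFeatures,
                                Map<Long, double[]> itemFeatures,
                                Set<Long> seen, int n) {
        List<Map.Entry<Long, Double>> scored = new ArrayList<>();
        for (Map.Entry<Long, double[]> e : itemFeatures.entrySet()) {
            if (seen.contains(e.getKey())) continue; // filter interacted items
            scored.add(new AbstractMap.SimpleEntry<>(
                e.getKey(), dot(userFeatures, e.getValue())));
        }
        // Highest score first.
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Long> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, scored.size()); i++) {
            top.add(scored.get(i).getKey());
        }
        return top;
    }
}
```

The point of the sketch is only the filter: the seen set is maintained online as pairs arrive, while U and M stay frozen between retrainings.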
