Hi Barrett,

The parallelALS job creates a vectorized version of the user ratings that you can use as input for recommendfactorized: in its temp directory, it creates a subdirectory "userRatings" which holds this data.
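For reference, the whole pipeline looks roughly like this (paths and parameter values below are placeholders, and the flags follow Mahout's ALS example scripts — adjust them to your setup):

```shell
# split the rating CSV into training and probe sets
mahout splitDataset --input /data/ratings.csv --output /data/split \
    --trainingPercentage 0.9 --probePercentage 0.1 --tempDir /tmp/split

# factorize the training set; writes U and M, and a vectorized copy
# of the ratings to <tempDir>/userRatings
mahout parallelALS --input /data/split/trainingSet --output /data/als \
    --tempDir /tmp/als --numFeatures 20 --numIterations 10 --lambda 0.065

# compute recommendations; the input is the vectorized ratings from
# parallelALS's temp directory, not the raw CSV and not the probe set
mahout recommendfactorized --input /tmp/als/userRatings \
    --userFeatures /data/als/U --itemFeatures /data/als/M \
    --numRecommendations 10 --maxRating 1 --output /data/recommendations
```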
Could you open a Jira ticket for this issue? The job should definitely be easier to use.

Best,
Sebastian

On 29.10.2013 04:15, j.barrett Strausser wrote:
> I have what I'm sure are some well-trod questions.
>
> My steps:
> 0. Put my csv into HDFS. See below for the format of the csv.
> 1. Called splitDataset on the csv I put.
> 2. Called parallelALS on the training set created by splitDataset.
> 3. Called evaluateFactorization on the probe set and the U and M matrices.
>
> All these steps seem to have worked.
>
> 4. Failing step - I attempted to call recommendfactorized. I thought I
> should be able to use the probe set that was created earlier. Apparently,
> though, this step needs the input to be a sequence file.
>
> I created a sequence file by calling seqdirectory on the csv I initially
> put. This looks to have created a sequence file, but when I try to pass
> that to recommendfactorized I get:
>
> java.lang.RuntimeException: java.lang.ClassCastException:
> org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
>
> My questions:
>
> 0. Should I create a sequence file initially and use this instead of the
> csv file? Why do the first four steps not require a sequence file?
> 1. If I need a sequence file, can I use seqdirectory, or will I need to
> create my own class? If so, what would the key and value be for the
> schema below? I am assuming I should use <LongWritable, VectorWritable>.
>
> Schema of data in the input csv:
>
> 7714746,1958013364,1
> 7759383,1958013364,1
> 15766851,1958013364,1
> 1703030,1958013364,1
> 12541240,1958013364,1
> 1570642,1958013364,1
> 173980,1958013364,1
> 5443266,1958013364,1
> 14417672,1958013364,1
> 923188,1958013364,1
>
> Thanks,
> barrett
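Regarding the ClassCastException: seqdirectory keys its output by file name, so it produces <Text, Text> pairs, whereas recommendfactorized expects <IntWritable, VectorWritable> — hence Text cannot be cast to IntWritable. If you're unsure what a sequence file contains, seqdumper prints its key and value classes (path below is a placeholder):

```shell
mahout seqdumper --input /tmp/als/userRatings
```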
