I have what I'm sure are some well-trodden questions.

My Steps
0. Put my CSV into HDFS (see below for the format of the CSV).
1. Called splitDataset on the CSV I put into HDFS.
2. Called parallelALS on the training set created by splitDataset.
3. Called evaluateFactorization on the probe set and the U and M matrices.

All these steps seem to have worked.
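For reference, here is roughly what I ran for those steps. Paths and numeric parameters below are placeholders, and the flags are from memory, so they may not match your Mahout version exactly:

```shell
# Assumed invocations; all paths and parameter values are placeholders.
mahout splitDataset --input /user/me/ratings.csv --output /user/me/dataset \
  --trainingPercentage 0.9 --probePercentage 0.1

mahout parallelALS --input /user/me/dataset/trainingSet --output /user/me/als \
  --numFeatures 20 --numIterations 10 --lambda 0.065

mahout evaluateFactorization --input /user/me/dataset/probeSet \
  --userFeatures /user/me/als/U --itemFeatures /user/me/als/M \
  --output /user/me/rmse
```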


4. Failing step - I attempted to call recommendFactorized. I thought I
should be able to use the probeSet that was created earlier, but
apparently this step needs its input to be a SequenceFile.

I created a SequenceFile by calling seqdirectory on the CSV I initially
put into HDFS. That looks to have created a sequence file, but when I try to pass
it back to evaluateFactorization, I get:

java.lang.RuntimeException: java.lang.ClassCastException:
org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

My questions

0. Should I have created a sequence file initially and used that instead of the
CSV file? Why do the first four steps not require a sequence file?
1. If I need a sequence file, can I use seqdirectory, or will I need
to write my own conversion class? If so, what should the key and value be for the
schema below? I'm assuming I should use <LongWritable, VectorWritable>.
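In case it helps frame the question, here is a minimal, Hadoop-free sketch of the grouping I imagine a custom conversion class would do for the schema below, assuming the columns are userID, itemID, rating (my guess; the class and variable names are illustrative). In the real conversion I expect the per-user maps would be wrapped in writables and written out with SequenceFile.createWriter; this only shows the grouping:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: parse "userID,itemID,rating" CSV lines and group ratings per
// user -- the same shape as key = user ID, value = sparse vector of
// ratings indexed by item ID. Assumes both IDs fit in an int, which
// holds for the sample data below.
public class CsvToRatings {
    static Map<Integer, Map<Integer, Double>> groupByUser(List<String> lines) {
        Map<Integer, Map<Integer, Double>> byUser = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            int userId = Integer.parseInt(parts[0].trim());
            int itemId = Integer.parseInt(parts[1].trim());
            double rating = Double.parseDouble(parts[2].trim());
            byUser.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, rating);
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        sample.add("7714746,1958013364,1");
        sample.add("7759383,1958013364,1");
        Map<Integer, Map<Integer, Double>> byUser = groupByUser(sample);
        System.out.println(byUser.get(7714746)); // one rating for item 1958013364
    }
}
```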


Schema of the data in the input CSV:
7714746,1958013364,1
7759383,1958013364,1
15766851,1958013364,1
1703030,1958013364,1
12541240,1958013364,1
1570642,1958013364,1
173980,1958013364,1
5443266,1958013364,1
14417672,1958013364,1
923188,1958013364,1


Thanks
barrett

-- 


https://github.com/bearrito
@deepbearrito
