I think it was user error on my part. I eventually figured out the things you pointed out.
I'm not sure what the Jira ticket would report besides the fact that I'm not very bright! Another question: if I receive information about a (user, product) pair in an online/realtime fashion, would it be a matter of injecting that pair into the userRatings sequence file? If so, do the keys in userRatings match the keys in the U matrix? I assume they do. If that is the case, how can one go back to the original user ID from the key values? Basically, I'd like to avoid recomputing U/M as I receive streaming info and just use the existing factorizations, while avoiding recommending products that have already been interacted with, even if they were not in the original input set.

-barrett

On Thu, Oct 31, 2013 at 6:40 AM, Sebastian Schelter <[email protected]> wrote:

> Hi Barrett,
>
> The parallelALS job creates a vectorized version of the user ratings
> that you can use as input for recommendfactorized (in its temp
> directory, it should create a directory "userRatings" which has this
> data).
>
> Could you open a Jira ticket for this issue? The job should definitely
> be easier to use.
>
> Best,
> Sebastian
>
> On 29.10.2013 04:15, j.barrett Strausser wrote:
> > I have what I'm sure are some well-trod questions.
> >
> > My steps:
> >
> > 0. Put my csv into HDFS. See below for the format of the csv.
> > 1. Called splitDataset on the put csv.
> > 2. Called parallelALS on the training set created by splitDataset.
> > 3. Called evaluateFactorization on the probe set and the U and M
> >    matrices.
> >
> > All these steps seem to have worked.
> >
> > 4. Failing step - I attempted to call recommendfactorized. I thought
> >    I should be able to use the probe set that was created earlier.
> >    Apparently, though, this step needs the input to be a sequence
> >    file.
> >
> > I created a sequence file by calling seqdirectory on the csv I
> > initially put. This looks to have created a sequence file, but when
> > I try to pass that back to evaluateFactorization, I get:
> >
> >   java.lang.RuntimeException: java.lang.ClassCastException:
> >   org.apache.hadoop.io.Text cannot be cast to
> >   org.apache.hadoop.io.IntWritable
> >
> > My questions:
> >
> > 0. Should I create a sequence file initially and use this instead of
> >    the csv file? Why do the first four steps not require a sequence
> >    file?
> > 1. If I need a sequence file, am I to use seqdirectory, or will I
> >    need to create my own class? If so, what would the key and value
> >    be for the schema below? I am assuming I should use
> >    <LongWritable, VectorWritable>.
> >
> > Schema of data in input csv:
> >
> > 7714746,1958013364,1
> > 7759383,1958013364,1
> > 15766851,1958013364,1
> > 1703030,1958013364,1
> > 12541240,1958013364,1
> > 1570642,1958013364,1
> > 173980,1958013364,1
> > 5443266,1958013364,1
> > 14417672,1958013364,1
> > 923188,1958013364,1
> >
> > Thanks,
> > barrett
> >
> > --
> > https://github.com/bearrito
> > @deepbearrito
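P.S. The ClassCastException above is consistent with the job expecting integer user-ID keys (IntWritable) while seqdirectory emits Text keys (filename → file contents). For concreteness, here is a rough sketch of the per-user vectorization step applied to the csv schema above. This is NOT Mahout code - it uses plain Java collections instead of Hadoop's IntWritable/VectorWritable, and `vectorize` is an invented name - just an illustration of grouping (user,item,rating) rows into one rating vector per user key:

```java
import java.util.*;

public class VectorizeRatings {

    // Parse "userID,itemID,rating" CSV lines into one rating map per user,
    // keyed by the integer user ID. This mirrors (by assumption) the
    // <IntWritable, VectorWritable> layout of the userRatings output.
    static Map<Integer, Map<Long, Double>> vectorize(List<String> csvLines) {
        Map<Integer, Map<Long, Double>> userRatings = new HashMap<>();
        for (String line : csvLines) {
            String[] parts = line.split(",");
            int userID = Integer.parseInt(parts[0].trim());
            long itemID = Long.parseLong(parts[1].trim());
            double rating = Double.parseDouble(parts[2].trim());
            userRatings.computeIfAbsent(userID, k -> new HashMap<>())
                       .put(itemID, rating);
        }
        return userRatings;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
            "7714746,1958013364,1",
            "7714746,1958013365,1",
            "7759383,1958013364,1");
        Map<Integer, Map<Long, Double>> ratings = vectorize(csv);
        // User 7714746 rated two items, so the vector has two entries.
        System.out.println(ratings.get(7714746).size()); // prints 2
    }
}
```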
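On the streaming question: one way to reuse an existing factorization without recomputing U/M is to score candidate items against the user's stored feature vector and skip anything in an up-to-date "seen" set, which can include interactions that arrived after the factorization. A minimal plain-Java sketch of that idea, under the assumption that recommendation scores are dot products of the user row of U with the item rows of M (`recommend` and its signature are hypothetical, not Mahout API):

```java
import java.util.*;

public class RecommendFiltered {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Score every item feature vector against the user's feature vector
    // and return the top-n item IDs, skipping anything already in 'seen'
    // (including interactions streamed in after U/M were computed).
    static List<Long> recommend(double[] userFeatures,
                                Map<Long, double[]> itemFeatures,
                                Set<Long> seen, int n) {
        List<Map.Entry<Long, Double>> scored = new ArrayList<>();
        for (Map.Entry<Long, double[]> e : itemFeatures.entrySet()) {
            if (seen.contains(e.getKey())) continue; // filter interacted items
            scored.add(new AbstractMap.SimpleEntry<>(
                e.getKey(), dot(userFeatures, e.getValue())));
        }
        // Highest score first.
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Long> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, scored.size()); i++) {
            top.add(scored.get(i).getKey());
        }
        return top;
    }
}
```

The point of the sketch is only the filter: the seen set is maintained online as pairs arrive, while U and M stay frozen between retrainings.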
