I changed that by doing:

    ItemBasedRecommender recommender =
        new GenericItemBasedRecommender(
            model,
            similarity,
            new SamplingCandidateItemsStrategy(10, 10, 10, 115000, 500),
            new SamplingCandidateItemsStrategy(10, 10, 10, 115000, 500));

It's taking 25 seconds now, so adding the SamplingCandidateItemsStrategy
cut the time in half. I don't have a working profiler to dig in deeper
(the one I have is broken for newer versions of Eclipse), but I thought
the console output might help, since its timestamps show roughly where
the time goes. If this helps, awesome. If you need profiler output, or
can't help any further without it, I'll find a way to get a profiler
that works.
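
In the absence of a profiler, one rough option is to time each phase by hand. The following is only a minimal sketch using the plain JDK: `PhaseTimer` and the phase names are hypothetical (they are not part of Mahout), and the `Thread.sleep` calls are stand-ins for the real work of building the data model, loading the similarity file, and calling the recommender.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical helper: records wall-clock time per named phase. */
final class PhaseTimer {
    // LinkedHashMap keeps the phases in the order they were first timed.
    private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

    /** Runs the work, adds its duration to the named phase, returns its result. */
    <T> T time(String phase, Callable<T> work) throws Exception {
        long start = System.nanoTime();
        try {
            return work.call();
        } finally {
            long tookMs = (System.nanoTime() - start) / 1_000_000L;
            elapsedMillis.merge(phase, tookMs, Long::sum);
        }
    }

    /** Milliseconds spent per phase so far. */
    Map<String, Long> report() {
        return elapsedMillis;
    }

    public static void main(String[] args) throws Exception {
        PhaseTimer timer = new PhaseTimer();
        // Stand-ins only: replace each lambda with the real call being measured.
        timer.time("load data model", () -> { Thread.sleep(20); return "model"; });
        timer.time("load similarity", () -> { Thread.sleep(10); return "similarity"; });
        timer.time("recommend", () -> { Thread.sleep(5); return "items"; });
        timer.report().forEach((phase, ms) -> System.out.println(phase + ": " + ms + " ms"));
    }
}
```

`Callable` is used rather than `Supplier` so that the timed code is allowed to throw checked exceptions.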
12/07/25 14:52:48 WARN jdbc.AbstractJDBCDataModel: You are not using
ConnectionPoolDataSource. Make sure your DataSource pools connections to
the database itself, or database performance will be severely reduced.
12/07/25 14:52:48 INFO jdbc.ReloadFromJDBCDataModel: Loading new JDBC
delegate data...
12/07/25 14:53:06 INFO model.GenericDataModel: Processed 10000 users
12/07/25 14:53:06 INFO model.GenericDataModel: Processed 20000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 30000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 40000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 50000 users
12/07/25 14:53:07 INFO model.GenericDataModel: Processed 60000 users
12/07/25 14:53:08 INFO model.GenericDataModel: Processed 70000 users
12/07/25 14:53:08 INFO model.GenericDataModel: Processed 80000 users
12/07/25 14:53:09 INFO model.GenericDataModel: Processed 90000 users
12/07/25 14:53:09 INFO model.GenericDataModel: Processed 100000 users
12/07/25 14:53:10 INFO model.GenericDataModel: Processed 110000 users
12/07/25 14:53:10 INFO model.GenericDataModel: Processed 115481 users
12/07/25 14:53:13 INFO jdbc.ReloadFromJDBCDataModel: New data loaded.
12/07/25 14:53:13 INFO file.FileItemSimilarity: Creating FileItemSimilarity
for file output/part-r-00000
Wed Jul 25 14:53:16 EDT 2012 :done
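
The gaps between the timestamps above can also be read off mechanically. For example, the log shows 18 seconds between the "Loading new JDBC delegate data..." entry at 14:52:48 and the first "Processed 10000 users" entry at 14:53:06. A small sketch, assuming every log entry starts with a `yy/MM/dd HH:mm:ss` stamp as in this output; `LogGaps` is a hypothetical helper name, and wrapped continuation lines are simply skipped:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: seconds elapsed between consecutive timestamped log lines. */
final class LogGaps {
    // Assumed stamp layout, e.g. "12/07/25 14:52:48" (17 characters).
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");
    private static final int STAMP_LENGTH = 17;

    static List<Long> secondsBetweenEntries(List<String> lines) {
        List<Long> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            if (line.length() < STAMP_LENGTH) {
                continue; // too short to carry a stamp
            }
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(line.substring(0, STAMP_LENGTH), STAMP);
            } catch (DateTimeParseException e) {
                continue; // wrapped continuation line without a stamp
            }
            if (previous != null) {
                gaps.add(Duration.between(previous, current).getSeconds());
            }
            previous = current;
        }
        return gaps;
    }
}
```

Feeding it the pasted log lines yields one number per consecutive pair of entries, so the large values point at the slow steps.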
On Wed, Jul 25, 2012 at 6:09 AM, Sean Owen <[email protected]> wrote:
> Look at SamplingCandidateItemsStrategy and its arguments. These are the
> knobs you can turn to reduce the amount of data considered. You might start
> with something low like 10 for each of the first 3 args.
>
> You can set this on an ItemBasedRecommender once configured.
>
> On Tue, Jul 24, 2012 at 11:05 PM, Jonathan Nassau <[email protected]> wrote:
>
> > Yeah I haven't done that, i'm going to look into that now.
> > But in case it could solve everything immediately, how would i set up
> > a CandidateItemStrategy in a way that would speed up the data?
> >
> >
>