I don't think a MapFile is a good solution as the file would have to be
accessed for every Reducer invocation to load the filter items for that
user. Correct me if I'm wrong.

--sebastian

Am 24.08.2010 15:45, schrieb han henry:
> For 1), a user's invalid items can be stored in multiple files; we can use
> MapFilesMap to load the data from HDFS,
> then we can check the invalid items.
>
> package org.apache.mahout.cf.taste.hadoop;
>
> import java.io.Closeable;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.PathFilter;
> import org.apache.hadoop.io.MapFile;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.io.WritableComparable;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public final class MapFilesMap<K extends WritableComparable, V extends
> Writable>
>   implements Closeable
> {
>   private static final Logger log =
> LoggerFactory.getLogger(MapFilesMap.class);
>
>   private static final PathFilter PARTS_FILTER = new PathFilter()
>   {
>     public boolean accept(Path path) {
>       return path.getName().startsWith("part-");
>     }
>   };
>   private final List<MapFile.Reader> readers;
>
>   public MapFilesMap(FileSystem fs, Path parentDir, Configuration
> conf) throws IOException
>   {
>     log.info("Creating MapFilesMap from parent directory {}", parentDir);
>     this.readers = new ArrayList<MapFile.Reader>();
>     try {
>       for (FileStatus status : fs.listStatus(parentDir, PARTS_FILTER)) {
>         String path = status.getPath().toString();
>         log.info("Adding MapFile.Reader at {}", path);
>         this.readers.add(new MapFile.Reader(fs, path, conf));
>       }
>     } catch (IOException ioe) {
>       close();
>       throw ioe;
>     }
>     if (this.readers.isEmpty())
>       throw new IllegalArgumentException("No MapFiles found in " +
> parentDir);
>   }
>
>   public V get(K key, V value)
>     throws IOException
>   {
>     for (MapFile.Reader reader : this.readers)
>     {
>       // reader.get() fills in 'value' and returns it if the key was found
>       if (reader.get(key, value) != null) {
>         return value;
>       }
>     }
>     log.debug("No value for key {}", key);
>     return null;
>   }
>
>   public void close()
>   {
>     for (MapFile.Reader reader : this.readers)
>       try {
>         reader.close();
>       }
>       catch (IOException ioe)
>       {
>         log.warn("Unable to close MapFile.Reader", ioe);
>       }
>   }
> }
>
>
>
> 2010/8/24 Sebastian Schelter <[email protected]>
>
>     Ok, you guys got me convinced :)
>
>     From a technical point of view two ways to implement that filter
>     come to
>     my mind:
>
>     1) Just load the user/item pairs to filter into memory in the
>     AggregateAndRecommendReducer (easy but might not be scalable) like Han
>     Hui suggested
>     2) Have the AggregateAndRecommendReducer not pick only the top-K
>     recommendations but write all predicted preferences to disk. Add
>     another
>     M/R step after that which joins recommendations and user/item filter
>     pairs to allow for custom rescoring/filtering
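
A rough sketch of how option 1) might look, purely as an illustration: the class and method names below are invented, and the lookup structure that the real reducer would populate from HDFS in its setup phase is replaced by plain Java collections.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of option 1: keep the user/item filter pairs in memory and
// consult them while aggregating recommendations. All names here are
// illustrative, not the actual Mahout code.
class InMemoryItemFilter {

  // userID -> set of itemIDs that must never be recommended to that user
  private final Map<Long, Set<Long>> filteredItems = new HashMap<Long, Set<Long>>();

  // In the real job this would be populated from a file on HDFS in setup()
  public void addFilterPair(long userID, long itemID) {
    Set<Long> items = filteredItems.get(userID);
    if (items == null) {
      items = new HashSet<Long>();
      filteredItems.put(userID, items);
    }
    items.add(itemID);
  }

  // The reducer would call this before emitting a recommendation
  public boolean isFiltered(long userID, long itemID) {
    Set<Long> items = filteredItems.get(userID);
    return items != null && items.contains(itemID);
  }
}
```

The obvious caveat is the one already raised: this only works while all filter pairs fit into each reducer's heap.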
>
>     --sebastian
>
>     Am 24.08.2010 06:07, schrieb Ted Dunning:
>     > Sorry to chime in late, but removing items after recommendation
>     isn't such a
>     > crazy thing to do.
>     >
>     > In particular, it is common to remove previously viewed items
>     (for a period
>     > of time).  Likewise, if the user says "don't show this again",
>     it makes
>     > sense to backstop the actual recommendation system with a UI
>     limitation that
>     > does a post-recommendation elimination.
>     >
>     > Moreover, this approach has the great benefit that the results
>     are very
>     > predictable.  Exactly the requested/seen items will be
>     eliminated and no
>     > surprising effect on recommendations will occur.
>     >
>     > That predictability is exactly the problem, though.  Generally
>     you want a
>     > bit more systemic effect for negative recommendations.  This is
>     a really
>     > sticky area, however, because negative recommendations often impart
>     > information about positive preferences in addition to some level
>     of negative
>     > information.
>     >
>     > I used an explicit filter at both Musicmatch and at Veoh.  Both
>     systems
>     > worked well.  Especially at Veoh, there was a lot of additional
>     machinery
>     > required to handle the related problem of anti-flooding.  That
>     was done at
>     > the UI level as well.
>     >
>     > On Mon, Aug 23, 2010 at 8:16 PM, Sean Owen <[email protected]> wrote:
>     >
>     >
>     >> (Uncanny, I was just minutes before researching Grooveshark for
>     >> unrelated reasons... Good to hear from any company doing
>     >> recommendations and is willing to talk about it. I know of a number
>     >> that can't or won't unfortunately.)
>     >>
>     >> Yeah, sounds like we're all on the same page. One key point in
>     what I
>     >> think everyone is talking about is that this is not simply removing
>     >> items *after* recommendations are computed. This risks removing
>     most
>     >> or all recommended items. It needs to be done during the process of
>     >> selecting recommendations.
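
To illustrate the point: banned items have to be skipped *while* selecting the top-K, so they never occupy one of the K slots. The sketch below (names invented for illustration, not actual Mahout code) keeps a min-heap of the K best unbanned candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

class TopKFilter {

  // One candidate recommendation
  static class Candidate {
    final long itemID;
    final double score;
    Candidate(long itemID, double score) {
      this.itemID = itemID;
      this.score = score;
    }
  }

  // Select the top k candidates by score, skipping banned items as we go,
  // so a banned item never consumes one of the k slots.
  static List<Long> topK(List<Candidate> candidates, Set<Long> banned, int k) {
    PriorityQueue<Candidate> best = new PriorityQueue<Candidate>(k + 1,
        new Comparator<Candidate>() {
          public int compare(Candidate a, Candidate b) {
            return Double.compare(a.score, b.score); // min-heap on score
          }
        });
    for (Candidate c : candidates) {
      if (banned.contains(c.itemID)) {
        continue; // filter during selection, not afterwards
      }
      best.add(c);
      if (best.size() > k) {
        best.poll(); // drop the currently lowest-scored candidate
      }
    }
    List<Long> result = new ArrayList<Long>();
    while (!best.isEmpty()) {
      result.add(best.poll().itemID);
    }
    Collections.reverse(result); // highest score first
    return result;
  }
}
```

Filtering after selection instead would return fewer than K items (possibly none) whenever banned items made the cut.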
>     >>
>     >> But beyond that, it's a simple idea and just a question of
>     >> implementation. It's "Rescorer" in the non-Hadoop code, which does
>     >> more than provide a way to remove items but rather generally
>     rearrange
>     >> recommendations according to some logic. I think it's likely
>     easy and
>     >> useful to imitate this with a simple optional Mapper/Reducer
>     phase in
>     >> this nascent "RecommenderJob" pipeline that Sebastian is now
>     helping
>     >> expand into something more configurable and general purpose.
>     >>
>     >> Sean
>     >>
>     >> On Mon, Aug 23, 2010 at 8:25 PM, Chris Bates
>     >> <[email protected]> wrote:
>     >>
>     >>> Hi all,
>     >>>
>     >>> I'm new to this forum and haven't seen the code you are
>     talking about, so
>     >>> take this with a grain of salt.  The way we handle "banned
>     items" at
>     >>> Grooveshark is to post-process the itemID pairs in Hive.  If a
>     user
>     >>>
>     >> dislikes
>     >>
>     >>> a recommended song/artist, an item pair is stored in HDFS and
>     then when
>     >>>
>     >> the
>     >>
>     >>> recs are computed, those banned user-item pairs are taken into
>     account.
>     >>> Here is an example query:
>     >>>
>     >>> SELECT DISTINCT st.uid, st.simuid, IF(b.uid=st.uid,1,0) as
>     banned  FROM
>     >>> streams_u2u st LEFT OUTER JOIN bannedsimusers b ON
>     (b.simuid=st.simuid);
>     >>>
>     >>> That query will print out a 1 or a 0 if the recommended item
>     pair is
>     >>>
>     >> banned
>     >>
>     >>> or not.  Hive also supports case statements (I think), so you
>     can make a
>     >>> range of "banned-ness" I guess.  Just another solution to the
>     "dislike"
>     >>> problem.
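
For readers without Hive handy, the flag that query computes can be emulated in plain Java (names mirror the query; this is illustrative only): for each recommended (uid, simuid) pair, emit 1 if that exact pair appears in the banned table, else 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Rough Java analogue of the Hive LEFT OUTER JOIN above. Banned pairs are
// keyed as "uid:simuid"; each output row is {uid, simuid, bannedFlag}.
class BannedFlagJoin {

  static List<long[]> flag(List<long[]> streams, Set<String> bannedPairs) {
    List<long[]> out = new ArrayList<long[]>();
    for (long[] row : streams) {
      // matches IF(b.uid=st.uid,1,0): 1 only when both uid and simuid match
      long bannedFlag = bannedPairs.contains(row[0] + ":" + row[1]) ? 1L : 0L;
      out.add(new long[] { row[0], row[1], bannedFlag });
    }
    return out;
  }
}
```

A downstream step can then drop (or down-weight) rows where the flag is 1.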
>     >>>
>     >>> Chris
>     >>>
>     >>
>     >
>
>
