One thing I noticed is that in step 4 of this process
(RowSimilarityJob-VectorNormMapper-Reducer) the counters look like this:
Mapper input: 6,925
Mapper output: 3
Reducer input: 3
Reducer output: 0
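The reducer emitting nothing makes me wonder about the preference values:
every preference in the input below is 1, and Pearson correlation is
undefined for a vector with zero variance, so rows of identical values may
get discarded entirely. A minimal sketch of the textbook formula (for
illustration only, not Mahout's implementation) shows the effect:

// Textbook Pearson correlation, illustration only (not Mahout's code).
// For two all-ones vectors every centered term is zero, so the result
// is 0.0 / 0.0 == Double.NaN, which a similarity job has to discard.
static double pearson(double[] x, double[] y) {
  int n = x.length;
  double meanX = 0.0, meanY = 0.0;
  for (int i = 0; i < n; i++) {
    meanX += x[i] / n;
    meanY += y[i] / n;
  }
  double num = 0.0, denX = 0.0, denY = 0.0;
  for (int i = 0; i < n; i++) {
    num  += (x[i] - meanX) * (y[i] - meanY);
    denX += (x[i] - meanX) * (x[i] - meanX);
    denY += (y[i] - meanY) * (y[i] - meanY);
  }
  return num / Math.sqrt(denX * denY);
}

// pearson(new double[] {1, 1, 1}, new double[] {1, 1, 1}) returns NaN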
Most of the values going into the RowSimilarityJob are defaults. Here's
what I see in the code:
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  int numberOfUsers = HadoopUtil.readInt(
      new Path(prepPath, PreparePreferenceMatrixJob.NUM_USERS), getConf());

  ToolRunner.run(getConf(), new RowSimilarityJob(), new String[] {
      "--input", new Path(prepPath, PreparePreferenceMatrixJob.RATING_MATRIX).toString(),
      "--output", similarityMatrixPath.toString(),
      "--numberOfColumns", String.valueOf(numberOfUsers),
      "--similarityClassname", similarityClassName,
      "--maxSimilaritiesPerRow", String.valueOf(maxSimilarItemsPerItem),
      "--excludeSelfSimilarity", String.valueOf(Boolean.TRUE),
      "--threshold", String.valueOf(threshold),
      "--tempDir", getTempPath().toString() });
}
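
To isolate step 4, it might also be worth invoking RowSimilarityJob
directly against the rating matrix that step 3 produced, with the same
flags the snippet above passes. A rough sketch (the paths and numbers are
placeholders, not my actual values; I'm leaving --threshold unset so the
default applies):

// Hypothetical standalone driver for step 4 only; the flags match the
// snippet above, the paths and counts are placeholders.
ToolRunner.run(new Configuration(), new RowSimilarityJob(), new String[] {
    "--input", "/tmp/prefs/ratingMatrix",        // placeholder path
    "--output", "/tmp/prefs/similarityMatrix",   // placeholder path
    "--numberOfColumns", "12345",                // placeholder user count
    "--similarityClassname", PearsonCorrelationSimilarity.class.getName(),
    "--maxSimilaritiesPerRow", "100",
    "--excludeSelfSimilarity", "true",
    "--tempDir", "/tmp/prefs/tmp" });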
Any ideas?
On Mon, Jun 4, 2012 at 7:36 PM, Something Something <[email protected]> wrote:
> My job setup is really simple. It looks like this:
>
>   public int run(String[] args) throws Exception {
>     String datasetDate = args[0];
>     String inputPath = args[1];
>     String configFile = args[2];
>     String outputLocation = args[3];
>
>     Configuration config = getConf();
>     config.addResource(new Path(configFile));
>     logger.error("config: " + config.toString());
>
>     File inputFile = new File(inputPath);
>     File outputDir = new File(outputLocation);
>     outputDir.delete(); // note: File.delete() only removes an empty directory
>     File tmpDir = new File("/tmp");
>
>     ItemSimilarityJob similarityJob = new ItemSimilarityJob();
>
>     Configuration conf = new Configuration();
>     conf.set("mapred.input.dir", inputFile.getAbsolutePath());
>     conf.set("mapred.output.dir", outputDir.getAbsolutePath());
>     conf.setBoolean("mapred.output.compress", false);
>
>     similarityJob.setConf(conf);
>
>     similarityJob.run(new String[] {
>         "--tempDir", tmpDir.getAbsolutePath(),
>         "--similarityClassname", PearsonCorrelationSimilarity.class.getName() });
>
>     return 0;
>   }
>
>
> The input file is sorted by UserId, ItemId & Preference. Preference is
> always '1'. A few lines from the file look like this:
>
> -1000000334008648908 1 1
> -1000000334008648908 70 1
> -1000000334008648908 2090 1
> -1000000334008648908 12872 1
> -1000000334008648908 32790 1
> -1000000334008648908 32799 1
> -1000000334008648908 32969 1
> -1000000397028994738 1 1
> -1000000397028994738 12872 1
> -1000000397028994738 32790 1
> -1000000397028994738 32796 1
> -1000000397028994738 32939 1
> -100000083781885705 1 1
> -100000083781885705 12872 1
> -100000083781885705 32790 1
> -100000083781885705 32837 1
> -100000083781885705 33723 1
> -1000001014586220418 1 1
> -1000001014586220418 12872 1
> -1000001014586220418 32790 1
> & so on...
>
> (UserId is created using MemoryIDMigrator)
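>
> Roughly like this, in case it matters (a sketch from memory of the taste
> IDMigrator API, not my exact code):
>
> MemoryIDMigrator migrator = new MemoryIDMigrator();
> long userId = migrator.toLongID("some-user-key"); // 64-bit hash of the string
> migrator.storeMapping(userId, "some-user-key");   // keep the reverse mapping in memory
> String original = migrator.toStringID(userId);    // -> "some-user-key"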
>
>
> The job internally runs the following 7 Hadoop jobs, all of which complete successfully:
>
> PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer
> PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer
> PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer
> RowSimilarityJob-VectorNormMapper-Reducer
> RowSimilarityJob-CooccurrencesMapper-Reducer
> RowSimilarityJob-UnsymmetrifyMapper-Reducer
> ItemSimilarityJob-MostSimilarItemPairsMapper-Reducer
>
>
> The problem is that the output file is empty! What am I missing? Please
> help. Thanks.