Gokhan, when I try it from the command line it works. I will send the
command so we can compare the command-line parameters to the
TFIDFConverter params.
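For reference, the invocation is roughly the following (values mirror the
Java settings quoted below; I'm writing the seq2sparse flags from memory,
so treat this as a sketch rather than the exact command):

    bin/mahout seq2sparse \
      -i reuters-seqfiles -o reuters-kmeans-try \
      -wt tfidf -s 2 -ng 2 -ml 50 \
      -n 2 -lnorm -md 5 -x 95 \
      -seq -chunk 200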

Suneel, I had checked the seqfiles. I didn't see any problem with the
generated seqfiles, but I will check again and send sample records from
each of them.
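One difference I already notice while comparing: if I read
SparseVectorsFromSequenceFiles correctly, when TF-IDF weighting is
requested it builds the term-frequency vectors unnormalized (norm = -1,
logNormalize = false) and applies the norm only in processTfIdf, while my
code passes normPower = 2 and logNormalize = true at the TF stage as well.
Here is a sketch of the calls adjusted to match (untested, and based only
on my reading of the CLI code):

    // Assumption: mirror what seq2sparse does for "-wt tfidf", i.e. leave
    // the TF vectors unnormalized and let processTfIdf apply the norm.
    DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
            new Path(outputDir),
            DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER, conf,
            minSupport, maxNGramSize, minLLRValue,
            -1.0f /* no norm at the TF stage */,
            false /* no log-normalization yet */,
            numReducers, chunkSizeInMegabytes, sequentialAccess, namedVectors);

    Pair<Long[], List<Path>> features = TFIDFConverter.calculateDF(
            new Path(outputDir,
                    DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
            new Path(outputDir), conf, chunkSizeInMegabytes);

    // The norm and log-normalization are applied here instead.
    TFIDFConverter.processTfIdf(
            new Path(outputDir,
                    DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
            new Path(outputDir), conf, features, minDf, maxDF,
            normPower, logNormalize, sequentialAccess, namedVectors,
            numReducers);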


On Sun, Sep 1, 2013 at 11:02 PM, Gokhan Capan <[email protected]> wrote:

> Suneel is indeed right. I assumed that everything performed prior to vector
> generation was done correctly.
>
> By the way, if the suggestions do not work, could you try running
> seq2sparse from the command line with the same arguments and see if that
> works well?
>
> On Sun, Sep 1, 2013 at 7:23 PM, Suneel Marthi <[email protected]>
> wrote:
>
> > I would first check to see if the input 'seqfiles' for TFIDFConverter
> > have any meat in them. This could also happen if the input seqfiles
> > are empty.
>
>
> >
> >
> > ________________________________
> >  From: Taner Diler <[email protected]>
> > To: [email protected]
> > Sent: Sunday, September 1, 2013 2:24 AM
> > Subject: TFIDFConverter generates empty tfidf-vectors
> >
> >
> > Hi all,
> >
> > I am trying to run the Reuters KMeans example in Java, but TFIDFConverter
> > generates the tfidf-vectors as empty. How can I fix that?
> >
> >     private static int minSupport = 2;
> >     private static int maxNGramSize = 2;
> >     private static float minLLRValue = 50;
> >     private static float normPower = 2;
> >     private static boolean logNormalize = true;
> >     private static int numReducers = 1;
> >     private static int chunkSizeInMegabytes = 200;
> >     private static boolean sequentialAccess = true;
> >     private static boolean namedVectors = false;
> >     private static int minDf = 5;
> >     private static long maxDF = 95;
> >
> >         Path inputDir = new Path("reuters-seqfiles");
> >         String outputDir = "reuters-kmeans-try";
> >         HadoopUtil.delete(conf, new Path(outputDir));
> >
> >         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
> >         Path tokenizedPath =
> >                 new Path(DocumentProcessor.TOKENIZED_DOCUMENT_OUTPUT_FOLDER);
> >         DocumentProcessor.tokenizeDocuments(inputDir,
> >                 analyzer.getClass().asSubclass(Analyzer.class),
> >                 tokenizedPath, conf);
> >
> >         DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
> >                 new Path(outputDir),
> >                 DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER, conf,
> >                 minSupport, maxNGramSize, minLLRValue, normPower,
> >                 logNormalize, numReducers, chunkSizeInMegabytes,
> >                 sequentialAccess, namedVectors);
> >
> >         Pair<Long[], List<Path>> features = TFIDFConverter.calculateDF(
> >                 new Path(outputDir,
> >                         DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
> >                 new Path(outputDir), conf, chunkSizeInMegabytes);
> >         TFIDFConverter.processTfIdf(
> >                 new Path(outputDir,
> >                         DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
> >                 new Path(outputDir), conf, features, minDf, maxDF,
> >                 normPower, logNormalize, sequentialAccess, false,
> >                 numReducers);
> >
>
