Gokhan, I tried it from the command line and it works. I will send the command so we can compare the command-line parameters against the TFIDFConverter params.
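For comparison, this is roughly the shape of the command I mean. The flag names are taken from Mahout 0.8's seq2sparse usage, so treat it as a sketch I am assuming rather than the exact command I ran:

    mahout seq2sparse \
        -i reuters-seqfiles -o reuters-kmeans-try \
        -wt tfidf \
        -s 2 -md 5 -x 95 -ng 2 -ml 50 \
        -n 2 -lnorm -seq -nr 1 -chunk 200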
Suneel, I had checked the seqfiles. I didn't see any problem in the generated seqfiles, but I will check again and send samples from each seqfile.

On Sun, Sep 1, 2013 at 11:02 PM, Gokhan Capan <[email protected]> wrote:

> Suneel is right indeed. I assumed that everything performed prior to
> vector generation is done correctly.
>
> By the way, if the suggestions do not work, could you try running
> seq2sparse from commandline with the same arguments and see if that
> works well?
>
> On Sun, Sep 1, 2013 at 7:23 PM, Suneel Marthi <[email protected]> wrote:
>
> > I would first check to see if the input 'seqfiles' for TFIDFGenerator
> > have any meat in them.
> > This could also happen if the input seqfiles are empty.
> >
> > ________________________________
> > From: Taner Diler <[email protected]>
> > To: [email protected]
> > Sent: Sunday, September 1, 2013 2:24 AM
> > Subject: TFIDFConverter generates empty tfidf-vectors
> >
> > Hi all,
> >
> > I am trying to run the Reuters KMeans example in Java, but TFIDFConverter
> > generates the tfidf-vectors as empty. How can I fix that?
> >
> >     private static int minSupport = 2;
> >     private static int maxNGramSize = 2;
> >     private static float minLLRValue = 50;
> >     private static float normPower = 2;
> >     private static boolean logNormalize = true;
> >     private static int numReducers = 1;
> >     private static int chunkSizeInMegabytes = 200;
> >     private static boolean sequentialAccess = true;
> >     private static boolean namedVectors = false;
> >     private static int minDf = 5;
> >     private static long maxDF = 95;
> >
> >     Path inputDir = new Path("reuters-seqfiles");
> >     String outputDir = "reuters-kmeans-try";
> >     HadoopUtil.delete(conf, new Path(outputDir));
> >     StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
> >     Path tokenizedPath = new Path(DocumentProcessor.TOKENIZED_DOCUMENT_OUTPUT_FOLDER);
> >     DocumentProcessor.tokenizeDocuments(inputDir,
> >         analyzer.getClass().asSubclass(Analyzer.class), tokenizedPath, conf);
> >
> >     DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
> >         new Path(outputDir),
> >         DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER, conf,
> >         minSupport, maxNGramSize, minLLRValue, normPower, logNormalize,
> >         numReducers, chunkSizeInMegabytes, sequentialAccess, namedVectors);
> >
> >     Pair<Long[], List<Path>> features = TFIDFConverter.calculateDF(
> >         new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
> >         new Path(outputDir), conf, chunkSizeInMegabytes);
> >     TFIDFConverter.processTfIdf(
> >         new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER),
> >         new Path(outputDir), conf, features, minDf, maxDF, normPower,
> >         logNormalize, sequentialAccess, false, numReducers);
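For reference, this is the kind of check I am running to sample the seqfiles, in case someone spots a problem with it. It is a minimal sketch assuming the Mahout 0.8 SequenceFileIterable API and the tfidf-vectors output folder from the code above; the part file name is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.mahout.common.Pair;
    import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable;
    import org.apache.mahout.math.VectorWritable;

    public class SampleVectors {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Part file name is illustrative; point this at any of the
            // generated tf-vectors / tfidf-vectors part files.
            Path path = new Path("reuters-kmeans-try/tfidf-vectors/part-r-00000");
            int shown = 0;
            for (Pair<Text, VectorWritable> record
                    : new SequenceFileIterable<Text, VectorWritable>(path, true, conf)) {
                // Print the doc id and how many non-zero terms its vector has
                System.out.println(record.getFirst() + " => "
                        + record.getSecond().get().getNumNondefaultElements()
                        + " non-zero terms");
                if (++shown >= 10) {
                    break; // a handful of records is enough to see if vectors are empty
                }
            }
            if (shown == 0) {
                System.out.println("No records at all: the seqfile itself is empty.");
            }
        }
    }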
