What command and options were you passing in?
On Jan 18, 2012, at 4:26 PM, John Conwell wrote:
> I got the latest from trunk and built it, and when
> running SparseVectorsFromSequenceFiles I noticed what I think is a bug.
> SparseVectorsFromSequenceFiles throws an exception when you want term
> frequency vectors output with the maxDFSigma filtering option.
>
> Basically, the if / else if section shown below will skip
> calling DictionaryVectorizer.createTermFrequencyVectors when you have that
> combination. The condition will create vectors when you want tf vectors
> without maxDFSigma filtering, or tfidf vectors with maxDFSigma filtering,
> but if you want tf vectors with maxDFSigma filtering, it totally skips over
> the call to createTermFrequencyVectors, and later on throws an exception
> because the vector input path doesn't exist.
>
> Is this a known issue? I'm assuming that's not the way it's supposed to work,
> correct? If so, I think some sort of validation should stop the user
> before they start processing anything.
>
> // at line ~267 in trunk
>
> if (!processIdf && !shouldPrune) {
>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
>       minLLRValue, norm, logNormalize, reduceTasks, chunkSize,
>       sequentialAccessOutput, namedVectors);
> } else if (processIdf) {
>   DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,
>       outputDir, tfDirName, conf, minSupport, maxNGramSize,
>       minLLRValue, -1.0f, false, reduceTasks, chunkSize,
>       sequentialAccessOutput, namedVectors);
> }
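
To make the gap in the quoted condition concrete, here is a minimal standalone sketch that enumerates the four flag combinations and reports which of them would reach a createTermFrequencyVectors call. It assumes, based on the description above, that processIdf is false when tf output is requested and that shouldPrune is true when maxDFSigma filtering is on. The class BranchCoverage and the method createsTfVectors are hypothetical names for illustration only, not part of Mahout.

```java
// Standalone sketch; BranchCoverage and createsTfVectors are hypothetical
// names used for illustration and are not part of Mahout.
final class BranchCoverage {

    // Mirrors the quoted if / else if: returns true when either branch
    // would call DictionaryVectorizer.createTermFrequencyVectors.
    static boolean createsTfVectors(boolean processIdf, boolean shouldPrune) {
        if (!processIdf && !shouldPrune) {
            return true;   // tf output, no pruning (assumed meaning of the flags)
        } else if (processIdf) {
            return true;   // tfidf output, with or without pruning
        }
        return false;      // tf output with pruning: neither branch runs
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean processIdf : flags) {
            for (boolean shouldPrune : flags) {
                System.out.printf(
                        "processIdf=%b shouldPrune=%b -> tf vectors created: %b%n",
                        processIdf, shouldPrune,
                        createsTfVectors(processIdf, shouldPrune));
            }
        }
    }
}
```

Only the combination processIdf=false, shouldPrune=true falls through both branches, which matches the tf-with-maxDFSigma case described in the report.
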
>
> --
>
> Thanks,
> John C
--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com