Hi Vaibhav,

Thanks for the reply. It doesn't look like the total count of keys in frequency.file-0 corresponds to the number of documents: I only used a couple hundred documents to build the model, and there are thousands of keys in frequency.file-0. Am I misunderstanding something?
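
For reference, the NPE in the quoted snippet comes from calling intValue() on the null that HashMap.get returns when the key is missing. Below is a minimal sketch of a guarded lookup. NumDocsLookup and numDocs are hypothetical names, not Mahout APIs, and the sample counts are made up. Whether the -1 entry is written to some other file in the model output is not established in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of Mahout.
final class NumDocsLookup {
    // Key that the quoted snippet expects to hold the document count.
    static final int NUM_DOCS_KEY = -1;

    // Returns the document count if the map carries the -1 entry,
    // or an empty OptionalInt instead of throwing an NPE.
    static OptionalInt numDocs(Map<Integer, Long> documentFrequency) {
        Long value = documentFrequency.get(NUM_DOCS_KEY);
        if (value == null) {
            return OptionalInt.empty();
        }
        // toIntExact fails loudly rather than silently truncating a long.
        return OptionalInt.of(Math.toIntExact(value));
    }

    public static void main(String[] args) {
        // Made-up stand-in for the map loaded from frequency.file-0.
        Map<Integer, Long> df = new HashMap<>();
        df.put(0, 12L);
        df.put(1, 3L);
        System.out.println(numDocs(df).isPresent()); // prints "false"

        // With the -1 entry present, the count is returned.
        df.put(NUM_DOCS_KEY, 200L);
        System.out.println(numDocs(df).getAsInt()); // prints "200"
    }
}
```

With a check like this, the caller can at least report that the -1 entry is absent before handing a value to tfidf.calculate.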
On Tue, Jul 29, 2014 at 1:15 PM, vaibhav srivastava <[email protected]> wrote:

> Hi,
>
> If I am correct, you want to know the number of documents by reading
> frequency.file-0. You can use the SequenceFileReader to load the frequency
> file and then count the number of keys; that will give you the number of
> documents.
>
> Hope this helps,
> Thanks,
> vaibhav
>
> On Tue, Jul 29, 2014 at 10:32 PM, Jonathan Cooper-Ellis <[email protected]>
> wrote:
>
> > Hey guys,
> >
> > I'm trying to make a Bayesian classifier, but I'm having a hard time
> > figuring out how to programmatically determine the value of the numDocs
> > param for the calculate method in TFIDF, using the files generated when
> > building the model on the command line.
> >
> > I saw some code that did it like this:
> >
> > int numDocs = documentFrequency.get(-1).intValue();
> >
> > Where documentFrequency is a HashMap<Integer,Long> read from
> > frequency.file-0, but there's no key -1 in the file, so it's giving me an
> > NPE when I try to pass that to tfidf.calculate.
> >
> > Anyone know what I'm doing wrong?
> >
> > Best,
> >
> > jce
>
> --
> Thanks and Regards,
> Vaibhav Srivastava
> Email-id: [email protected]
> Mobile no.: 9552543029
