They are not raw word counts. Instead, they are scores derived from the counts using various formulae. I don't know where these are documented.
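For what it's worth, one formula that would produce fractional values like these is Dunning's log-likelihood ratio (LLR), which, as far as I can tell, is what Mahout's collocation job uses to score n-grams. The sketch below is a standalone Python illustration of that formula for a bigram "A B", not Mahout's code. The function names and the example counts are made up for illustration, and I am assuming that the values in the dump are LLR scores.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 contingency table of bigram counts.

    k11: times A is followed by B
    k12: times A is followed by something other than B
    k21: times something other than A is followed by B
    k22: times neither A nor B appears in its position
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to guard against tiny negative round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show a non-integer score.
    print(log_likelihood_ratio(10, 5, 3, 100))
```

The inputs are integer counts, but the score is built from logarithms, so it is a double that is almost never a whole number. A score near zero means the two words co-occur about as often as chance would predict.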

On Sun, Mar 25, 2012 at 1:01 PM, Necati Demir <[email protected]> wrote:
> You are right. I asked my question in a wrong way.
>
> I want to ask that some values are something like 25.5. How a wordcount can
> have 0.5 value? You can see a part of this file below:
>
> Key: 108 1 1: Value: 241.7667508731829
> Key: 108 4: Value: 8.554995151411276
> Key: 108 4 during: Value: 25.260550610371865
> Key: 108 billion: Value: 20.98225432772597
> Key: 108 kg: Value: 24.666483410952424
> Key: 108 kg a4: Value: 44.2003664152453
>
>
> On 25 March 2012 02:59, Lance Norskog <[email protected]> wrote:
>
>> The counts are doubles. Vectors in Mahout are always doubles.
>>
>> On Fri, Mar 23, 2012 at 4:23 PM, Necati Demir <[email protected]> wrote:
>> > Hello,
>> >
>> > I am running seq2sparse command with the parameter -ng 2.
>> > When I dump wordcount/ngrams/part-r-00000 file, I see that sum values are
>> > not integers. How are n-gram values calculated in mahout?
>> >
>> >
>> > --
>> > Necati DEMİR
>> > --------------------
>>
>>
>> --
>> Lance Norskog
>> [email protected]
>>
>
>
> --
> Necati DEMİR
> --------------------

--
Lance Norskog
[email protected]
