Re: Bayesian Analysis for v3

Josip Almasi Mon, 05 Nov 2012 10:24:07 -0800

David Legg wrote:

That's pretty straightforward actually.  Suppose you have a sentence "Mary had a 
little lamb" then you would generate the following token values in addition to the 
single word tokens if you were capturing a phrase size of 2: -


   Maryhad
   hada
   alittle
   littlelamb


Neat trick, I wonder how it works out.
Might be too large, especially with malformed MIME types.
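
For reference, here is a minimal sketch of the phrase-token idea quoted above. It is an illustration only, not code from James: the class name PhraseTokens and the method phraseTokens are hypothetical, and it assumes words are split on whitespace and joined with no separator, as in the "Mary had a little lamb" example.

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseTokens {

    // Hypothetical helper: joins each run of `size` adjacent words into one token.
    // Assumes whitespace-separated words and no separator between joined words.
    public static List<String> phraseTokens(String sentence, int size) {
        List<String> tokens = new ArrayList<>();
        String trimmed = sentence.trim();
        if (trimmed.isEmpty() || size < 1) {
            return tokens;
        }
        String[] words = trimmed.split("\\s+");
        // Slide a window of `size` words across the sentence.
        for (int i = 0; i + size <= words.length; i++) {
            StringBuilder phrase = new StringBuilder();
            for (int j = i; j < i + size; j++) {
                phrase.append(words[j]);
            }
            tokens.add(phrase.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Maryhad, hada, alittle, littlelamb]
        System.out.println(phraseTokens("Mary had a little lamb", 2));
    }
}
```

A sentence of n words yields n - 1 extra tokens at phrase size 2, so the token table grows roughly in proportion to the text for each phrase size captured, which is where the size concern comes from.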

I recommend you read Paul Graham's 'Better Bayesian Filtering' [2] (especially 
the bit titled 'Tokens').  It's fascinating stuff... or maybe I'm getting too 
old and geeky :-)


Sure I did, quite a while ago.

Image info needs extracting too.  So things like the width, height, bit depth, 
type of encoding, Exif data and any tags should all be captured.


...what would you use to extract image info?


I haven't used any graphics libraries recently but a quick scan suggests 
'Commons Sanselan' [3] which happily is an Apache project now.


Seems easy.
Broken link to MetadataExample.java :/
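
Since the example link is broken, here is a rough sketch of turning basic image properties into tokens. Note that it does not use Sanselan: it swaps in the JDK's own javax.imageio so that it runs without extra dependencies, and it covers only width, height and bits per pixel. Exif data and tags would still need a metadata library such as the one suggested above. The class name ImageTokens, the method names and the "img:" token format are all made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class ImageTokens {

    // Hypothetical helper: decodes an image attachment and emits one token per property.
    // The "img:" token names are an assumption, not an existing James convention.
    public static List<String> imageTokens(byte[] data) {
        List<String> tokens = new ArrayList<>();
        BufferedImage img;
        try {
            // ImageIO.read returns null when no registered reader understands the data.
            img = ImageIO.read(new ByteArrayInputStream(data));
        } catch (IOException e) {
            img = null;
        }
        if (img == null) {
            // An undecodable image may itself be a useful signal.
            tokens.add("img:unreadable");
            return tokens;
        }
        tokens.add("img:width=" + img.getWidth());
        tokens.add("img:height=" + img.getHeight());
        tokens.add("img:bits=" + img.getColorModel().getPixelSize());
        return tokens;
    }

    // Builds a small blank PNG in memory so the sketch can run without a file.
    public static byte[] samplePng(int width, int height) {
        try {
            BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(imageTokens(samplePng(4, 3)));
        System.out.println(imageTokens(new byte[] {1, 2, 3}));
    }
}
```

Decoding the whole image just to read its dimensions is wasteful for large attachments; a real implementation would probably read only the header.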

Well, you got it all covered.

Regards...

