Thank you Ken, this is great!

I've created a link to your blog post on the Tika wiki:

https://wiki.apache.org/tika/TikaResources

Thank you again!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Ken Krugler <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, July 11, 2013 1:50 PM
To: "[email protected]" <[email protected]>
Subject: Blog post on extracting text features using Tika

>Hi all,
>
>
>I just posted part 1 of a series on extracting text features for machine
>learning:
>
>
>http://www.scaleunlimited.com/2013/07/10/text-feature-selection-for-machine-learning-part-1/
>
>
>It uses a modified version of the Tika RFC822 parser to process mbox
>files.
>
>
>I decided it was time to try to share some of what I'd learned over the
>years in processing text for classification, clustering and other related
>ML tasks.
>
>
>It undoubtedly has some things that are unclear or even incorrect, so
>please comment :)
>
>
>Thanks,
>
>
>-- Ken
>
>
>--------------------------
>Ken Krugler
>+1 530-210-6378
>http://www.scaleunlimited.com
>custom big data solutions & training
>Hadoop, Cascading, Cassandra & Solr
>
