Hello,

Can you use Nutch to crawl PDFs and extract persons, locations, dates, times, and
money amounts as entities, as opposed to plain text strings?

In GATE mimir-cloud (http://gate.ac.uk/mimir/), you can search for {People},
{Location}, {Date}, and {Money} entities, provided you have previously indexed
your data sources with the appropriate Processing Resources in GATE Developer
7.1. For instance, you can run search queries such as:

« JOHN PAUL » IN {People}
Paris IN {Location}
{Date normalized>20010101 normalized<20100101}
{Money > 2000}
...
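
To make the distinction between entities and plain strings concrete, here is a rough Python sketch of what I mean. It is purely illustrative: it is neither Nutch nor GATE code, and the Entity class, the search helper, and the sample annotations are all hypothetical names and data I made up for this example. The point is only that each hit carries a type and a normalized value, so range queries like the ones above become possible.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Entity:
    """A typed annotation (hypothetical model, not a Nutch or GATE class)."""
    kind: str                                  # e.g. "People", "Location", "Date", "Money"
    text: str                                  # surface string as found in the document
    value: Optional[Union[int, float]] = None  # normalized value, if any


# Made-up annotations that an entity extractor might emit for one PDF.
# Dates are normalized to YYYYMMDD integers, money to plain numbers.
entities = [
    Entity("People", "John Paul"),
    Entity("Location", "Paris"),
    Entity("Date", "3 May 2005", 20050503),
    Entity("Money", "EUR 2,500", 2500.0),
    Entity("Money", "EUR 150", 150.0),
]


def search(items: List[Entity], kind: str,
           predicate: Callable[[Entity], bool] = lambda e: True) -> List[Entity]:
    """Return the entities of the given kind that satisfy the predicate."""
    return [e for e in items if e.kind == kind and predicate(e)]


# Rough equivalents of the date and money queries above
dates = search(entities, "Date", lambda e: 20010101 < e.value < 20100101)
money = search(entities, "Money", lambda e: e.value > 2000)
```

A plain full-text index could find the string "2,500" but could not answer "amounts above 2000", which is the capability I am asking about.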

Can you do such things in Nutch?

Many thanks.

Philippe
