Hello, can you use Nutch to crawl PDFs and extract persons, locations, dates, times, and money amounts as entities, as opposed to plain text strings?
In GATE mimir-cloud (http://gate.ac.uk/mimir/), you can search for {People}, {Location}, {Date}, and {Money} entities, provided you have previously indexed your data sources with the appropriate Processing Resources in GATE Developer 7.1. For instance, you can run search queries such as: « JOHN PAUL » IN {People} Paris IN {Location}, {Date normalized>20010101 normalize<20100101} {Money > 2000} ...
Can you do such things in Nutch?
Many thanks.
Philippe

