Hi all:

I'm working on an image search engine using a combination of Nutch and Solr. 
With Nutch and Tika I extract some metadata from the images; so far so good. 
But now I'm trying to improve the accuracy of the results by using the text 
surrounding the images. 

I know that there are several papers published on this subject, using 
various techniques and algorithms. Basically I'm trying to use some heuristic 
methods that don't require a lot of processing. In 
https://webarchive.jira.com/wiki/display/SOC06/Image+annotation+with+surrounding+text
 I've found a few heuristic methods, which I'm implementing in a custom Nutch 
plugin:

- the text of the <tr> node in which the image appears, and of the rows 
above and below it,
- the text of the paragraph in which the image appears,
- the textual content of the headings preceding the image.
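For what it's worth, here is a rough sketch of the last two heuristics (paragraph text and nearest preceding heading), just to make the idea concrete. A real Nutch plugin would of course be Java working on the parser's DOM (e.g. a custom HtmlParseFilter); this Python version assumes well-formed XHTML input, and all names in it are illustrative only:

```python
# Sketch, not the actual plugin: annotate each <img> with the text of its
# enclosing paragraph and the nearest preceding heading. Assumes the page
# is well-formed XHTML so the stdlib XML parser can handle it.
import xml.etree.ElementTree as ET

HEADINGS = {'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}

def annotate_images(xhtml):
    """Return {img_src: {'paragraph': ..., 'heading': ...}}."""
    root = ET.fromstring(xhtml)
    last_heading = None
    results = {}
    # iterate in document order, remembering the most recent heading seen
    for elem in root.iter():
        if elem.tag in HEADINGS:
            last_heading = ''.join(elem.itertext()).strip()
        elif elem.tag == 'p':
            # whitespace-normalized text of the whole paragraph
            paragraph_text = ' '.join(''.join(elem.itertext()).split())
            for img in elem.iter('img'):
                results[img.get('src')] = {
                    'paragraph': paragraph_text,
                    'heading': last_heading,
                }
    return results
```

The <tr> heuristic would follow the same pattern, walking from the image up to its row and then to the sibling rows.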

But I think this is not enough. Can anyone offer some advice, or suggest new 
heuristic methods for this task?

Thanks in advance,

Greetings!

PS: Sorry for my English, it's not my native language :-S

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS 
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci
