Hi all,

I'm trying to collect website information, and I'm using JTidy to extract things 
like the "title" tag or some meta tags while crawling. However, I've run into a 
serious performance issue: parsing the whole page just to get one title takes 
a while. Every URL is inserted into Neo4j, even duplicate ones. 

So, what I would like to do is check whether a URL has already been collected 
and, if it has, reuse its stored title instead of parsing the page again. Should I 
do that lookup with a traversal in Neo4j, or should I use a Lucene index (not a 
Neo4j node index) with the URL as the key and the title stored with it? Which one 
performs better in this case? 
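Whichever store ends up holding the URL-to-title mapping, the "look up first, parse only on a miss" idea described above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions: a plain in-memory `HashMap` stands in for the index (it is neither Neo4j's nor Lucene's API), and `TitleCache`, `titleFor` and the parser function passed to the constructor are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: reuse a stored title for URLs that were already crawled.
// The HashMap is an assumption standing in for a Neo4j index or a Lucene index.
public class TitleCache {
    // URL -> title extracted the first time the page was parsed.
    private final Map<String, String> titlesByUrl = new HashMap<>();
    // The expensive step (e.g. a JTidy parse of the page); supplied by the caller.
    private final Function<String, String> fetchAndParseTitle;
    // Counts how many times the expensive parser actually ran.
    private int parseCount = 0;

    public TitleCache(Function<String, String> fetchAndParseTitle) {
        this.fetchAndParseTitle = fetchAndParseTitle;
    }

    // Returns the stored title for a known URL; otherwise parses once and stores the result.
    public String titleFor(String url) {
        if (titlesByUrl.containsKey(url)) {
            return titlesByUrl.get(url); // duplicate URL: skip the parse entirely
        }
        String title = fetchAndParseTitle.apply(url);
        parseCount++;
        titlesByUrl.put(url, title);
        return title;
    }

    public int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        // A fake parser stands in for fetching and running JTidy on the page.
        TitleCache cache = new TitleCache(url -> "Title of " + url);
        cache.titleFor("http://example.com/a");
        cache.titleFor("http://example.com/a"); // served from the map, no parse
        cache.titleFor("http://example.com/b");
        System.out.println(cache.parseCount()); // prints 2
    }
}
```

For the three calls in `main`, the parser runs only twice because the repeated URL is answered from the map. The sketch does not settle which backing store is faster; swapping the `HashMap` for a Neo4j lookup or a Lucene query is exactly the choice being asked about.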

Cheers, T. 
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
