Hi all, I'm collecting website information with a crawler, and I'm using JTidy to extract things like the "title" tag or certain meta tags. However, I've run into a serious performance problem: parsing a whole page just to get one title takes a long time. Every URL is inserted into Neo4j, including duplicates.
So what I would like to do is check whether a URL has already been collected and, if so, reuse its stored title instead of parsing the page again. Should I do that lookup with a traversal in Neo4j, or should I use a separate Lucene index (not the Neo4j node index) with the URL as the key and the title as the stored value? Which one performs better in this case?

Cheers, T.
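
For illustration, the "look up before parsing" idea can be sketched as below. This is only a minimal sketch under assumptions: `TitleCache` and `titleFetcher` are hypothetical names, the in-memory map merely stands in for whichever store is chosen (a Neo4j lookup or a Lucene index keyed by URL), and the fetcher stands in for the JTidy parsing step.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper: remembers the title found for each URL so that
 * a page is parsed at most once. The map is a stand-in for a real
 * URL-keyed store such as an index lookup.
 */
class TitleCache {
    // URL -> title; placeholder for the persistent lookup structure.
    private final Map<String, String> titlesByUrl = new ConcurrentHashMap<>();

    // Assumed to do the expensive work (download + parse) for one URL.
    private final Function<String, String> titleFetcher;

    TitleCache(Function<String, String> titleFetcher) {
        this.titleFetcher = titleFetcher;
    }

    /** Returns the stored title, calling the fetcher only on a miss. */
    String titleFor(String url) {
        // If the fetcher returns null, nothing is stored and the URL
        // will be fetched again on the next call.
        return titlesByUrl.computeIfAbsent(url, titleFetcher);
    }

    /** Number of distinct URLs with a stored title. */
    int size() {
        return titlesByUrl.size();
    }
}
```

Whichever backend is used, the point is the same: the lookup is keyed directly by URL, so a duplicate URL costs one key lookup rather than a full page parse.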

