Hi Tom,

I am new to Nutch and I anticipate similar requirements in the future. Your
post made me think in another direction, i.e. having a unique set of
configurations per customer.
The way I had been thinking of doing it was with a unique crawl ID for
each customer, which would mean a separate webpage table for each
customer. That would be a problem if I wanted to share data across those
tables: the sharing would have to be handled by the application, not by
Nutch.
With your approach, playing with the configurations can be more flexible, and
the crawled data is not isolated in HBase, as there would be a single webpage
table for all the customers.
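
To make the difference between the two options concrete, here is a small
Python sketch. It is not Nutch code; it only mimics my understanding that
Nutch 2.x prefixes the storage table name with the crawl ID
("<crawlId>_webpage"). The function name and the customer IDs are made up for
illustration.

```python
def webpage_table(crawl_id: str = "") -> str:
    """Mimic how a crawl ID is assumed to prefix the webpage table name."""
    prefix = f"{crawl_id}_" if crawl_id else ""
    return prefix + "webpage"


# Hypothetical customer IDs, for illustration only.
customers = ["customerA", "customerB"]

# Option 1: one crawl ID per customer, so each customer gets its own table.
per_customer = {c: webpage_table(c) for c in customers}

# Option 2: no crawl ID, so every customer shares the same table and only
# the configuration differs per customer.
shared = {c: webpage_table() for c in customers}

print(per_customer)
print(shared)
```

With option 1 the in/out link data ends up in different tables, which is why
sharing it would fall to the application. With option 2 everything lives in
one table.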

>> I don't need a separate HBase or something do I? I'm happy to share the
>> in/out link data and fetches in HBase between sites, just not the eventual
>> index.

Do you mean the approach of having a unique crawlId (and eventually a
separate webpage table) for each customer?

I am thinking of getting all the configurations from the DB, and wondering
whether HBase itself can be used for that. I mean, can the existing webpage
table be used to hold that information, without creating new storage? Maybe
someone from the core team can shed some light on it.

Here is the other thread, which explains our need for it:
http://lucene.472066.n3.nabble.com/Dymanic-Xpath-plugin-td4314525.html

Thanks,
Vicky




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Single-Nutch-2-x-install-multiple-customers-tp4317518p4317764.html
Sent from the Nutch - User mailing list archive at Nabble.com.
