Hi - we tried something similar many years ago, and I am very glad I abandoned that idea. We make sure that, whatever a client does, a filter query limiting them to their subset is always appended to the request.
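To make the idea concrete, here is a minimal sketch in Python of appending such a filter query on the server side. It is not our actual code: the customer-to-host mapping, the function names and the example hosts are all made up for illustration. It assumes the index has a `host` field (as in the faceted search mentioned below) and relies on Solr's standard `fq` parameter, where multiple `fq` values are intersected, so an `fq` supplied by the client can only narrow the result set further.

```python
from urllib.parse import urlencode

# Hypothetical mapping from a customer id to the hosts that customer may search.
CUSTOMER_HOSTS = {
    "customer_a": ["www.example.com", "docs.example.com"],
    "customer_b": ["www.example.org"],
}


def _quote(value):
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_solr_params(customer, user_params):
    """Return (key, value) pairs for a Solr request with a mandatory host filter.

    user_params is a list of (key, value) pairs from the client. The
    restricting fq is always appended last, whatever the client sent.
    """
    # An unknown customer raises KeyError, so the request fails closed.
    hosts = CUSTOMER_HOSTS[customer]
    restriction = "host:(" + " OR ".join(_quote(h) for h in hosts) + ")"
    params = list(user_params)
    params.append(("fq", restriction))
    return params


def build_query_string(customer, user_params):
    """URL-encode the parameters for use in a Solr select URL."""
    return urlencode(build_solr_params(customer, user_params))
```

The sketch only covers the filter itself. Clients should not be able to reach Solr directly, otherwise they can simply leave the filter out.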
Markus

-----Original message-----
> From: Tom Chiverton <[email protected]>
> Sent: Friday 27th January 2017 11:35
> To: [email protected]
> Subject: Single Nutch 2.x install - multiple customers
>
> I have a single Nutch 2.x install with Solr, and it indexes a
> group of sites fine.
>
> Now I have a totally separate set of sites, and want to index
> these to a separate Solr core so that searches in one group can't
> pick up results from the other.
>
> I see how to use the NUTCH_CONF_DIR environment variable to swap
> in a different config for each call to 'crawl' so I can give a
> different set of filters, and 'crawl' already takes the destination
> Solr core as an argument.
>
> But I'm still finding (from a faceted search for 'host') that
> sites from the other group are entering the Solr index.
>
> I found an old mailing list post that talked about adding "-D
> urlfilter.regex.file=regex-urlfilter-index.txt" to the "nutch
> index" call in bin/crawl and then putting a regexp list of the
> hosts that should be added to Solr into
> $NUTCH_CONF_DIR/regex-urlfilter-index.txt, but this doesn't seem to
> be obeyed (documents that do not match the expression are in the
> Solr index).
>
> I don't need a separate HBase or something, do I? I'm happy to
> share the in/out link data and fetches in HBase between sites,
> just not the eventual index.
>
> --
> Tom Chiverton
> Lead Developer

