Hi - we tried something similar many years ago and I am very glad I abandoned
that idea. Instead, we make sure that whatever a client does, a filter query
limiting them to their subset is always appended.
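
A minimal sketch of that approach, assuming a hypothetical search proxy that sits between clients and a single shared Solr index. The customer names, the CUSTOMER_HOSTS mapping and the build_solr_params helper are invented for illustration; only the 'host' field comes from the original question. Because Solr intersects multiple fq parameters, appending the restriction last narrows the results no matter what the client sent:

```python
from urllib.parse import urlencode

# Hypothetical mapping: which hosts each customer is allowed to search.
CUSTOMER_HOSTS = {
    "customer_a": ["www.example.com", "blog.example.com"],
    "customer_b": ["www.example.org"],
}


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_solr_params(customer, client_params):
    """Return a URL-encoded Solr query string for this customer.

    client_params is a list of (name, value) pairs supplied by the client.
    A filter query on the 'host' field is always appended, so the client
    cannot widen the search beyond its own subset of hosts.
    """
    hosts = CUSTOMER_HOSTS[customer]  # raises KeyError for unknown customers
    restriction = "host:(" + " OR ".join(quote_term(h) for h in hosts) + ")"
    params = list(client_params)
    params.append(("fq", restriction))  # multiple fq values are intersected
    return urlencode(params)


if __name__ == "__main__":
    print(build_solr_params("customer_b", [("q", "nutch"), ("fq", "type:html")]))
```

The resulting string would be sent to the Solr select handler by the proxy; clients never talk to Solr directly, otherwise they could simply omit the filter.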

Markus
 
 
-----Original message-----
> From:Tom Chiverton <[email protected]>
> Sent: Friday 27th January 2017 11:35
> To: [email protected]
> Subject: Single Nutch 2.x install - multiple customers
> 
> I have a single Nutch 2.x install with Solr, and it indexes a group of
> sites fine.
>
> Now I have a totally separate set of sites, and want to index these to a
> separate Solr core so that searches in one group can't pick up results
> from the other.
>
> I see how to use the NUTCH_CONF_DIR environment variable to swap in a
> different config for each call to 'crawl', so I can give a different set
> of filters, and 'crawl' already takes the destination Solr core as an
> argument.
>
> But I'm still finding (from a faceted search for 'host') that sites from
> the other group are entering the Solr index.
>
> I found an old mailing list post that talked about adding
> "-D urlfilter.regex.file=regex-urlfilter-index.txt" to the "nutch index"
> call in bin/crawl and then putting a regexp list of the hosts that should
> be added to Solr into $NUTCH_CONF_DIR/regex-urlfilter-index.txt, but this
> doesn't seem to be obeyed (documents that do not match the expression are
> in the Solr index).
>
> I don't need a separate HBase or something, do I? I'm happy to share the
> in/out link data and fetches in HBase between sites, just not the
> eventual index.
>
> -- 
> Tom Chiverton
> Lead Developer
