Re: How to avoid repeatedly upload job jars

katta surendra babu Thu, 02 Mar 2017 02:02:04 -0800

Hi Sebastian,

I am trying to crawl the data of a JSON-based website using Nutch 2.3.1,
HBase 0.98 and Solr 5.6.

The problem is this:

In the first round the JSON data gets into HBase, but in the second round I
do not get the metadata or the HTML links in Nutch.

Could you please help me crawl the JSON website completely?

On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <[email protected]>
wrote:

> Hi,
>
> Maybe the Hadoop Distributed Cache is what you are looking for?
>
> Best,
> Sebastian
>
> On 03/02/2017 01:35 AM, 391772322 wrote:
> > The archived Nutch job jar is about 400M, and every step uploads this
> > archive and distributes it to every worker node. Is there a way to upload
> > only the Nutch jar and leave the dependent libraries on every worker node?
> >
>
>

-- 
Thanks & Regards
Surendra Babu Katta
8886747555
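The original question notes that the Nutch job archive is about 400M. Before moving dependencies into the Hadoop Distributed Cache or installing them on each worker node, it may help to check how much of that archive is bundled libraries. Below is a minimal Python sketch that assumes the usual Hadoop job-archive layout, in which dependency jars sit under a top-level `lib/` directory; the function name `split_job_archive` is hypothetical and not part of Nutch or Hadoop.

```python
import sys
import zipfile


def split_job_archive(path):
    """Split a job archive's stored size into lib/ entries vs. everything else.

    Assumption: bundled dependency jars live under a top-level lib/ directory,
    as in the usual Hadoop job-archive layout. Sizes are the compressed
    (stored) sizes inside the archive, i.e. roughly what gets uploaded.
    """
    lib_bytes = 0
    other_bytes = 0
    lib_jars = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            if info.filename.startswith("lib/"):
                lib_bytes += info.compress_size
                if info.filename.endswith(".jar"):
                    lib_jars.append(info.filename)
            else:
                other_bytes += info.compress_size
    return lib_bytes, other_bytes, lib_jars


if __name__ == "__main__" and len(sys.argv) > 1:
    lib_bytes, other_bytes, lib_jars = split_job_archive(sys.argv[1])
    print(f"lib/ dependencies: {len(lib_jars)} jars, {lib_bytes / 2**20:.1f} MiB")
    print(f"everything else:   {other_bytes / 2**20:.1f} MiB")
```

Run it with the path to the `.job` file as its only argument. Compressed sizes are used because they approximate what is actually shipped on each step; if the `lib/` share dominates, those jars are the candidates for the Distributed Cache or a per-node install.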
