Hi,

please start a new thread for a new topic or question.
That will help others find the right answer to their problem
when searching the mailing list archive.

Thanks,
Sebastian

On 03/02/2017 11:01 AM, katta surendra babu wrote:
> Hi Sebastian,
> 
> 
>  I am looking to crawl the data of a JSON-based website
> using Nutch 2.3.1, HBase 0.98, and Solr 5.6.
> 
> 
> 
> Here is the problem:
> 
>  For the first round I get the JSON data into HBase, but in the second round I
> am not getting the metadata and the HTML links in Nutch.
> 
> 
> So, please help me out if you can with crawling the JSON website completely.
> 
> 
> 
> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <[email protected]>
> wrote:
> 
>> Hi,
>>
>> maybe the Hadoop Distributed Cache is what you are looking for?
>>
>> Best,
>> Sebastian
>>
>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>> The archived Nutch job jar is about 400 MB in size, and every step uploads
>>> this archive and distributes it to every worker node. Is there a way to upload
>>> only the Nutch jar, but leave the dependency libs on every worker node?
>>>
>>
>>
> 
> 
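For reference, one common way to apply Sebastian's Distributed Cache suggestion is Hadoop's generic `-libjars` option, which ships the listed jars to the task nodes via the distributed cache so they need not be bundled into the job jar. The sketch below is illustrative only: the slim jar name, dependency paths, and input directory are assumptions, and it presumes the job class runs through Hadoop's ToolRunner (as Nutch jobs do), since only then are generic options parsed.

```shell
# Sketch only; jar names and paths are illustrative, not from the thread.
# -libjars (parsed by GenericOptionsParser for ToolRunner-based jobs)
# distributes the listed jars to every task via the distributed cache,
# so the job jar itself can stay small.
hadoop jar nutch-core-only.jar org.apache.nutch.crawl.InjectorJob \
  -libjars /opt/nutch/lib/dep-a.jar,/opt/nutch/lib/dep-b.jar \
  urls/
```

Jars that are identical on every run can also be placed once in a shared HDFS location and referenced there, avoiding the per-job upload entirely.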
