Hi,

you have to subscribe to the list by sending a mail to [email protected].
For further information, see http://nutch.apache.org/mailing_lists.html
Best,
Sebastian

On 03/03/2017 09:03 AM, 391772322 wrote:
> Sebastian:
>
> I'm sorry. It's the first time I have used a mailing list; would you be kind
> enough to tell me how to start a new thread?
>
> Below is all I know of the mailing list:
>
> send a mail to "[email protected]".
>
> ------------------ Original message ------------------
> From: "Sebastian Nagel" <[email protected]>
> Sent: Friday, March 3, 2017, 1:33 AM
> To: "user" <[email protected]>
> Subject: Re: How to avoid repeatedly upload job jars
>
> Hi,
>
> please, start a new thread for a new topic or question.
> That will help others find the right answer for their problem
> when searching the mailing list archive.
>
> Thanks,
> Sebastian
>
> On 03/02/2017 11:01 AM, katta surendra babu wrote:
>> Hi Sebastian,
>>
>> I am looking to crawl the data of a JSON-based website using
>> Nutch 2.3.1, HBase 0.98, and Solr 5.6.
>>
>> Here the problem is:
>> for the 1st round I get the JSON data into HBase, but for the second round
>> I am not getting the metadata and the HTML links in Nutch.
>>
>> So, please help me out if you can ... to crawl the JSON website completely.
>>
>> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> maybe the Hadoop Distributed Cache is what you are looking for?
>>>
>>> Best,
>>> Sebastian
>>>
>>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>>> The archived Nutch job jar has a size of about 400 MB; every step will
>>>> upload this archive and distribute it to every worker node. Is there a
>>>> way to upload only the Nutch jar, but leave the dependent libs on every
>>>> worker node?
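A minimal sketch of the Distributed Cache approach suggested above, assuming the dependency jars have been uploaded once to a shared HDFS directory (the path /apps/nutch/lib and the helper class below are illustrative, not part of Nutch): the job driver registers each jar on the task classpath so that only the slim Nutch jar has to be shipped with every job.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class DistCacheLibs {

        // Illustrative HDFS directory holding the dependency jars,
        // uploaded once with e.g. `hdfs dfs -put lib/*.jar /apps/nutch/lib`.
        private static final String LIB_DIR = "/apps/nutch/lib";

        // Register every jar in the shared lib dir with the distributed cache;
        // the framework localizes the jars on each worker node and puts them on
        // the task classpath, so the job jar only needs the Nutch classes.
        public static void addLibsToClasspath(Job job) throws IOException {
            Configuration conf = job.getConfiguration();
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(new Path(LIB_DIR))) {
                if (status.getPath().getName().endsWith(".jar")) {
                    job.addFileToClassPath(status.getPath());
                }
            }
        }
    }

From the command line, Hadoop's generic -libjars option achieves a similar effect, also backed by the distributed cache.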

