Hi, CC'd user@nutch
Which version of Nutch are you using? Your command line usage seems to be
outdated. Can you please confirm?
Thank you
Lewis

On Wed, Sep 23, 2015 at 2:55 AM, Vu Quang Tin <[email protected]> wrote:

> Hi Lewis John McGibbney.
> I'm a vietnam.
> I'm not very good english.
> I have problems with the crawler web by Nutch.
> when i using :
>
> ./bin/nutch org.apache.nutch.parse.ParserChecker "http://dantri.com.vn/">dantri2.txt
> result:
> fetching: http://dantri.com.vn/
> parsing: http://dantri.com.vn/
> contentType: application/xhtml+xml
> signature: 5ddaf9394c8b4bd3ce275253e22e7c7e
> ---------
> Url
> ---------------
>
> http://dantri.com.vn/
> ---------
> Metadata
> ---------
> ...
> Outlinks
> ---------
>
> outlink: toUrl: http://dantri3.vcmedia.vn/App_Themes/Default/Images/favico.ico anchor:
> outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_224.ads anchor:
> outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_256.ads anchor:
> outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_226.ads anchor:
> outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_227.ads anchor:
> outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_1087.ads anchor:
> outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_228.ads anchor:
> ....
>
> ---------
> Headers
> ---------
>
> Date : Wed, 23 Sep 2015 03:15:27 GMT
> Content-Length : 33298
> Content-Encoding : gzip
> ServerName : 118
> Connection : close
> Content-Type : text/html; charset=utf-8
> Server : Microsoft-IIS/7.5
> X-Powered-By : ASP.NET
> Cache-Control : private
>
> great number of outlink( >300 link) -->OK
>
> but When i using:
> ./bin/crawl ./dantri/urls_common urls_dantri14 http://localhost:8983/solr/ 10 >crawlCommonMotTheGioilogThread.log
>
> result
> http://dantri.com.vn/ key: vn.com.dantri:http/
> baseUrl: null
> status: 2 (status_fetched)
> fetchTime: 1445590410793
> prevFetchTime: 1442998395854
> fetchInterval: 2592000
> retriesSinceFetch: 0
> modifiedTime: 0
> prevModifiedTime: 0
> protocolStatus: SUCCESS, args=[]
> parseStatus: success/redirect (1/100), args=[http://dantri.com.vn/,1800]
> title: null
> score: 1.0
> marker _injmrk_ : y
> marker dist : 0
> reprUrl: http://dantri.com.vn/
> batchId: 1442998403-21918
> ...
> .....
> metadata meta_generator : VCCorp.vn
> metadata meta_content-type : text/html; charset=UTF-8
> metadata meta_resource-type : Document
> metadata OriginalCharEncoding : utf-8
> metadata meta_copyright : Công ty Cổ phần VCCorp
> metadata _rs_ : \00\00\00#
> outlink: http://dantri.com.vn/
> inlink: http://dantri.com.vn/
>
> ERROR: only 1 outlink and 1 inlink
> and in log: "parseStatus: success/redirect (1/100), args=[http://dantri.com.vn/,1800]"
>
>
> (while the right result : parseStatus: success/ok (1/0), args=[])
>
> I can not configure Nutch in last 2 weeks.
> Can You help me?
> thanks verry verry much!
>
>
>

--
*Lewis*

