I am using Nutch 2.3, Solr 4.10.3 and HBase 0.94.26.
The command line I used is the one for Nutch 2.3.
Thanks for your reply.

On Thu, Sep 24, 2015 at 7:29 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi,
>
> CC'd user@nutch
>
> Which version of Nutch are you using?
> Your command line usage seems to be outdated. Can you please confirm?
> Thank you
> Lewis
>
> On Wed, Sep 23, 2015 at 2:55 AM, Vu Quang Tin <[email protected]> wrote:
>
>> Hi Lewis John McGibbney,
>> I'm from Vietnam, and my English is not very good.
>> I have a problem crawling a website with Nutch.
>> When I use:
>>
>> ./bin/nutch org.apache.nutch.parse.ParserChecker "http://dantri.com.vn/" >dantri2.txt
>> the result is:
>> fetching: http://dantri.com.vn/
>> parsing: http://dantri.com.vn/
>> contentType: application/xhtml+xml
>> signature: 5ddaf9394c8b4bd3ce275253e22e7c7e
>> ---------
>> Url
>> ---------------
>>
>> http://dantri.com.vn/
>> ---------
>> Metadata
>> ---------
>> ...
>> Outlinks
>> ---------
>>
>>   outlink: toUrl: http://dantri3.vcmedia.vn/App_Themes/Default/Images/favico.ico anchor:
>>   outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_224.ads anchor:
>>   outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_256.ads anchor:
>>   outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_226.ads anchor:
>>   outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_227.ads anchor:
>>   outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_1087.ads anchor:
>>   outlink: toUrl: http://admicro1.vcmedia.vn/ads_codes/ads_box_228.ads anchor:
>> ....
>>
>> ---------
>> Headers
>> ---------
>>
>> Date :     Wed, 23 Sep 2015 03:15:27 GMT
>> Content-Length :     33298
>> Content-Encoding :     gzip
>> ServerName :     118
>> Connection :     close
>> Content-Type :     text/html; charset=utf-8
>> Server :     Microsoft-IIS/7.5
>> X-Powered-By :     ASP.NET
>> Cache-Control :     private
>>
>> A large number of outlinks (more than 300 links) --> OK
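
To check the "more than 300 outlinks" figure against the saved ParserChecker output, a minimal Python sketch like the one below could count the outlink lines in dantri2.txt. It assumes every outlink is printed on its own line as `outlink: toUrl: <url> anchor: ...`, which is the format shown in the excerpt above; the function name `extract_outlinks` is hypothetical.

```python
import re
import sys

# Assumed line format, taken from the ParserChecker excerpt in this thread:
#   "  outlink: toUrl: http://... anchor: ..."
OUTLINK_RE = re.compile(r"outlink:\s+toUrl:\s+(\S+)")


def extract_outlinks(text):
    """Return the outlink URLs found in ParserChecker output, in order."""
    return OUTLINK_RE.findall(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_outlinks.py dantri2.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        links = extract_outlinks(f.read())
    print(f"{len(links)} outlinks, {len(set(links))} unique")
```

Counting unique URLs as well as the raw total helps tell whether the 300+ figure includes duplicates.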
>>
>> But when I use:
>> ./bin/crawl ./dantri/urls_common urls_dantri14 http://localhost:8983/solr/ 10 >crawlCommonMotTheGioilogThread.log
>>
>> the result is:
>> http://dantri.com.vn/    key:    vn.com.dantri:http/
>> baseUrl:    null
>> status:    2 (status_fetched)
>> fetchTime:    1445590410793
>> prevFetchTime:    1442998395854
>> fetchInterval:    2592000
>> retriesSinceFetch:    0
>> modifiedTime:    0
>> prevModifiedTime:    0
>> protocolStatus:    SUCCESS, args=[]
>> parseStatus:    success/redirect (1/100), args=[http://dantri.com.vn/,1800]
>> title:    null
>> score:    1.0
>> marker _injmrk_ :     y
>> marker dist :     0
>> reprUrl:    http://dantri.com.vn/
>> batchId:    1442998403-21918
>> ...
>> .....
>> metadata meta_generator :     VCCorp.vn
>> metadata meta_content-type :     text/html; charset=UTF-8
>> metadata meta_resource-type :     Document
>> metadata OriginalCharEncoding :     utf-8
>> metadata meta_copyright :     Công ty Cổ phần VCCorp
>> metadata _rs_ :     \00\00\00#
>> outlink:    http://dantri.com.vn/
>> inlink:    http://dantri.com.vn/
>>
>> The problem: there is only 1 outlink and 1 inlink, and the log shows
>> "parseStatus:    success/redirect (1/100), args=[http://dantri.com.vn/,1800]"
>>
>> (whereas the expected result is: parseStatus:    success/ok (1/0), args=[])
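
The two statuses differ only in the minor code: 1/0 is success/ok, while 1/100 is success/redirect with two extra arguments. My reading, which is an assumption and not confirmed in this thread, is that the arguments are [redirect target, refresh delay in seconds]. Under that reading, the page "redirects" to itself after 1800 s (30 minutes), and that would fit the record showing only one outlink, equal to the page URL. The Python sketch below decodes the status string as printed in the dump and flags this self-redirect case. The names `parse_status` and `is_self_refresh` are hypothetical helpers, not Nutch APIs.

```python
import re
from dataclasses import dataclass

# Assumed textual format, copied from the dump in this thread:
#   "success/redirect (1/100), args=[http://dantri.com.vn/,1800]"
STATUS_RE = re.compile(
    r"(?P<major>\w+)/(?P<minor>\w+)\s+\((?P<major_code>\d+)/(?P<minor_code>\d+)\),"
    r"\s*args=\[(?P<args>.*)\]"
)


@dataclass
class ParseStatusInfo:
    major: str
    minor: str
    major_code: int
    minor_code: int
    args: list


def parse_status(text):
    """Decode a printed parseStatus value into its parts."""
    m = STATUS_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognised parseStatus: {text!r}")
    raw_args = m.group("args").strip()
    # Naive split: a URL that itself contains commas would break this.
    args = [a.strip() for a in raw_args.split(",")] if raw_args else []
    return ParseStatusInfo(m.group("major"), m.group("minor"),
                           int(m.group("major_code")), int(m.group("minor_code")), args)


def is_self_refresh(page_url, status):
    """True if this is a redirect whose first arg (assumed: the target) is the page itself."""
    return status.minor == "redirect" and bool(status.args) and status.args[0] == page_url
```

For example, `is_self_refresh("http://dantri.com.vn/", parse_status("success/redirect (1/100), args=[http://dantri.com.vn/,1800]"))` is true, while the expected `success/ok (1/0), args=[]` status yields false.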
>>
>> I have been trying to configure Nutch for the last 2 weeks without success.
>> Can you help me?
>> Thank you very much!
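
A redirect with a delay argument usually points to an HTML `<meta http-equiv="refresh">` tag. Whether dantri.com.vn actually serves such a tag is an assumption worth checking. The Python sketch below uses only the standard-library `html.parser` to list every refresh tag in a saved copy of the page as (delay, url) pairs. A value like `content="1800"` with no url would match the args seen above. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(attr.get("content", ""))


def parse_refresh(content):
    """Split a value like '1800; url=http://x/' into (delay or None, url or None)."""
    delay, _, rest = content.partition(";")
    delay, rest = delay.strip(), rest.strip()
    url = None
    if rest.lower().startswith("url="):
        url = rest[4:].strip().strip("'\"") or None
    return (int(delay) if delay.isdigit() else None), url


def find_meta_refresh(html):
    """Return (delay, url) for each meta refresh tag in the HTML string."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return [parse_refresh(c) for c in finder.refreshes]
```

To run it, save the page first (for example with `curl -o dantri.html http://dantri.com.vn/`), then pass the file's contents to `find_meta_refresh`. An empty list would suggest the redirect status comes from somewhere else.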
>
>
> --
> *Lewis*
>
