> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
___
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
Mischa Tuffield wrote:
> Hello All,
>
> I am getting the following error in my hadoop.log (see below). It seems to
> happen every time I run any of the Nutch command-line tools :(
>
>
>
> Does anyone know what problem I am having ?
>
> Cheers,
>
> M
Hi Andrzej,
Yeah, I just noticed that this stack trace is for DEBUG purposes only; I found
it in the Hadoop src. Thanks for the info.
Regards,
Mischa
On 25 Nov 2009, at 13:11, Andrzej Bialecki wrote:
> Mischa Tuffield wrote:
>> Hello Again, Following my previous post below, I hav
ate the signatures.
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Andrzej Bialecki <><
>
set this default encoding (is it
> UTF-8?) to the one that I need (ASCII, I guess).
>
> Thanks in advance ;)
> --
> View this message in context:
> http://old.nabble.com/Encoding-the-content-got-from-Fetcher-tp26528468p26528468.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
files of the crawl data.
_______
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
lowing:
>
> ls crawl/crawldb/current/part-0/
> data  .data.crc  index  .index.crc
>
> How do I convert the output to a human-readable format?
>
> Thanks
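The reply with the actual command is missing from this archive fragment. In Nutch 1.x the usual tool for this is `bin/nutch readdb`, which can dump the crawldb to plain text or print summary statistics; exact options vary by version, so run `bin/nutch readdb` with no arguments to see the usage for yours. A sketch:

```shell
# Dump the whole crawldb to plain-text part files under crawldb-dump/:
bin/nutch readdb crawl/crawldb -dump crawldb-dump

# Or print summary statistics only:
bin/nutch readdb crawl/crawldb -stats
```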
_______
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
> Can I accomplish this by setting the depth argument for 'crawl' to "0"?
>
> If I set the depth to 0, I get a message that says "No URLs to fetch - check
> your seed list and URL filters.".
>
> Any help will be greatly appreciated.
etc., it will never crawl any of the
> outlinks. Is that correct?
>
> Regards,
> Kumar.
>
> Mischa Tuffield wrote:
>> Hello Kumar,
>> There is a config property you can set in conf/nutch-site.xml, as follows :
>>
>> This will force nutch to only fetch
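The property XML itself did not survive in this archive. A plausible candidate, stated here as an assumption rather than something confirmed by the thread, is `db.update.additions.allowed` in conf/nutch-site.xml, which stops updatedb from adding newly discovered outlinks to the crawldb:

```xml
<property>
  <name>db.update.additions.allowed</name>
  <value>false</value>
  <description>If false, the crawldb is not extended with outlinks
  discovered during updatedb, so only the injected seed URLs are
  ever fetched.</description>
</property>
```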
> like everything is running fine but then the index
> doesn't get created.
>
> Thanks,
> Kumar.
___
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465 http://www.garlik.com/
Hi,
Perhaps you are crawling and writing to HDFS? Have you checked the
directory structure of the nutch user in your Hadoop DFS? I was caught
out by that early on.
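A quick way to perform that check, sketched under the assumption of the default per-user layout on HDFS (adjust the paths to your setup):

```shell
# See what the nutch user actually wrote into HDFS:
hadoop fs -ls /user/nutch

# Then inspect the suspected crawl directory:
hadoop fs -ls /user/nutch/crawl
```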
Mischa
Sent on the move
On 11 Jan 2010, at 09:12, zud wrote:
> I have run Nutch 1.0 in Eclipse on Linux; everything wor
___
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
+44(0)20 8973 2465 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
>>>> Do you have to set the -Xss flag somewhere else?
>>>
>>> Yes, in bin/nutch - look for where it sets -Xmx
>>>
>>> - Godmar
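A minimal sketch of that fix, assuming the stock bin/nutch launcher script (variable names differ slightly between versions, so the NUTCH_OPTS hook is an assumption):

```shell
# bin/nutch sets the heap with a line like:
#   JAVA_HEAP_MAX=-Xmx1000m
# A thread-stack size can be added next to it, or passed through the
# script's extra-options variable:
NUTCH_OPTS="$NUTCH_OPTS -Xss2m"
```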
___
Mischa Tuffield
Email: mischa.tuffi...@garlik.com
Homepage - http://mmt.me.uk/
Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
> I've tried sed -e 's/.com\/.*//g' 1 >> 2, and got this output
> http://www.mydomain
>
>
> Not only it removed everything after .com/, but it also removed the .com/
>
> How do I rewrite it, so I could keep the .com/ to have
> http://www.mydomain.com/
>
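The reply did not survive in this archive fragment, but the standard fix is to capture the `.com/` in a group and echo it back in the replacement, so only the text after it is deleted. A minimal sketch:

```shell
# \(...\) captures ".com/"; \1 in the replacement restores it,
# so only the text after ".com/" is removed:
echo 'http://www.mydomain.com/some/page.html' | sed -e 's/\(\.com\/\).*/\1/'
# prints: http://www.mydomain.com/
```

Applied to the files from the question, that would be `sed -e 's/\(\.com\/\).*/\1/g' 1 >> 2`.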