Currently the fetcher class is a 1,500-line piece of code.
I'd like to suggest splitting it up into multiple files to improve the
readability and maintainability of the code, instead of keeping this one big class
with many nested classes.
The classes are grouped anyway by the fetcher namespace, so having them
I meant writing batch/cmd scripts for Windows that don't require Cygwin.
I was thinking of writing those scripts but wanted to check if people think
it's a good idea.
On Sunday, May 18, 2014, Julien Nioche lists.digitalpeb...@gmail.com
wrote:
Hi
Currently nutch isn't very friendly to
Hi,
Currently Nutch isn't very friendly to Windows users, as it requires Cygwin
to run, and there are a lot of issues with the Hadoop 1.x branch, which Nutch
bundles, due to the issue with setting permissions on the tmp directory.
What do you think about doing two things:
1. Move to Hadoop 2.4 to support both Windows and Linux
Thanks!
Created a JIRA issue with the patch
https://issues.apache.org/jira/browse/NUTCH-1783
On Tue, May 13, 2014 at 12:19 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
Hi Diaa,
Yes, you can open an issue for these fixes and attach patches if you can.
Cheers,
Markus
Diaa
Hi,
In some cases, when you crawl a website, you already know many page URLs that
share a similar structure.
For example, on IMDb, entertainment artists' pages have the following URL structure:
http://www.imdb.com/name/nm1/
http://www.imdb.com/name/nm2/
http://www.imdb.com/name/nm6499112/
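As a rough illustration of how such a known pattern could be used, here is a minimal Java sketch that expands a numeric ID range into seed URLs, for example to build a seed list for Nutch's inject step. The `SeedUrlGenerator` class, its `expand` method, and the ID range are hypothetical and not part of Nutch; the template only mirrors the example links above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): expands a numeric URL
 * pattern into a list of seed URLs.
 */
public class SeedUrlGenerator {

    /** Substitutes each ID in [first, last] into the single %d placeholder of template. */
    static List<String> expand(String template, long first, long last) {
        if (first > last) {
            throw new IllegalArgumentException("first must be <= last");
        }
        List<String> urls = new ArrayList<>();
        for (long id = first; id <= last; id++) {
            urls.add(String.format(template, id));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Print one URL per line, the layout of a plain-text seed file.
        for (String url : expand("http://www.imdb.com/name/nm%d/", 1, 3)) {
            System.out.println(url);
        }
    }
}
```

Running `main` prints the URLs for `nm1`, `nm2` and `nm3`, one per line. For a real crawl the ID range would have to come from knowledge of the site, since not every ID in a range necessarily maps to an existing page.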
How about
Hi,
I noticed that Nutch doesn't clean up (remove temp folders) when an error
occurs.
In the following classes, temp directories are created but not removed when
an error occurs:
1. Injector
2. CrawlDBReader
3. Deduplication
4. SegmentReader
For example, in Injector you find:
RunningJob
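The general fix is to delete the temp directory in a `finally` block so it is removed whether the job succeeds or fails. Below is a minimal sketch of that pattern. It assumes a local temp directory handled with `java.nio.file` purely so that it runs standalone; in Nutch the temp paths live on a Hadoop `FileSystem`, where the equivalent call would presumably be `fs.delete(tempDir, true)`. The `TempDirCleanup` class and its `Step` interface are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch of the "always remove the temp dir" pattern (hypothetical, not Nutch code). */
public class TempDirCleanup {

    /** Hypothetical job step that writes into tempDir and may fail. */
    interface Step {
        void run(Path tempDir) throws Exception;
    }

    static void runWithTempDir(Step step) throws Exception {
        Path tempDir = Files.createTempDirectory("nutch-tmp-");
        try {
            step.run(tempDir);
        } finally {
            // Runs on success and on failure, so the temp dir is never left behind.
            deleteRecursively(tempDir);
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws Exception {
        Path[] seen = new Path[1];
        try {
            runWithTempDir(dir -> {
                seen[0] = dir;
                Files.write(dir.resolve("part-00000"), "data".getBytes(StandardCharsets.UTF_8));
                throw new IllegalStateException("simulated job failure");
            });
        } catch (IllegalStateException e) {
            System.out.println("job failed: " + e.getMessage());
        }
        System.out.println("temp dir still exists: " + Files.exists(seen[0]));
    }
}
```

Running `main` prints `job failed: simulated job failure` followed by `temp dir still exists: false`, showing that the directory is removed even though the step threw.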
Anyone wanna commit this?
On Mon, Apr 28, 2014 at 12:04 AM, Diaa (JIRA) j...@apache.org wrote:
[
https://issues.apache.org/jira/browse/NUTCH-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982485#comment-13982485]
Diaa commented on
/display/solr/Apache+Solr+Reference+Guide
What do you think?
On Thu, Apr 24, 2014 at 5:01 PM, Diaa Abdallah diaa.abdelmon...@gmail.com
wrote:
Hi,
I am trying to improve Nutch's documentation as I go through
its classes.
I do that by creating tasks on JIRA.
Is that the correct
Hi,
Is there a way to debug Nutch from Windows?
I followed the steps on https://wiki.apache.org/nutch/RunNutchInEclipse
and reached step 6; however, when I run the application, it says:
Cannot run program chmod: CreateProcess error=2, The system cannot find
the file specified
How would I go about