Re: [Nutch-dev] Looking to fix relative path issue in linkdb

Andrzej Bialecki Thu, 19 Jul 2007 03:25:06 -0700

Robert Young wrote:
> In org.apache.nutch.crawl.LinkDb on line 261 it creates a working
> directory (newLinkDb) based on the current working directory. This
> should be configurable rather than being based on where Tomcat was
> started. I am planning on writing a patch to pull the hadoop.tmp.dir
> setting if it is available, falling back to the current directory.
> 
> Can anyone see any obvious problems with doing this?


I'm not sure what Tomcat has to do with this. LinkDb does it this way in
order to avoid a rename() operation across physical volumes: if you
invoke rename() across volumes on a local FS, it may trigger a costly
copy operation.
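
To illustrate the point about rename(), here is a minimal sketch of the
general idea. It is not the LinkDb code and it uses plain java.nio rather
than the Hadoop FileSystem API. The class name, method name and the
"linkdb-work-" prefix are hypothetical. The working directory is created
next to the target, so both normally sit on the same volume and the final
move is a plain rename. ATOMIC_MOVE is requested so that a cross-volume
move fails loudly instead of quietly turning into a copy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not part of Nutch or Hadoop.
final class SiblingWorkDir {

    // Writes one file into a working directory created beside `target`,
    // then renames the working directory to `target`.
    // Assumes `target` does not exist yet.
    static Path install(Path target, String fileName, String content) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        Files.createDirectories(parent);

        // Sibling of the target: same parent, so normally the same volume.
        Path work = Files.createTempDirectory(parent, "linkdb-work-");
        Files.write(work.resolve(fileName), content.getBytes(StandardCharsets.UTF_8));

        // With ATOMIC_MOVE, a move that cannot be done as a single rename
        // (for example across file stores) throws
        // AtomicMoveNotSupportedException rather than copying the data.
        return Files.move(work, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("linkdb-demo");
        Path installed = install(base.resolve("linkdb"), "part-00000", "example");
        System.out.println("installed at " + installed);
    }
}
```

If the working directory were instead taken from an unrelated setting such
as a temp dir on another disk, the same move could need a full copy of the
data, which is the cost described above.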


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


