Re: [Nutch-dev] Looking to fix relative path issue in linkdb

Robert Young Thu, 19 Jul 2007 10:17:15 -0700

Yes, I do this for the searcher directory, but the LinkDb class
refers to a relative Path (used only as a temporary working
directory). That is the problem: if I start Tomcat in a directory
where the user running Java has no permission to create a
directory, LinkDb fails.
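
A minimal sketch of the lookup order proposed in this thread: a
dedicated setting first, then hadoop.tmp.dir, then the current working
directory. It is plain Java, with java.util.Properties standing in for
the real configuration object. The key "linkdb.working.dir" and the
class name are hypothetical and are not existing Nutch names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class WorkingDirResolver {
    // Hypothetical key, used only for illustration; not an existing Nutch property.
    static final String LINKDB_DIR_KEY = "linkdb.working.dir";
    // The setting named in the thread as the proposed fallback.
    static final String HADOOP_TMP_KEY = "hadoop.tmp.dir";

    /** Returns an absolute base directory for temporary working dirs. */
    static Path resolve(Properties conf) {
        // 1. Prefer the dedicated (hypothetical) option if it is set.
        String dir = conf.getProperty(LINKDB_DIR_KEY);
        // 2. Otherwise try hadoop.tmp.dir.
        if (dir == null || dir.trim().isEmpty()) {
            dir = conf.getProperty(HADOOP_TMP_KEY);
        }
        // 3. Otherwise keep today's behaviour: the current working directory.
        if (dir == null || dir.trim().isEmpty()) {
            dir = System.getProperty("user.dir");
        }
        // Make the result absolute so it no longer depends on where the JVM was started.
        return Paths.get(dir).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolve(conf));               // falls back to the working directory
        conf.setProperty(HADOOP_TMP_KEY, "/tmp/hadoop");
        System.out.println(resolve(conf));               // now uses hadoop.tmp.dir
    }
}
```

Note that pointing the base directory at a different physical volume
than the linkdb itself brings back the concern raised further down the
thread: the final rename() may turn into a costly copy.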


On 7/19/07, Briggs <[EMAIL PROTECTED]> wrote:
> I don't use the Nutch web application, but... you don't have to
> start Nutch in the searcher directory.  You can set the location of
> the searcher dir in the nutch-site.xml config file.
>
> Add this node and set the location of your index:
>
> <property>
>   <name>searcher.dir</name>
>   <value>/your/path/to/your/index</value>
>   <description>
>   Path to root of crawl.  This directory is searched (in
>   order) for either the file search-servers.txt, containing a list of
>   distributed search servers, or the directory "index" containing
>   merged indexes, or the directory "segments" containing segment
>   indexes.
>   </description>
> </property>
>
> On 7/19/07, Robert Young <[EMAIL PROTECTED]> wrote:
> > Tomcat only comes into it because we have to start Tomcat in the
> > searcher directory; I'm guessing it's the same however you choose to
> > use Nutch. It would still have to do a rename across physical volumes
> > if searcher.dir is set to something different, would it not?
> >
> > How does this sound as a solution? Allow the user to set a
> > configuration option for the linkdb working dir, or a
> > configuration flag that selects another existing configuration
> > option as the base dir. Otherwise fall back to the default, which
> > is the current working directory.
> >
> > Cheers
> > Rob
> >
> > On 7/19/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > > Robert Young wrote:
> > > > In org.apache.nutch.crawl.LinkDb on line 261 it creates a working
> > > > directory (newLinkDb) based on the current working directory. This
> > > > should be configurable rather than being based on where Tomcat was
> > > > started. I am planning on writing a patch to pull the hadoop.tmp.dir
> > > > setting if it is available, falling back to the current directory.
> > > >
> > > > Can anyone see any obvious problems with doing this?
> > >
> > > I'm not sure what Tomcat has to do with this. LinkDb does it this way in
> > > order to avoid a rename() operation across physical volumes - if you
> > > invoke rename() on a local FS it may trigger a costly copy operation.
> > >
> > >
> > > --
> > > Best regards,
> > > Andrzej Bialecki     <><
> > >   ___. ___ ___ ___ _ _   __________________________________
> > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > http://www.sigram.com  Contact: info at sigram dot com
> > >
> > >
> >
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"
>

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
