Obviously not: it must be for «[urls] just ending in US
extensions (.com, .edu, etc.)». :))

Anyway, it all sounds very impressive!  Good luck with your
investigations and please keep us posted.


Regards,
Sébastien.


--- [EMAIL PROTECTED] wrote:

> Wow, a pile of questions. :)
> Is this for a web-wide search engine?
> 
> Otis
> 
> 
> --- Jay Pound <[EMAIL PROTECTED]> wrote:
> 
> > What's the bottleneck for the slow searching? I'm monitoring it and
> > it's at about 57% CPU load while I'm searching. It takes about 50
> > seconds to bring up the results page the first time; if I search for
> > the same thing again it's much faster.
> > Doug, can I trash my segments after they are indexed? I don't want
> > cached access to the pages, so do the segments still need to be
> > there? My 30-million-page index/segment is using over 300 GB. I have
> > the space, but when I get to hundreds of millions of pages I will
> > run out of room on my RAID controllers for HD expansion, so I'm
> > planning on moving to Lustre if NDFS is not stable by then. I plan
> > on having a multi-billion-page index if the memory requirements for
> > that can stay below 16 GB per search node.
> > Right now I'm getting pretty crappy results from my 30 million
> > pages. I read the paper "Authoritative Sources in a Hyperlinked
> > Environment" because someone said that's how the Nutch algorithm
> > works, so I'm assuming that as my index grows, the pages that
> > deserve top placement will receive top placement. But I don't know
> > if I should re-fetch a new set of segments with root URLs just
> > ending in US extensions (.com, .edu, etc.). I made a small set
> > testing this theory (100,000 pages) and its results were much
> > better than my results from the 30-million-page index. What's your
> > thought on this? Am I right in thinking that the pages with the
> > most pages linking to them will show up first? So if I index 500
> > million pages, should my results be on par with the rest of the
> > "big dogs"?
> > 
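As an aside for readers of the archive: the hub/authority iteration from that paper (Kleinberg's HITS) is easy to sketch. The snippet below is a toy illustration with a made-up link graph; whether Nutch's scoring actually works this way is the poster's assumption, not something this sketch confirms.

```python
# Toy sketch of the hub/authority iteration from Kleinberg's
# "Authoritative Sources in a Hyperlinked Environment" (HITS).
# Illustrative only -- this is NOT the code Nutch runs.

def hits(links, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: sum(hub[q] for q in links if p in links.get(q, []))
                for p in pages}
        # A page's hub score is the sum of the authorities it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores don't blow up.
        for d in (auth, hub):
            norm = sum(v * v for v in d.values()) ** 0.5 or 1.0
            for p in d:
                d[p] /= norm
    return auth, hub

# Hypothetical mini-graph: two hub pages both point at "popular.com".
graph = {"hub1.com": ["popular.com", "other.com"],
         "hub2.com": ["popular.com"]}
auth, hub = hits(graph)
# "popular.com" ends up with the highest authority score.
```

The intuition matches the question: pages that many (good) pages link to accumulate the highest authority scores, so a larger, better-linked crawl should push deserving pages up.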
> > One last important question: if I merge my indexes, will searching
> > be faster than if I don't merge them? I currently have 20
> > directories of 1-1.7 million pages each. And if I split these
> > indexes across multiple machines, will the searching be faster? I
> > couldn't get the nutch-server to work, but I'm using 0.6.
> > 
> > I have a very fast server. I didn't know if searching would take
> > advantage of SMP; fetching will, and I can run multiple indexing
> > jobs at the same time. My HD array does 200 MB/s I/O. I have the
> > new dual-core Opteron 275 (Italy core) with 4 GB of RAM, working my
> > way up to 16 GB and a second processor when I need them, and 1.28
> > TB of HD space for Nutch currently, with expansion up to 5.12 TB.
> > I'm currently running Windows 2000 on it, as they haven't made a
> > SUSE 9.3 driver yet for my RAID cards (HighPoint 2220), so my
> > scalability will be up to 960 MB/s with all the drives in the
> > system and 4x2.2 GHz processor cores. Until I need to cluster,
> > that's what I have to play with for Nutch.
> > In case you guys needed to know what hardware I'm running:
> > Thank you,
> > -Jay Pound
> > Fromped.com
> > BTW, Windows 2000 is not 100% stable with dual-core processors.
> > Nutch is OK, but it can't do too many things at once or I'll get a
> > kernel inpage error (guess it's time to migrate to Windows Server
> > 2003, damn).
> > ----- Original Message ----- 
> > From: "Doug Cutting" <[EMAIL PROTECTED]>
> > To: <[email protected]>
> > Sent: Tuesday, August 02, 2005 1:53 PM
> > Subject: Re: Memory usage
> > 
> > 
> > > Try the following settings in your nutch-site.xml:
> > >
> > > <property>
> > >    <name>io.map.index.skip</name>
> > >    <value>7</value>
> > > </property>
> > >
> > > <property>
> > >    <name>indexer.termIndexInterval</name>
> > >    <value>1024</value>
> > > </property>
> > >
> > > The first causes data files to use considerably less memory.
> > >
> > > The second affects index creation, so it must be done before you
> > > create the index you search. It's okay if your segment indexes
> > > were created without it: you can just (re-)merge the indexes, and
> > > the merged index will get the setting and use less memory when
> > > searching.
> > >
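A rough back-of-the-envelope on why `indexer.termIndexInterval` helps: Lucene keeps roughly one out of every `interval` dictionary terms in RAM, so raising the interval from the default of 128 to 1024 shrinks that in-memory table about 8x (at a small cost in term-lookup speed). The term count below is a made-up illustrative figure, not a measured Nutch number.

```python
# Sketch: Lucene holds 1 of every `interval` dictionary terms in memory.
# The dictionary size is a hypothetical figure for illustration.

def term_index_entries(num_terms, interval):
    return num_terms // interval

num_terms = 100_000_000                          # hypothetical dictionary size
default = term_index_entries(num_terms, 128)     # entries in RAM at the default
tuned = term_index_entries(num_terms, 1024)      # entries in RAM after tuning
print(default // tuned)                          # -> 8 (8x fewer entries held)
```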
> > > Combining these two, I have searched a 40+M page index on a
> > > machine using about 500 MB of RAM. That said, search times with
> > > such a large index are not good. At some point, as your
> > > collection grows, you will want to merge multiple indexes
> > > containing different subsets of segments, put each on a separate
> > > box, and search them with distributed search.
> > >
> > > Doug
> > >
> > > Jay Pound wrote:
> > > > I'm testing an index of 30 million pages; it requires 1.5 GB of
> > > > RAM to search using Tomcat 5. I plan on having an index with
> > > > multiple billion pages, but if this is to scale, then even with
> > > > 16 GB of RAM I won't be able to have an index larger than 320
> > > > million pages. How can I distribute the memory requirements
> > > > across multiple machines? Or is there another servlet container
> > > > (like Resin) that would require less memory to operate? Has
> > > > anyone else run into this?
> > > > Thanks,
> > > > -Jay Pound
> > > >
> > > >
> > >
> > >
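The 320-million-page ceiling in that question is a straight linear extrapolation of the numbers given (1.5 GB for 30M pages against a 16 GB budget). A quick sanity check, assuming search memory really does scale linearly with index size:

```python
# Sanity-check of the linear scaling in the question: 1.5 GB of RAM to
# search 30 million pages implies a per-page cost, and a 16 GB budget
# then caps the index size. Purely naive extrapolation.

GB = 1024 ** 3
ram_used = 1.5 * GB
pages = 30_000_000
bytes_per_page = ram_used / pages        # ~53.7 bytes per page

budget = 16 * GB
max_pages = budget / bytes_per_page
print(round(max_pages))                  # -> 320000000 (320 million pages)
```

Which is exactly the wall Doug's reply addresses: shrink per-index memory with the two settings first, then split the index across boxes with distributed search.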
> > 
> > 
> > 
> > 
> > _______________________________________________
> > Nutch-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/nutch-general
> > 
> 
> 