To begin, specify the full path to the Nutch index in searcher.dir.
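
A relative value such as outputDir is typically resolved against the directory the JVM was started from, which may be why Luke finds the index and your code does not. Here is a minimal sketch of a sanity check that uses only the JDK. The class SearcherDirCheck and its methods are made up for illustration, and the index/indexes/segments layout is my assumption about what the searcher expects under searcher.dir:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch.
final class SearcherDirCheck {

    // Resolve the configured value against the JVM's current working
    // directory, which is where a relative path ends up pointing.
    static Path resolve(String searcherDir) {
        return Paths.get(searcherDir).toAbsolutePath().normalize();
    }

    // Assumption: a usable searcher.dir holds "index" or "indexes",
    // plus a "segments" directory.
    static boolean looksLikeCrawlDir(Path dir) {
        boolean hasIndex = Files.isDirectory(dir.resolve("index"))
                || Files.isDirectory(dir.resolve("indexes"));
        return hasIndex && Files.isDirectory(dir.resolve("segments"));
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : "outputDir";
        Path resolved = resolve(configured);
        System.out.println("searcher.dir resolves to: " + resolved);
        System.out.println("looks like a crawl dir:  " + looksLikeCrawlDir(resolved));
    }
}
```

If the resolved path is not the directory Luke was pointed at, put the absolute path into the searcher.dir value and try the query again.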
2009/3/6 Lukas, Ray <[email protected]>
>
> I am not able to make any Nutch query work. I know it is something
> simple. Could someone take a look at what I am doing?
>
> Here is the code I am using, it is pretty simple:
>
>
> NutchBean bean = new NutchBean(conf);
> Query query = Query.parse("title:credit", conf);
> Hits hits = bean.search(query, 10);
> System.out.println("hits.getLength()=>" + hits.getLength());
>
> The configuration is exactly the one I use to create the indexes, the
> very same object. Pointing Luke at these indexes and issuing the same
> search yields plenty of hits, yet the code above yields none.
>
> The other factor in the setup is nutch-site.xml. I added the following
> property, which points to the root directory of my newly created indexes.
>
> <property>
>   <name>searcher.dir</name>
>   <value>outputDir</value>
>   <description>
>     Text Removed from this email, remains in code
>   </description>
> </property>
>
> The query returns zero hits. I tried several things with no luck. Can
> you help me out?
>
> ray
>
> On Mar 3, 2009, at 7:14 PM, [email protected] wrote:
>
> >
> > Hi,
> >
> > I will need to index all links in the domains then. Do you think a
> > Linux box like yours with a DSL connection is OK for indexing the
> > domains I have?
> >
> > Why only segments? I thought we needed to merge all subfolders under
> > the crawl folder. What did you use for merging them?
> >
> > Thanks.
> > A.
> >
> > -----Original Message-----
> > From: John Martyniak <[email protected]>
> > To: [email protected]
> > Sent: Tue, 3 Mar 2009 3:21 pm
> > Subject: Re: what is needed to index for about 10000 domains
> >
> > Well, the way that Nutch works is that you inject your list of
> > domains into the DB, and that is the starting point. Since Nutch
> > uses a crawler, it grabs those pages, determines whether there are
> > any links on them, and adds those links to the DB. So the next time
> > you generate your URLs to fetch, it takes your original list plus
> > the URLs it found to generate the new segment.
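
The cycle described above can be sketched as a toy model. This is not Nutch code: ToyCrawl, its method names, and the in-memory map standing in for the web are all invented for illustration. The toy also skips URLs it has already fetched when building the next segment, which is a simplification I chose:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the inject / generate / fetch / update cycle. Not Nutch code.
final class ToyCrawl {
    private final Set<String> db = new LinkedHashSet<>();      // every URL known so far
    private final Set<String> fetched = new LinkedHashSet<>(); // URLs already fetched
    private final Map<String, List<String>> web;               // page -> outlinks (stand-in for the network)

    ToyCrawl(Map<String, List<String>> web) {
        this.web = web;
    }

    // "Inject": seed the DB with the starting list.
    void inject(List<String> seeds) {
        db.addAll(seeds);
    }

    // "Generate": the next segment is every known URL not fetched yet.
    List<String> generate() {
        List<String> segment = new ArrayList<>(db);
        segment.removeAll(fetched);
        return segment;
    }

    // "Fetch" and "update": mark pages fetched and add their outlinks to the DB.
    void fetchAndUpdate(List<String> segment) {
        for (String url : segment) {
            fetched.add(url);
            db.addAll(web.getOrDefault(url, List.of()));
        }
    }

    int dbSize() {
        return db.size();
    }
}
```

Calling inject once and then alternating generate and fetchAndUpdate shows the DB growing each round as new links are discovered, which is why an unfiltered crawl wanders outside the original domain list.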
> >
> > If you wanted to limit it to only pages contained on your 10000
> > domains, you could use the regex-urlfilter.txt file in the conf
> > directory to limit it to your list. But you would have to create a
> > regular expression for each one.
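
Writing 10000 expressions by hand is not practical, so the lines could be generated from the domain list. This is a rough sketch, assuming the usual filter file format of one rule per line, where "+" accepts, "-" rejects, and the first matching rule wins. The class UrlFilterLines is hypothetical, and the exact pattern shape is my choice rather than anything prescribed in this thread:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for regex-urlfilter.txt lines. Not part of Nutch.
final class UrlFilterLines {

    // One accept rule per domain, covering the domain and its subdomains.
    static String acceptRule(String domain) {
        // Escape dots so they match literally instead of "any character".
        String escaped = domain.toLowerCase().replace(".", "\\.");
        return "+^https?://([a-z0-9-]+\\.)*" + escaped + "/";
    }

    static List<String> rules(List<String> domains) {
        List<String> lines = new ArrayList<>();
        for (String d : domains) {
            lines.add(acceptRule(d));
        }
        lines.add("-."); // reject everything that matched no accept rule
        return lines;
    }

    public static void main(String[] args) {
        // example.com and example.org are placeholder domains.
        rules(List.of("example.com", "example.org")).forEach(System.out::println);
    }
}
```

The generated lines would go near the end of regex-urlfilter.txt, in place of any final catch-all accept rule, so that the existing skip rules above them still apply.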
> >
> > I am not familiar with the merge script on the wiki, but I have
> > merged segments before and it did work. That was on Linux, but I
> > don't think that should make a difference.
> >
> > -John
> >
> > On Mar 3, 2009, at 5:10 PM, [email protected] wrote:
> >
> >> Hi,
> >>
> >> Thanks for the reply. I have a list of those domains only. I am not
> >> sure how many pages they have. Is a DSL connection sufficient to
> >> run Nutch in my case? Did you run Nutch for all of your pages at
> >> once or separately for a given subset of them? By the way, yesterday
> >> I tried to use the merge shell script that we have on the wiki. It
> >> gave a lot of errors. I ran it on Cygwin, though.
> >>
> >> Thanks.
> >> A.
> >>
> >> -----Original Message-----
> >> From: John Martyniak <[email protected]>
> >> To: [email protected]
> >> Sent: Tue, 3 Mar 2009 1:44 pm
> >> Subject: Re: what is needed to index for about 10000 domains
> >>
> >> I think that in order to answer that question, it is necessary to
> >> know how many total pages are being indexed.
> >>
> >> I currently have ~3.5 million pages indexed, and the segment
> >> directories are around 45GB. The response time is relatively fast.
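
Those two figures work out to roughly 13 KB of segment data per page, which gives a crude way to size another crawl. The sketch below is back-of-the-envelope only: it assumes "45GB" means decimal gigabytes, that disk use scales linearly with page count, and that pages are of similar size. The pages-per-domain value is purely hypothetical, since the thread does not give one:

```java
// Back-of-the-envelope disk estimate. Not a measurement.
final class SegmentSizeEstimate {

    // Figures quoted in the message above.
    static final double KNOWN_PAGES = 3_500_000;
    static final double KNOWN_BYTES = 45e9; // "around 45GB", taken as decimal gigabytes

    static double bytesPerPage() {
        return KNOWN_BYTES / KNOWN_PAGES;
    }

    // Scale linearly to another crawl size (assumes similar page sizes).
    static double estimateGigabytes(long pages) {
        return pages * bytesPerPage() / 1e9;
    }

    public static void main(String[] args) {
        long pagesPerDomain = 100; // hypothetical; the thread does not say
        long pages = 10_000 * pagesPerDomain;
        System.out.printf("~%.1f KB per page%n", bytesPerPage() / 1e3);
        System.out.printf("~%.1f GB of segments for %d pages%n",
                estimateGigabytes(pages), pages);
    }
}
```

The estimate covers segment data only; indexes, the crawl DB, and temporary space for merging would come on top of it.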
> >>
> >> In the test site it is running on a dual processor Dell 1850 with
> >> 3GB of RAM.
> >>
> >> -John
> >>
> >> On Mar 3, 2009, at 3:44 PM, [email protected] wrote:
> >>
> >>> Hello,
> >>>
> >>> I use nutch-0.9 and need to index about 10000 domains. I want to
> >>> know the minimum hardware and memory requirements.
> >>>
> >>> Thanks in advance.
> >>> Alex.
>
--
Best Regards
Alexander Aristov