I've been having problems with the merge portion of the script too.
My solution was to check the success status of the merge ( $? ), and
if it failed, try again, or wait until next time.
echo merging segments
nutch_bin/nutch mergesegs $merged_segment -dir $segments
if [ $? -ne 0 ]
then
  echo "merge failed, will try again next time"
  exit 1
fi
Dear nutch users
I have a problem with Nutch results: sometimes it gives me a blank page
without any error, but when I look at the log file I see some errors. I
don't understand how to solve it; I have tried all the ways I could think
of, so I thought of asking you.
I am using Fedora 6 + Tomcat 5 + JRE 6.
Hi,
I had the same problem using the re-crawl scripts from the wiki. They all
work fine with Nutch versions up to and including 0.9, but with
nutch-1.0-dev (from trunk) they break at the merge of indexes. The reason
is that the merge in nutch-0.9 (from the re-crawl scripts):
bin/nutch merge crawl/indexes
hi,
regarding hit summaries:
The summaries are generated at search time. This is necessary, since
different queries will generate different summaries (and different terms
will be highlighted). The parsed text is stored in the various
segments/timestamp folders. I don't know which directory it
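If it helps, the parsed text of a given segment can be dumped for inspection with the segment reader. A hedged sketch (the timestamp below is only a placeholder, and the snippet skips the command when bin/nutch is not present):

```shell
# Dump the parse data of one segment to a text dump for inspection.
# The segment name is a placeholder; use a real one from crawl/segments/.
SEGMENT=crawl/segments/20070920123456
if [ -x bin/nutch ]
then
  bin/nutch readseg -dump "$SEGMENT" segdump -nofetch -nogenerate
fi
```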
Hi Tomislav and Nutch users
I could not solve the problem with your instructions.
I crawled two times. The re-crawl generated crawl/NEWindexes;
crawl/indexes was generated in the first crawl.
I merged with: bin/nutch merge crawl/index crawl/indexes/ crawl/NEWindexes/
Now search.jsp is showing
I have merged the old as well as the new segments into the segments dir,
but the same error still comes up.
On 9/20/07, Tomislav Poljak [EMAIL PROTECTED] wrote:
Hi Alexis,
I think that your problem is not so much in index (or merging indexes)
but in segments, because if you look at the exception you will see
We can do two things to solve this problem.
SOLUTION 'A'
1. Once the 'depth' loop is complete, merge the segments in
'crawl/segments/'. ('crawl/segments/' will have one merged segment of
the past plus all the segments generated in the depth loop, one for
each iteration of the loop.) They are now
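Step 1 might look something like this in a re-crawl script. This is only a sketch: the paths are assumptions based on the usual crawl layout, and the guard skips the command when bin/nutch is not present.

```shell
# Merge all segments under crawl/segments into one new segment,
# then swap the merged result into place of the old segments dir.
if [ -x bin/nutch ]
then
  bin/nutch mergesegs crawl/MERGEDsegments -dir crawl/segments &&
    rm -rf crawl/segments &&
    mv crawl/MERGEDsegments crawl/segments
fi
```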
Hi,
I am a little confused about what exactly dedup does.
a. Does dedup delete duplicate documents from Index and Segments?
b. Is there a way that we could delete duplicated documents for two
segments?
Let me know. Thanks.
Hello everyone,
I'm not going to post my config files so as not to spam you all, but I
have a general question: I'm trying to index the pages of a website
(obviously), and I've created a special page with links to all the
pages I want to index. I then pointed Nutch at this special link page.
I set
I am having a problem with cached pages: images are not showing in them.
How can I make images show in them?
I am new to Nutch and having difficulties. Please help me to show images
in cached pages.
karthik085 wrote:
Hi,
I am a little confused about what exactly dedup does.
a. Does dedup delete duplicate documents from Index and Segments?
Only from the index.
b. Is there a way that we could delete duplicated documents for two
segments?
bin/nutch mergesegs
--
Best regards,
Andrzej
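To spell out the mergesegs suggestion for two specific segments: as far as I know, the merger keeps only the most recent entry for a duplicate URL. The segment names below are placeholders, and the guard skips the command when bin/nutch is not present.

```shell
# Merge two segments into one; duplicate URLs should collapse to a
# single (most recent) entry in the merged output.
if [ -x bin/nutch ]
then
  bin/nutch mergesegs crawl/seg_merged \
    crawl/segments/20070101000000 crawl/segments/20070102000000
fi
```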
Thanks - that's much clearer.
Andrzej Bialecki wrote:
karthik085 wrote:
Hi,
I am a little confused about what exactly dedup does.
a. Does dedup delete duplicate documents from Index and Segments?
Only from the index.
b. Is there a way that we could delete duplicated documents
See NUTCH-281. https://issues.apache.org/jira/browse/NUTCH-281
On 9/20/07, Joseph M. [EMAIL PROTECTED] wrote:
I am having a problem with cached pages: images are not showing in them.
How can I make images show in them?
I am new to Nutch and having difficulties. Please help me to show images
Nutch uses an HTTP/1.0 GET request. If I change the Java code in
HttpResponse.java to reqStr.append("HTTP/1.1\r\n"); will it create
any problem?
I have some newbie questions.
- There are two filters crawl-urlfilter.txt and regex-urlfilter.txt.
Which one should be configured in which condition?
- Is it possible to see how much bandwidth the Nutch crawl consumes?
- Can the Nutch bot do NTLM authentication for websites in a domain?
- Is there
Look in nutch-default.xml
The properties db.max.outlinks.per.page and http.content.limit might
need to have their values increased.
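For example, overrides in nutch-site.xml might look like this. The values below are only illustrative; I believe the defaults are 100 outlinks per page and a 64 KB content limit.

```xml
<property>
  <name>db.max.outlinks.per.page</name>
  <value>1000</value>
</property>
<property>
  <name>http.content.limit</name>
  <value>1048576</value>
</property>
```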
Cheers,
Carl.
Jeff Maki wrote:
Hello everyone,
I'm not going to post my config files so as not to spam you all, but I
have a general question: I'm trying to
I was browsing the Nutch JIRA. From what I can see, some patches are
merged into trunk and some only show up in the Hudson build, Nutch-Nightly.
This is pretty confusing to me as a user.
As a user, which branch should I check out if I want the latest Nutch
with cutting-edge features and least open issues?