Hi Feng,
MapWritable is a kind of hash map.
You can put in any key-value pair, but the keys and values need to be
Writables:
http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/Writable.html
You can use UTF8 as a string key and value, or ByteWritable as the key and
UTF8 as the values.
Etc.
Hi all!
I have a question about nutch-default.xml configuration file. There is a
parameter db.default.fetch.interval that is set by default to 30. It means
that pages from the webdb are recrawled every 30
days.
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg02058.html
I want to know
Hi, Lourival.
You wrote on 12 June 2006, 19:33:15:
Hi all!
I have a question about nutch-default.xml configuration file. There is a
parameter db.default.fetch.interval that is set by default to 30. It means
that pages from the webdb are recrawled every 30
Hi Lourival,
this means all pages older than 30 days are potential candidates for
a fetch list that is created by segment generation process.
Stefan
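
The rule Stefan describes can be sketched in a few lines of shell. This is only an illustration and not code from Nutch or from this thread: the function name is_due and its arguments are hypothetical, and the sketch treats a page as a candidate once at least the full interval has elapsed since its last fetch.

```shell
#!/bin/bash
# Sketch: is a page a candidate for the next fetch list?
# Usage: is_due LAST_FETCH_EPOCH INTERVAL_DAYS [NOW_EPOCH]
# is_due and its arguments are hypothetical names for illustration;
# Nutch performs this check internally during segment generation.
is_due() {
  local last_fetch=$1
  local interval_days=$2
  # Default to the current time when no explicit "now" is given.
  local now=${3:-$(date +%s)}
  local age_seconds=$(( now - last_fetch ))
  local interval_seconds=$(( interval_days * 86400 ))
  # Succeeds (exit status 0) when the page is at least interval_days old.
  [ "$age_seconds" -ge "$interval_seconds" ]
}
```

For example, `is_due "$last_fetch" 30 && echo "candidate"` would print "candidate" only for a page last fetched 30 or more days ago, where last_fetch is a Unix timestamp you supply yourself.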
On 12.06.2006 at 16:33, Lourival Júnior wrote:
Hi all!
I have a question about nutch-default.xml configuration file. There
is a
OK. So, do you have any solution for doing this job automatically? I have a shell
script, but I don't know yet whether it really works.
Sorry if I'm being redundant. I'm learning about this tool and I have a lot of
questions :).
Thanks!
On 6/12/06, Dima Mazmanov [EMAIL PROTECTED] wrote:
Hi, Lourival.
You
Hi, Lourival.
What kind of shell script do you have?
You wrote on 12 June 2006, 19:51:06:
OK. So, do you have any solution for doing this job automatically? I have a shell
script, but I don't know yet whether it really works.
Sorry if I'm being redundant. I'm learning about this tool and I have a lot of
OK. So, do you have any solution for doing this job automatically? I have a shell
script, but I don't know yet whether it really works.
Shell scripts are the best solution.
Sorry if I'm being redundant. I'm learning about this tool and I have a lot of
questions :).
No problem, but the nutch user
Let me explain the problem. I have this shell script:
#!/bin/bash
# A simple script to run a Nutch re-crawl
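# Quote the positional parameters: an unquoted empty $1 makes
# [ -n ] evaluate to true, so the usage message would never print.
if [ -n "$1" ]
then
  crawl_dir=$1
else
  echo "Usage: recrawl crawl_dir [depth] [adddays]"
  exit 1
fi
# Optional second argument: crawl depth (default 5).
if [ -n "$2" ]
then
  depth=$2
else
  depth=5
fi
# Optional third argument: adddays (default 0).
if [ -n "$3" ]
then
  adddays=$3
else
  adddays=0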
Hi, Lourival.
OK, after the first indexing you must merge the segments,
and if you want to reindex your db, you have to delete segments which
are older than a predefined date, in your case 30 days.
This is my solution; if someone has a better one, please share your
experience!
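
A minimal sketch of the "find segments older than 30 days" step follows. It is not from the thread and makes several assumptions: the function name list_old_segments is hypothetical, every direct subdirectory of the given directory is taken to be a segment, and age is judged by the directory's modification time rather than by anything Nutch records. It only prints candidates and deletes nothing.

```shell
#!/bin/bash
# Sketch: list segment directories that look older than a given age.
# Usage: list_old_segments SEGMENTS_DIR [MAX_AGE_DAYS]
# Assumptions (not from the thread): each direct subdirectory is one
# segment, and its modification time is a good enough proxy for age.
list_old_segments() {
  local segments_dir=$1
  local max_age_days=${2:-30}
  # -mindepth 1 / -maxdepth 1 restrict the search to direct children.
  # -mtime +N matches entries modified more than N whole days ago.
  find "$segments_dir" -mindepth 1 -maxdepth 1 -type d \
    -mtime +"$max_age_days" | sort
}
```

Running `list_old_segments crawl/segments 30` (the path is a placeholder) and reviewing the output before removing anything by hand is safer than deleting in the same step.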
Let me explain the problem. I have
[ http://issues.apache.org/jira/browse/NUTCH-289?page=all ]
Stefan Groschupf updated NUTCH-289:
---
Attachment: ipInCrawlDatumDraftV5.patch
Release Candidate 1 of this patch.
This patch contains:
+ add IP Address to CrawlDatum Version 5 (as byte[4])
+
[ http://issues.apache.org/jira/browse/NUTCH-303?page=all ]
Jerome Charron resolved NUTCH-303:
--
Resolution: Fixed
Nutch now uses the Commons Logging API and log4j as the default implementation.
There are 3 log4j.properties configuration files:
Hi everybody,
As I said in another message, I'm trying to get Nutch to search for
images.
So far it searches the alt and title tags and indexes the image content
(what you see when you open an image in Notepad, for example).
Now that I've indexed almost 3 million images, I am trying to
12 matches