Is there nothing in the database, or are there entries in the
index but nothing is returned by the search?
Dennis
victor_emailbox wrote:
Can anyone help?
Thanks.
victor_emailbox wrote:
Hi,
I followed all the steps in the 0.8 tutorial except that I have only 2
urls in the
Hi,
I am trying to configure a recent Nutch (0.8+) to fetch directly from
the file system instead of over HTTP, which is fairly slow. The fetcher
hits a 404 - File not found (see below). When I copy the file:/// URL
into lynx, the file is found without any problems.
2006-09-15 10:29:57,
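If the fetcher reports 404 for a file: URL that lynx opens fine, one thing worth checking is how the URL maps to a local path. `file:///tmp/a.html` and `file:/tmp/a.html` both name /tmp/a.html, but `file://tmp/a.html` treats "tmp" as a host name. The class below is a standalone diagnostic sketch that uses only the JDK (`java.net.URI`, `java.nio.file.Paths.get(URI)`). `FileUrlCheck` is a hypothetical name, this is not how Nutch's file protocol plugin resolves URLs, and it assumes a Unix-style file system.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic sketch (not Nutch code): maps a file: URL to a
 * local path and checks whether a readable regular file exists there.
 */
final class FileUrlCheck {

    /** Maps "file:///tmp/a.html" or "file:/tmp/a.html" to /tmp/a.html. */
    static Path toPath(String fileUrl) {
        URI uri = URI.create(fileUrl); // IllegalArgumentException on bad syntax
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not a file: URL: " + fileUrl);
        }
        // Paths.get(URI) throws IllegalArgumentException when the URL has a
        // host part, e.g. "file://tmp/a.html", where "tmp" parses as a host.
        return Paths.get(uri);
    }

    /** True if the URL resolves to an existing, readable regular file. */
    static boolean isFetchable(String fileUrl) {
        Path p = toPath(fileUrl);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        for (String url : args) {
            System.out.println(url + " -> " + toPath(url) + " fetchable=" + isFetchable(url));
        }
    }
}
```

Run it with the URLs from your seed list as arguments. A URL that throws here or prints `fetchable=false` is worth fixing before looking further into the Nutch configuration.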
On 9/14/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Everyone, thanks for the help with this. I hope to return the
assistance, once I am more familiar with 0.8. I am using tail -f now to
monitor my test crawls. It also looks like you can use
conf/hadoop-env.sh to redirect log file output to
Hello Jared,
[EMAIL PROTECTED] wrote:
Everyone, thanks for the help with this. I hope to return the
assistance, once I am more familiar with 0.8. I am using tail -f now to
monitor my test crawls. It also looks like you can use
conf/hadoop-env.sh to redirect log file output to a different location
for each of your configurations.
One
That's the way I set it up at first.
This time, I started with a blank slate: I unpacked Nutch and Tomcat,
unpacked nutch-0.8.war into webapps/ROOT, and left the deployed app
untouched.
The above means that you have an empty nutch-site.xml under
webapps/ROOT and you have a nutch-default.xml with
Hi,
Is there a way to filter pages before they're indexed in Nutch? I'm trying to crawl
an intranet site, but only PDF documents should make it into the index (in later
stages this will be extended, but PDFs are the main focus). I've tried using the
regex or suffix filters, but this prevents the crawling
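The crawl-time URL filters (regex or suffix) decide which URLs are fetched at all, so excluding HTML there also stops link discovery. The alternative is to let every page be fetched and parsed, and apply the PDF-only rule when documents are indexed. In Nutch that check would presumably live in an indexing filter plugin; the class below is only a standalone sketch of the decision itself, with hypothetical names (`PdfOnlyIndexPolicy`, `shouldIndex`), not the plugin API.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of an index-time rule, separate from the crawl-time
 * URL filters: pages may still be fetched and parsed (so their links are
 * followed), but only PDF documents are kept for the index.
 */
final class PdfOnlyIndexPolicy {

    /** Uses the MIME type reported by the fetcher, falling back to the URL suffix. */
    static boolean shouldIndex(String url, String contentType) {
        if (contentType != null) {
            // Strip parameters such as "; charset=..." before comparing.
            String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (!mime.isEmpty()) {
                return mime.equals("application/pdf");
            }
        }
        // No usable content type: look at the path, ignoring any query string.
        String path = url.toLowerCase(Locale.ROOT);
        int q = path.indexOf('?');
        if (q >= 0) {
            path = path.substring(0, q);
        }
        return path.endsWith(".pdf");
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex("http://intranet/docs/report.pdf", "application/pdf")); // true
        System.out.println(shouldIndex("http://intranet/index.html", "text/html"));            // false
        System.out.println(shouldIndex("http://intranet/get?id=7", null));                     // false
    }
}
```

Preferring the content type over the suffix means a PDF served from a URL like `/get?id=7` can still be recognised, as long as the server reports `application/pdf`.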
[EMAIL PROTECTED] hpp]$ hadoop jar hadoop-0.6.1-examples.jar grep input output 'dfs[a-z.]+'
06/09/14 23:04:50 INFO conf.Configuration: parsing
file:/home/wangensh/hadoop-0.
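As I understand it, the `grep` job in the Hadoop examples jar counts every match of the regular expression across the files in `input` and writes the matches to `output` sorted by descending count, using one MapReduce pass to count and a second to sort. The sketch below computes the same kind of result in a single process with plain JDK classes, which can help check what to expect in `output`. `LocalGrep` is a hypothetical name, and this is not the Hadoop implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Single-process sketch of a regex match count sorted by frequency (not Hadoop code). */
final class LocalGrep {

    static List<Map.Entry<String, Long>> grep(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                // Like map (emit match, 1) followed by reduce (sum per match).
                counts.merge(m.group(), 1L, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        // Like the sort pass: highest count first, ties broken alphabetically.
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "<name>dfs.replication</name>",
                "<name>dfs.name.dir</name> <name>dfs.replication</name>");
        for (Map.Entry<String, Long> e : grep(input, "dfs[a-z.]+")) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

For the two sample lines in `main` it prints `2` for `dfs.replication` and `1` for `dfs.name.dir`, tab-separated.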
On 9/14/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
On 9/14/06, Tomi NA <[EMAIL PROTECTED]> wrote:
> On 9/5/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> > Hi:
>
> I have a problem or two with the described procedure...
>
> > Assuming you have
> >
> > index 1 at /data/crawl1
> > index 2 at /data/
On 9/14/06, Tomi NA <[EMAIL PROTECTED]> wrote:
On 9/5/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> Hi:
I have a problem or two with the described procedure...
> Assuming you have
>
> index 1 at /data/crawl1
> index 2 at /data/crawl2
Used ./bin/nutch crawl urls -dir /home/myhome/crawls/mycrawl
On 9/5/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
Hi:
I have a problem or two with the described procedure...
Assuming you have
index 1 at /data/crawl1
index 2 at /data/crawl2
Used ./bin/nutch crawl urls -dir /home/myhome/crawls/mycrawldir to
generate an index: Luke says the index is vali
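When two indexes are built separately (index 1 at /data/crawl1, index 2 at /data/crawl2), searching them together comes down to querying each one and merging the hit lists by score. The class below sketches only that merge step, keeping the overall top k with a size-k min-heap. `Hit` and `mergeTopK` are hypothetical names, not Nutch classes. It assumes scores from the two indexes are comparable, which is only approximately true because Lucene scores depend on each index's own term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: merge per-index hit lists into one top-k list by score. */
final class MergeHits {

    static final class Hit {
        final String index;
        final String url;
        final float score;

        Hit(String index, String url, float score) {
            this.index = index;
            this.url = url;
            this.score = score;
        }

        @Override
        public String toString() {
            return score + " " + url + " (" + index + ")";
        }
    }

    /** Returns the k highest-scoring hits across all per-index result lists. */
    static List<Hit> mergeTopK(List<List<Hit>> perIndex, int k) {
        // Min-heap ordered by score: the head is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> hits : perIndex) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // drop the weakest once we hold more than k
                }
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return top;
    }

    public static void main(String[] args) {
        List<Hit> crawl1 = List.of(new Hit("crawl1", "http://a/1", 0.9f), new Hit("crawl1", "http://a/2", 0.2f));
        List<Hit> crawl2 = List.of(new Hit("crawl2", "http://b/1", 0.5f));
        mergeTopK(List.of(crawl1, crawl2), 2).forEach(System.out::println);
    }
}
```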
To answer your question, some more information is needed:
1) How do you decide which "topic" a particular page belongs to? URL
segments? The title? Other HTML page elements? Latent Semantic Analysis (
http://en.wikipedia.org/wiki/Latent_semantic_indexing)?
2) Given a topic, how will your end
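The simplest of the approaches listed under 1) is keyword matching on URL segments and title words. The sketch below scores a page against hand-written keyword sets and returns the best-matching topic, or null if nothing matches. The topic names, keyword lists, and the `TopicGuesser` class are made up for illustration; a trained classifier or LSA, as mentioned above, would be more robust.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical keyword-based topic assignment from URL segments and title. */
final class TopicGuesser {

    /** Keywords in topicKeywords are expected to be lowercase. */
    static String guessTopic(String url, String title, Map<String, Set<String>> topicKeywords) {
        String text = (url + " " + (title == null ? "" : title)).toLowerCase(Locale.ROOT);
        // Split on anything that is not a letter or digit, so URL path
        // segments and title words both become tokens.
        String[] tokens = text.split("[^\\p{L}\\p{N}]+");
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> topic : topicKeywords.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (topic.getValue().contains(token)) {
                    score++;
                }
            }
            // Strictly greater: on a tie, the topic iterated first wins.
            if (score > bestScore) {
                best = topic.getKey();
                bestScore = score;
            }
        }
        return best; // null when no keyword matched
    }

    public static void main(String[] args) {
        Map<String, Set<String>> topics = Map.of(
                "computer", Set.of("computer", "software", "linux", "java"),
                "sports", Set.of("football", "tennis"));
        System.out.println(guessTopic("http://example.com/software/linux/install.html",
                "Installing Linux", topics)); // computer
    }
}
```

Because `Map.of` has no defined iteration order, a `LinkedHashMap` is the better choice if ties should be resolved in a fixed order.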
Hi nutch-users,
I have updated my Nutch installation (0.7.2) to include the analysis-fr plugin as
described by Jérôme in the Nutch Wiki (Multi Lingual Support) and NUTCH-261.
I've also updated the front end to take advantage of this analyzer in
queries.
The French stemming seems to work well (the
Lakshman, Madhusudhan wrote:
Hi Group,
We have a requirement to display each search result along
with a snippet (2-3 lines) of the content, something similar to
Google, where this snippet is displayed after the title line as shown
below:
Welcome to Nutch!
This is the fir
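A query-biased snippet like the one described can be produced by locating the first query term in the page text and returning a few words around it, with the matching terms highlighted. The sketch below shows that idea. The window size, the `<b>` markup, and the `Snippet` class are arbitrary choices for illustration, not Nutch's own summarizer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a query-biased snippet (not Nutch's summarizer). */
final class Snippet {

    static String make(String content, String query, int windowWords) {
        String[] words = content.trim().split("\\s+");
        Set<String> terms = new HashSet<>();
        for (String t : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        // Find the first word matching a query term (case and punctuation ignored).
        int hit = -1;
        for (int i = 0; i < words.length && hit < 0; i++) {
            if (terms.contains(normalize(words[i]))) {
                hit = i;
            }
        }
        if (hit < 0) {
            hit = 0; // no match: fall back to the start of the text
        }
        int start = Math.max(0, hit - windowWords / 2);
        int end = Math.min(words.length, start + windowWords);
        List<String> out = new ArrayList<>();
        for (int i = start; i < end; i++) {
            // A real results page would also HTML-escape each word.
            out.add(terms.contains(normalize(words[i])) ? "<b>" + words[i] + "</b>" : words[i]);
        }
        String snippet = String.join(" ", out);
        if (start > 0) {
            snippet = "... " + snippet;
        }
        if (end < words.length) {
            snippet = snippet + " ...";
        }
        return snippet;
    }

    private static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    public static void main(String[] args) {
        String text = "one two three four five six seven eight nine ten";
        System.out.println(make(text, "six", 4)); // ... four five <b>six</b> seven ...
    }
}
```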
I want to build a topic-based search engine by modifying Nutch. For
example, if I define a "computer" topic, I want to find only information
about computers. I can't find the appropriate place in Fetcher.java to insert my
own code. Please tell me how I can modify t
I'm not sure I completely understand your email.
What do you mean by "cache"?
On the standard search results page, there is a link to a cached
copy of each page. If the page was HTML, there are no problems; however,
if the page was binary, it returns an HTTP 500
internal se
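One way to avoid pushing raw binary content through the cached-page view is to branch on the stored content type before rendering: show HTML inline, show other text types escaped, and for binary types (PDF, Word, images) show only the parsed text. The sketch below shows just that decision. `CachedView` and its `Mode` values are hypothetical and not taken from Nutch's cached-page JSP.

```java
import java.util.Locale;

/** Hypothetical sketch: choose how to render a cached copy from its content type. */
final class CachedView {

    enum Mode { INLINE_HTML, ESCAPED_TEXT, PARSED_TEXT_ONLY }

    static Mode choose(String contentType) {
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String mime = contentType == null
                ? ""
                : contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (mime.equals("text/html") || mime.equals("application/xhtml+xml")) {
            return Mode.INLINE_HTML;      // stored markup can be shown as-is
        }
        if (mime.startsWith("text/")) {
            return Mode.ESCAPED_TEXT;     // show as text, HTML-escaped
        }
        return Mode.PARSED_TEXT_ONLY;     // binary or unknown: show extracted text, not raw bytes
    }

    public static void main(String[] args) {
        for (String type : new String[] {"text/html; charset=UTF-8", "text/plain", "application/pdf", null}) {
            System.out.println(type + " -> " + choose(type));
        }
    }
}
```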
On my system, I run the crawl command in one shell while running this
command in another shell to monitor the crawl:
tail -f log/hadoop.log
Of course this does about the same thing as listed below, but "tail
-f" is a little easier to remember.
On 9/13/06, Tomi NA <[EMAIL PROTECTED]> wrote:
On 9/