Cristina Belderrain wrote:
On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote:
This is *exactly* what I was thinking. Like Stefan, I believe the
nutch analyzer is a good foundation and should therefore be extended
to support the OR operator, and possibly additional capabilities
when the need
Hi all,
As a new Nutch user, I am quite stuck on this:
How can I launch a search using a file containing my terms/keywords instead of
typing them into search.jsp?
Do I have to use the Query.term class? If so, how and where do I use this
class?
Thanks a lot!
Mat
Hi,
Does anyone know how to force a page to be deleted? I have run the
WebDBWriter class and removed the page from the database, but it still
shows up in search results. Further checks using WebDBReader give a 'null'
response when looking for the page.
Most confusing.
Gary
2006/10/10, Cristina Belderrain [EMAIL PROTECTED]:
On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote:
This is *exactly* what I was thinking. Like Stefan, I believe the
nutch analyzer is a good foundation and should therefore be extended
to support the OR operator, and possibly additional
Hello,
I use the code below to get the term frequency for the term searched for by
the user. However, if the query consists of more than one word (separated by
space), or if it consists of a phrase within quotes, the term frequency
equals zero with this code. How can I get the term
Tomi said:
In conclusion, my position is pragmatic: I welcome the simplest
solution that implements the OR search. I just believe that it'd be
easiest to do that by extending the Nutch Analyzer.
This seems like a very reasonable approach. I too would very much like
OR. It would also be nice if it
It completely depends on the number of urls in the crawldb.
Dennis
jaison Qburst wrote:
What will be the maximum size of crawlDb on a single node?
You would have to write something that loops through the file and
constructs a Query object, using the addRequired and addProhibited
methods to add your terms and phrases. Then pass that into the
appropriate NutchBean search method to get your results.
Dennis
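A minimal sketch of the loop described above, assuming the Nutch 0.8 searcher API (`terms.txt`, the leading-`-` convention for prohibited terms, and the hit count of 10 are illustrative assumptions, not part of the original mail):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.searcher.Hits;
import org.apache.nutch.searcher.NutchBean;
import org.apache.nutch.searcher.Query;
import org.apache.nutch.util.NutchConfiguration;

public class FileQuerySearch {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();
    Query query = new Query(conf);

    // Read one term per line; a leading '-' marks a prohibited term.
    BufferedReader in = new BufferedReader(new FileReader("terms.txt"));
    String line;
    while ((line = in.readLine()) != null) {
      line = line.trim();
      if (line.length() == 0) continue;
      if (line.startsWith("-")) {
        query.addProhibitedTerm(line.substring(1));
      } else {
        query.addRequiredTerm(line);
      }
    }
    in.close();

    // Hand the assembled query to the search bean.
    NutchBean bean = new NutchBean(conf);
    Hits hits = bean.search(query, 10);
    System.out.println("total hits: " + hits.getTotal());
  }
}
```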
frgrfg gfsdgffsd wrote:
You could write a MapReduce job that uses the parse_data folder as
input and, inside the map or reduce class depending on your logic, uses
JDBC to write the updates to MySQL. The job configuration would look
something like this:
JobConf yourJob = new NutchJob(conf);
for (int i = 0; i
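The truncated snippet above might continue along these lines (a sketch against the 0.8-era Hadoop API; the mapper/reducer class names and output path are hypothetical):

```java
// Sketch: a job that reads parse_data from each segment and exports rows
// over JDBC inside the reducer. All names here are illustrative.
JobConf job = new NutchJob(conf);
job.setJobName("export-parse-data");
for (int i = 0; i < segments.length; i++) {
  // parse_data lives under each segment directory
  job.addInputPath(new Path(segments[i], ParseData.DIR_NAME));
}
job.setInputFormat(SequenceFileInputFormat.class);
job.setMapperClass(ExportMapper.class);       // hypothetical mapper
job.setReducerClass(JdbcExportReducer.class); // hypothetical reducer: opens a
                                              // JDBC connection and INSERTs rows
job.setOutputPath(new Path("export-out"));
JobClient.runJob(job);
```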
What Java version are you using? It might need Java 5.
Dennis
Adam Borkowski wrote:
A question from a newbie.
I've just downloaded version 0.8.1 and am going through the tutorial.
I almost got to the end, but after the index command:
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb
It looks like your syntax is correct (category:video searchString). Try to
write a LOG.info line into
org.apache.nutch.searcher.LuceneQueryOptimizer (line 178), just at the
beginning of the optimize method:
public TopDocs optimize(BooleanQuery original,
Searcher searcher, int numHits,
String sortField,
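Spelled out, the suggested change might look like this (the tail of the parameter list and the body are abbreviated; verify against your own copy of LuceneQueryOptimizer, since the exact signature can differ between versions):

```java
public TopDocs optimize(BooleanQuery original,
                        Searcher searcher, int numHits,
                        String sortField, boolean reverse)
    throws IOException {
  // Log the translated query on entry, to check whether the
  // category:video clause survived query translation.
  LOG.info("optimize() got query: " + original.toString());
  // ... rest of the method unchanged ...
}
```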
IOException and setMergeFactor...? You may want to check your config and have a look
at your merge factor. Set it to 50.
Hi,
how can I write a function for the basic summarizer so that:
if (meta-description)
show meta-description
else
continue with basic summarizer
Matthias
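The pseudocode above, as a self-contained sketch of the fallback logic (the class and method names are illustrative; the real Summarizer plugin API differs):

```java
public class MetaDescriptionFallback {

    // Prefer the page's meta description; fall back to the basic
    // summarizer's output when it is missing or blank.
    static String summarize(String metaDescription, String basicSummary) {
        if (metaDescription != null && metaDescription.trim().length() > 0) {
            return metaDescription;
        }
        return basicSummary;
    }

    public static void main(String[] args) {
        System.out.println(summarize("Official Nutch FAQ", "basic summary")); // prints Official Nutch FAQ
        System.out.println(summarize("   ", "basic summary"));                // prints basic summary
    }
}
```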
How does the depth option work on the 0.8 recrawl script that is on
http://wiki.apache.org/nutch/IntranetRecrawl . I just want to re-index
all of the pages currently in the db and not index any new pages these
pages might link to. Should I use a 0 for this? It seems like the
fetcher never
For some tests, I ran two fetches on segments which I generated with
topN=50. I then tried to merge these segments using mergesegs with
slice=200 which resulted in 8 segments.
If I only fetched about 100 URLs, why do I end up with 8 segments
containing (supposedly) 200 URLs each?
What is the
Jacob Brunson wrote:
For some tests, I ran two fetches on segments which I generated with
topN=50. I then tried to merge these segments using mergesegs with
slice=200 which resulted in 8 segments.
If I only fetched about 100 URLs, why do I end up with 8 segments
containing (supposedly) 200
The -noAdditions feature would be ideal for my situation. Hopefully it
will be released soon.
Andrzej Bialecki wrote:
Jacob Brunson wrote:
So the depth number is the number of iterations the recrawl script
will go through. In each iteration, it will select a number of URLs
from the crawl
The webdb and the segments are two separate things. The webdb
is basically used by the fetcher to keep track of the status of each URL
(like the last fetch time, or whether there was an error). The segments contain
the data from the fetches themselves, as well as the data's index,
which is used during searches.
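A quick way to see the two sides from the command line, assuming the 0.8 tools (where the webdb became the crawldb) and a crawl directory named `crawl` (the paths are assumptions):

```shell
# Status information (fetch times, errors) lives in the crawldb:
bin/nutch readdb crawl/crawldb -stats

# The fetched content itself lives in the segments:
bin/nutch readseg -list -dir crawl/segments
```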
So
It's OK now. It was my fault: I had unfortunately mixed a Xalan jar into the Nutch
distribution. After cleaning up the classpath, everything went fine.
- Original Message -
From: Dennis Kubes [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org
Sent: Tuesday, October 10, 2006 4:35 PM
Subject: Re:
All,
I downloaded the Nutch nightly build on 22/09/2006. I do a crawl over the
file system, and my current file list, generated by find, is around 80,000
entries (12M). After around half way, the fetcher starts issuing the message
"Aborting with 3 hung threads". Is anybody facing the same problem?
Cheers,
I used to have that problem a lot, but not any more. The problem I
thought was connected to
http://issues.apache.org/jira/browse/NUTCH-344 which was closed on
September 24th. I am running
http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8
Revision: 462538 and things seem to work
I have a question about my plugin for the index filter and query filter.
Can you help me?
-
In MoreIndexingFilter.java, add:
doc.add(new Field("category", "test", false, true, false));
-
--
package
Here's an update on my investigations:
I've been facing this problem for quite a while now - and it seems that there is
a correlation with the xls file format plugin. Each time, the thread seems to
get stuck parsing xls.
-Original Message-
From: Jacob Brunson [mailto:[EMAIL PROTECTED]
Sent: