Hi,
I don't know the howto you're referring to but I think it belongs to an older
version of Nutch.
Let me try to explain...
doc.add(key,value) - adds a new field to the document doc with the name
key and the value value. With that knowledge the indexer just knows there
is another field to
Hi Jesse,
I'm not sure what you're trying to achieve. Do you want to use the distributed
search or do you want to split an existing index? None of these tasks is the
prerequisite for the other.
If you want to split an index, there are several ways to do this. Which way to
choose depends on the
Mitia Notaras wrote:
Hi there,
The two event search engines I found are down:
betherebesquare.com
and
BusyTonight.com
I would like your advice:
Is it difficult to build one?
I guess it depends on the details of the requirements. Do you have a
requirements sheet?
I have knowledge of web
Ok, I will paraphrase the question.
Suppose I want to use distributed search with 3 servers: one primary and
two secondary nodes.
I create a single BIG index using a distributed crawler running on other
computers.
Now I want to split this single BIG index into two parts to put on the search
nodes.
How
Polish Web sites use Cp1250 (windows-1250) or iso8859-2 (or UTF-8 of
course). Check if diacritics like these:
ęółąśćżń
look all right in the above encodings and use the appropriate one.
Dawid
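One way to run the check suggested above outside a browser is a small round-trip test. This is only a sketch: the class and method names (DiacriticsCheck, roundTrip, misdecode) are made up for illustration and are not part of Nutch, and it assumes the JDK in use ships the windows-1250 and ISO-8859-2 charsets. It encodes the sample diacritics in each encoding and decodes them back, then shows what happens when Cp1250 bytes are read as ISO-8859-2.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Nutch class.
class DiacriticsCheck {
    // Encode the text in the given charset, then decode it with the same charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    // Encode in one charset but decode as if the bytes were in another,
    // which is what happens when a page declares the wrong encoding.
    static String misdecode(String text, Charset actual, Charset assumed) {
        return new String(text.getBytes(actual), assumed);
    }

    public static void main(String[] args) {
        // The sample letters from the message, written as Unicode escapes so the
        // source file itself does not depend on any particular encoding.
        String sample = "\u0119\u00f3\u0142\u0105\u015b\u0107\u017c\u0144";
        Charset cp1250 = Charset.forName("windows-1250");
        Charset latin2 = Charset.forName("ISO-8859-2");

        for (Charset cs : new Charset[] {cp1250, latin2, StandardCharsets.UTF_8}) {
            System.out.println(cs.name() + " round trip intact: "
                    + sample.equals(roundTrip(sample, cs)));
        }
        System.out.println("Cp1250 bytes read as ISO-8859-2 intact: "
                + sample.equals(misdecode(sample, cp1250, latin2)));
    }
}
```

If the round trips succeed but the cross-decoded text does not match, the letters themselves are representable in both encodings and the thing to look at is which encoding the page declares.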
On Wed, Sep 16, 2009 at 4:47 PM, MilleBii mille...@gmail.com wrote:
same thing when there is
Exactly! Sorry for being so confusing in my original question.
Jesse
int GetRandomNumber()
{
return 4; // Chosen by fair roll of dice
// Guaranteed to be random
} // xkcd.com
On Wed, Sep 23, 2009 at 4:45 AM, Alexander Aristov
alexander.aris...@gmail.com wrote:
Ok, I
At last someone answers.
Correct, CP1250.
My pages look fine in the browsers of course, but it does not mean Nutch
handles them properly.
What I'm wondering is whether the Nutch HTML parser reads them properly,
because when I do a search on such characters it fails on pages in iso8859-2 or
cp1250, but
Hi:
I'm following the steps to run the Nutch 1.0 release with Eclipse and Windows
described in this link
http://wiki.apache.org/nutch/RunNutchInEclipse1.0
I'm trying to build it, but when I launch the war target I get this error
C:\ECLIPSE321\workspace\nutch-1.0\build.xml:62: Specify at least one
Hi, thank you for your answer...
I was talking about this howto:
CreateNewFilter
Howto
add a category metadata to your index and be able to search for it. For
this, you need to write an indexing filter and a query filter.
Indexing your custom metadata
For the indexing filter, copy the
Hi,
the howtos you're referring to are for Nutch 0.9. In Nutch 1.0 the indexing
system changed a little bit.
If you look at the index-basic or index-more plugin you see that the doc.add
method changed.
It's no longer doc.add(new Field(category, puppies, false, true, false)) -
here you create
Yes, I saw the differences and I modeled my index-custom plugin on the
index-more plugin (nutch-1.0).
But I guess you're right! I didn't use the addFieldOptions method to add my
custom field information...
So I will add them in this method. As for the parser, I first have to see
how the htmlparser is made
I had the same little big problem - everything seemed OK:
- bin/nutch org.apache.nutch.searcher.NutchBean search query ... [in my
case search query = apache] in cygwin returns 62 Total hits on crawled
+^http://([a-z0-9]*\.)*apache.org/
- Nutch in Tomcat webapp after deploy seemed fine (no
Can you provide the HTTP headers and HEAD of the HTML of a Web page
for which Nutch fails? Perhaps there is an inconsistency between HTTP
and META headers or a misspelled codepage? Just a wild guess, but
believe me -- Java does convert fine between Cp1250, Iso8859-2 and
internal UTF-16 so there
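A quick way to test the guess about inconsistent HTTP and META headers is to pull the charset out of both and compare. The sketch below is an assumption-laden illustration, not how Nutch detects encodings: CharsetMismatch, charsetOf, canonical and conflicts are hypothetical names, and the regular expression only handles the common `charset=...` forms.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Nutch class.
class CharsetMismatch {
    // Matches charset=NAME, with or without quotes around NAME.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    // Return the first declared charset name in the text, if any.
    static Optional<String> charsetOf(String text) {
        Matcher m = CHARSET.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Map aliases to the JVM's canonical name; an unknown or misspelled
    // name is kept as written so that it still shows up as a mismatch.
    static String canonical(String name) {
        try {
            return Charset.forName(name).name().toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return name.toLowerCase(Locale.ROOT);
        }
    }

    // True only when both sources declare a charset and the two disagree.
    static boolean conflicts(String contentTypeHeader, String htmlHead) {
        Optional<String> http = charsetOf(contentTypeHeader).map(CharsetMismatch::canonical);
        Optional<String> meta = charsetOf(htmlHead).map(CharsetMismatch::canonical);
        return http.isPresent() && meta.isPresent() && !http.get().equals(meta.get());
    }

    public static void main(String[] args) {
        String header = "text/html; charset=ISO-8859-2";
        String head = "<meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=windows-1250\">";
        System.out.println("conflict: " + conflicts(header, head));
    }
}
```

Feeding it the Content-Type header and the HEAD section of a failing page would show whether the two declarations disagree or whether one of them names a codepage the JVM does not recognize.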