Hi
I think in the url-filter it uses contain rather than match.
/Jack
On 2/23/06, Elwin [EMAIL PROTECTED] wrote:
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
Will this pattern accept a URL like this: http://MY.DOMAIN.NAME/([a-z0-9]*\.)*/?
I think it won't, but in
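
A minimal sketch of the "contain rather than match" point, assuming java.util.regex as a stand-in (the thread does not say which regex library the url-filter uses). With "contains" semantics the pattern only has to match at the start of the URL because of the ^ anchor, so anything may follow the trailing slash. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class UrlFilterFindVsMatch {
    // The pattern quoted in this thread, without the leading '+'.
    static final Pattern ACCEPT =
        Pattern.compile("^http://([a-z0-9]*\\.)*MY.DOMAIN.NAME/");

    /** "Contains" semantics: the pattern may match any part of the URL. */
    static boolean acceptsByContain(String url) {
        return ACCEPT.matcher(url).find();
    }

    /** "Match" semantics: the pattern must cover the whole URL. */
    static boolean acceptsByMatch(String url) {
        return ACCEPT.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.MY.DOMAIN.NAME/index.html",
            "http://MY.DOMAIN.NAME/abc.def/",
            "http://other.example.com/"
        };
        for (String url : urls) {
            System.out.println(url
                + " contain=" + acceptsByContain(url)
                + " match=" + acceptsByMatch(url));
        }
    }
}
```

Under these assumptions a URL with a path after the host passes the "contain" check but would fail a full match, which is why the path part does not need to appear in the pattern.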
On 23.02.2006 at 01:55, sudhendra seshachala wrote:
Is there a way I could use HTTrack for crawling and nutch for just
searching?
Has anybody done this before, and is there any comparison between crawlers?
I suggest taking a look at Lucene, since I guess it is more work
changing Nutch to your
Hi Wong,
take a look at:
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html
There are many code snippets on the net that will show you how you
can use it.
In general I found the book Lucene in Action a useful guide when
working with Lucene.
Stefan
Am
Oh I have asked a silly question about regex, hehe.
2006/2/23, Jack Tang [EMAIL PROTECTED]:
Hi
I think in the url-filter it uses contain rather than match.
/Jack
On 2/23/06, Elwin [EMAIL PROTECTED] wrote:
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
Will
Hi Daniel,
thanks, we are still working on it.
Actually we have to finish something behind the scenes, and then we
will publish a kind of plugin extension point that will allow other
people to contribute.
Thanks for the offer; maybe the only thing you can do is to vote for
this issue since this
If you look at the section of the tutorial for doing intranet
crawls, you should be able to use that for your small number of
websites. The bin/nutch script wraps up all the crawl functions for you
(fetching, indexing, deduping, etc). You'll just need to delete the
results of your
Thanks Jacob, for the help.
It is a pity the results of the previous crawl must be removed.
Especially because it's a problem to restart the container (JBoss, in my
case).
Is this a feature inherited from lucene? Or maybe this will be improved
in the future?
Thanks again.
Mr./Ms. Vanderdray,
If you tell me the exact version of Nutch you are using, I can suggest
modifications based on that.
Rgds
Prabhu
On 2/23/06, Dima Mazmanov [EMAIL PROTECTED] wrote:
Hi Raghavendra,
I tried to compile SWFParser.java and got the following errors:
compile:
[echo] Compiling plugin: parse-swf
That issue gets a lot of discussion on this list and some folks
have come up with their own workarounds. Those generally involve
different implementations of the search bean. I haven't heard of any
definitive solution for the next release.
Jake.
-Original Message-
From: Sugra
Hi,
I have added two meta tags to my HTML pages, for the language and a category
(news, articles, ...) of the page.
<meta name="dc.language" content="en" />
And a meta tag for a category:
<meta name="dc.category" content="news" />
How can I build a search query and get the hits, for example:
Find all
It's incredible... :(
I also tried changing the port, the JDK version and the Tomcat
version, and adding
the servlet API to my classpath, but I get the
StackOverflowError every time...
The installation instructions are too simple... but what could I be doing wrong???
Look at the
Is it possible to manage several NutchConf instances in a single webapp?
My situation:
Inside my webapp, I can manage several web sites with different URLs.
I want to have one search engine per web site, so I need
one NutchBean per web site and one NutchConf per web site.
I have
My first thought is that the page is forwarding to itself. That would create
an infinite loop and cause a stack overflow. Is the language not set
somehow? That would make it forward forever.
Thanks,
Steve Betts
[EMAIL PROTECTED]
937-477-1797
-Original Message-
From: Top100Forever
Hi, Raghavendra.
I have the nutch-0.7 release without any modifications to the code.
You wrote on 23 February 2006, 19:21:17:
If you tell me the exact version of Nutch you are using, I can suggest
modifications based on that.
Rgds
Prabhu
On 2/23/06, Dima Mazmanov [EMAIL PROTECTED] wrote:
Hi
Hi,Raghavendra.
/usr/local/nutch/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf/SWFParser.java:108:
cannot resolve symbol
[javac] symbol : constructor ParseData
(org.apache.nutch.parse.ParseStatus,java.lang.String,
org.apache.nutch.parse.Outlink[],
You can follow the tutorial at
http://wiki.apache.org/nutch/WritingPluginExample. Just replace
recommended with category, and it will show you what to do.
(I just implemented a category filter this way ...)
Rgrds, T.
On 2/23/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Hi,
I have added on
I have attached a new parser.
I have changed ContentProperties to Properties.
Probably your Nutch version uses that. I did not check that it compiles.
Please look at the other parsers and do the same. It should be easy to spot.
Hope this helps
All the best
Rgds
Prabhu
On 2/23/06, Dima Mazmanov [EMAIL
Ouch, I found the error... this statement does not work properly:
String language =
ResourceBundle.getBundle("org.nutch.jsp.search", request.getLocale())
.getLocale().getLanguage();
After this statement, language is "", so for now I have corrected it with
this code:
- Original Message
Ouch, I found the error... this statement does not work properly:
String language =
ResourceBundle.getBundle("org.nutch.jsp.search", request.getLocale())
.getLocale().getLanguage();
After this statement, language is "", so for now I have corrected it with
this code:
language = "en";
Why language
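
One possible explanation, not confirmed in this thread: ResourceBundle.getLocale() reports the locale of the bundle that was actually loaded, and when the lookup falls back to the base bundle that is the root locale, whose language is the empty string. The sketch below shows a guarded fallback instead of hard-coding the value. The class name, method name and the "en" default are assumptions for illustration, not Nutch code.

```java
import java.util.Locale;

public class LanguageFallback {
    /**
     * Returns the language of the locale reported by the loaded bundle,
     * or the given default when that language is empty (root locale).
     */
    static String resolveLanguage(Locale bundleLocale, String defaultLanguage) {
        String language = bundleLocale.getLanguage();
        return language.isEmpty() ? defaultLanguage : language;
    }

    public static void main(String[] args) {
        // Locale.ROOT stands in for what a base (fallback) bundle reports.
        System.out.println(resolveLanguage(Locale.ROOT, "en"));
        // A bundle that really exists for German keeps its own language.
        System.out.println(resolveLanguage(Locale.GERMAN, "en"));
    }
}
```

In the JSP this would wrap the getBundle(...).getLocale() result, so that a forward based on the language never receives an empty value.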
protocol-httpclient uses the value fetcher.threads.fetch.
protocol-http does not use fetcher.threads.fetch.
Shouldn't this value be used by protocol-http, protocol-file and protocol-ftp?
Is it the protocol files that act as the control for this value,
or
the fetcher module (Fetcher.java)?
Any light
Is invertlinks supported or not? I am using Nutch 0.7.1. I am getting a NoClassDefFoundError.
Or should I use a compiled version? Can someone help me here?
Whole-web: Indexing Before indexing we first invert all of the links, so that
we may index incoming anchor text with the pages.
One difference you'll want is to change the plugin.xml file so
that your query filter gets used just for the fields you're interested
in. Instead of fields="DEFAULT" in the example, you'll want
raw-fields="language" and raw-fields="category". Assuming you name the
fields language and category
Okay.
I am new to this topic,
but do you add the meta tags to a particular field?
If so, shouldn't that field also appear in the field path?
Maybe normal Nutch does not look at that field at all? Maybe this is
the reason?
Unless you give the metadata field and search for the keyword
Rgds
Prabhu
I'm not sure I understand what you're getting at. In this case
I've added a comma separated list of names of meta tags that I want to
index and search against. I've written a parse filter, an index filter
and this query filter that all read in that list of meta tags from the
Hey
The simplest way is to copy the BasicQueryFilter class and rename it, then
modify FIELDS/FIELD_BOOSTS by replacing them with your meta tags
from the Nutch config. And don't forget the configuration in your query
filter's plugin.xml.
Good luck!
/Jack
On 2/24/06, Vanderdray, Jacob [EMAIL PROTECTED]
P.S. Now finally i could test nutch...:)
Puhh, that was a pain! :-) Welcome!
Puhh, that was a pain! :-) Welcome!
Oops, I hit the send button too fast. :-/
Before people misunderstand that: 'welcome' means 'welcome to
Nutch'.
'Welcome' in German always means 'welcoming someone to something',
sorry.
This should be possible with the latest version, Nutch 0.8; you may
need to build it from sources.
There NutchConf is not static anymore and you can pass it down the
stack.
Besides that, you may need to store the NutchBean not as a context
attribute but in a hashmap that is stored as a context attribute.
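
A rough sketch of the hashmap idea, written generically so it does not depend on any Nutch class: one bean per site id, created on first use. PerSiteRegistry, forSite and the factory argument are hypothetical names; in the webapp the registry itself would be the single object stored as a context attribute, and the factory would build a bean from that site's own configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PerSiteRegistry<T> {
    // One bean per site id; ConcurrentHashMap keeps lookups thread-safe.
    private final Map<String, T> beans = new ConcurrentHashMap<>();
    private final Function<String, T> factory;

    public PerSiteRegistry(Function<String, T> factory) {
        this.factory = factory;
    }

    /** Returns the bean for this site, creating it on first request. */
    public T forSite(String siteId) {
        return beans.computeIfAbsent(siteId, factory);
    }

    public static void main(String[] args) {
        // StringBuilder is only a placeholder for a per-site search bean.
        PerSiteRegistry<StringBuilder> registry =
            new PerSiteRegistry<>(site -> new StringBuilder("bean for " + site));
        StringBuilder first = registry.forSite("site-a");
        StringBuilder again = registry.forSite("site-a");
        StringBuilder other = registry.forSite("site-b");
        System.out.println(first == again); // same instance is reused
        System.out.println(first == other); // different sites, different beans
    }
}
```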
No.
On 22.02.2006 at 22:29, Saravanaraj Duraisamy wrote:
Hi,
in Nutch 0.7.1, is there a way to stop indexing without corrupting
the index
files in the middle of indexing???
thanks
d.saravanaraj
-
blog: http://www.find23.org
company:
I am running cygwin (I know), with jdk1.5.0 and tomcat 4.1
From cygwin I run:
bin/nutch crawl urls -dir crawlresults/ -depth 2 -topN 1000
results:
run java in C:/program files/java/jdk1.5.0/
060223 123010 parsing file:/c:/cygwin/home/falieson/nutch/conf/nutch-default.xml
060223 123010
Hi,
- Is there any way to perform form-based authentication? I know
that this is a common request but I haven’t found a “good-enough” answer to
it. The only references I’ve found are about basic auth, which I’d prefer to
avoid. I ask this because I’ve noticed that SearchBlox,
Hi Florian,
Where is your urls file located? If you created urls in the conf folder,
then you have to call:
bin/nutch crawl conf/urls -dir crawlresults/ -depth 2 -topN 1000
Good luck
Detlev
I am running cygwin (I know), with jdk1.5.0 and tomcat 4.1
From cygwin I run:
bin/nutch crawl urls
I made a urls file; yeah, I didn't realize that was what crawl referred to. I thought it would simply grab the URL from the conf\urlfilter.txt file.
bin/nutch crawl urls -dir crawled -depth 2
crawl.log
Well, at least now it ran, but with zero results. file urls
Don't worry, I understood what you meant :)
But what is the reason for this kind of problem?
Why is Nutch not able to select the language?
Do you have any idea?
My solution is not the best... :)
Right now I'm studying the Nutch architecture...
For my purposes, I need a search engine that gives
Don't worry, I understood what you meant :)
Sorry, my English is too often just terrible. I'm trying to improve it;
I feel people too often misunderstand me.
Anyway, I guess and hope my Java is much better. :-)
But what is the reason for this kind of problem?
Why is Nutch not able to select
The latest version I could see in SVN is 0.7.1.
Where can I get 0.8? Source code is even better.
Could I just grab it from the nightly builds?
Please let me know..
Thanks
Sudhi Seshachala
http://sudhilogs.blogspot.com/
http://cvs.apache.org/dist/lucene/nutch/nightly/
On 24.02.2006 at 01:44, sudhendra seshachala wrote:
The latest version I could see in SVN is 0.7.1.
Where can I get 0.8? Source code is even better.
Could I just grab it from the nightly builds?
Please let me know..
Thanks
Sudhi
Hi Stefan
The GUI looks great!
My idea is to add ajax tech. to reduce the page reload and show the
job progress in realtime. If contribution is welcome and no one is
working on this, I'd like to take this.
Regards
/Jack
On 2/23/06, Stefan Groschupf [EMAIL PROTECTED] wrote:
Hi Daniel,
thanks
I have been getting this exception during fetching for almost a month. This
exception stops the whole crawl. It happens on and off! Any idea? We are
really stuck with this problem.
I am using 3 data nodes and 1 name server.
060223 173809 task_m_b8ibww fetching
On 2/24/06, Stefan Groschupf [EMAIL PROTECTED] wrote:
Hi Jack,
The GUI looks great!
I will forward this to Frank Henze he had done the design and sample. :)
Thanks. I'll prepare some utility and debug javascript classes from now on:)
My idea is to add ajax tech. to reduce the page reload
Thanks Stefan.
But when I compiled, the jar size was just 318 KB for 0.8-dev, whereas the 0.7.1
release was 718 KB.
Am I missing something ?
Sudhi
Stefan Groschupf [EMAIL PROTECTED] wrote:
http://cvs.apache.org/dist/lucene/nutch/nightly/
On 24.02.2006 at 01:44, sudhendra seshachala wrote:
On 2/24/06, sudhendra seshachala [EMAIL PROTECTED] wrote:
Thanks Stefan.
But when I compiled, the jar size was just 318 KB for 0.8-dev, whereas the
0.7.1 release was 718 KB.
Am I missing something ?
I guess no.
All the MapReduce classes were separated from Nutch and are hosted in the Hadoop project.
Hi,
I'm using nutch-2006-02-22.tar.gz (release 0.8) to crawl the web,
but when I run the
bin/nutch crawl seeds -dir cnblogs -depth 3
command, I always get negative map progress?!
Just like this:
060224 135430 seeds\urls.txt:0+23
060224 135431 seeds\urls.txt:0+23
060224 135432 map -32509% reduce 0%
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/lucene/st ore/FSDirectory
^^^ Why one blank here?
On 2/24/06, Wong Ting Kiong [EMAIL PROTECTED] wrote:
hi,
I tried some Java code calling the Lucene library lucene-1.9-rc1-dev.jar, but
got an error, my
Hi
The code which you sent is only for the query filter.
In the parse filter and especially in the index filter, do you add it to any new
field which you define?
What I do is store any data I want to have in a new field
(created by me).
I guess the index filter must be storing it in such