Re: need your support

2010-01-20 Thread Mattmann, Chris A (388J)
Hi Sahar, Can you post your: 1. crawl-urlfilter 2. nutch-site.xml Also how are you running this program below? I'm CC'ing nutch-user@ so the community can benefit from this thread. Cheers, Chris On 1/20/10 1:42 PM, sahar elkazaz saharelka...@hotmail.com wrote: Dear/ sirur I have

Re: OR support

2009-12-14 Thread BrunoWL
Nobody? Please, any answer would be good. -- View this message in context: http://old.nabble.com/OR-support-tp26680899p26779229.html

Re: OR support

2009-12-14 Thread Andrzej Bialecki
On 2009-12-14 16:05, BrunoWL wrote: Nobody? Please, any answer would be good. Please check this issue: https://issues.apache.org/jira/browse/NUTCH-479 That's the current status, i.e. this functionality is available only as a patch. -- Best regards, Andrzej Bialecki

OR support

2009-12-07 Thread BrunoWL
Hi! Has anybody successfully added search with the OR operator to Nutch 1.0? I found a patch for the 0.9 version, but it doesn't work. Thanks. -- View this message in context: http://old.nabble.com/OR-support-tp26680899p26680899.html

support for robot rules that include a wild card

2009-11-19 Thread J.G.Konrad
I'm using nutch-1.0 and have noticed after running some tests that the robot rules parser does not support wildcard (a.k.a globbing) in rules. This means the rule will not work like it was expected to by the person who wrote the robots.txt file.  For example User-Agent: * Disallow: /somepath
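To make the wildcard problem concrete, here is a minimal sketch of how a Disallow rule with globbing could be matched against a URL path. It is not the Nutch 1.0 robot rules parser. The class and method names are hypothetical, and the semantics are an assumption based on the commonly used Google-style extensions: `*` matches any run of characters, a trailing `$` anchors the end of the path, and a rule without `$` is a prefix match.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: matches robots.txt Disallow rules
// that may contain '*' wildcards and an optional trailing '$' anchor.
final class RobotsWildcardSketch {

    // Translate a rule into a regular expression, quoting every literal character.
    static Pattern toPattern(String rule) {
        boolean anchored = rule.endsWith("$");
        String body = anchored ? rule.substring(0, rule.length() - 1) : rule;
        StringBuilder regex = new StringBuilder();
        for (char c : body.toCharArray()) {
            if (c == '*') {
                regex.append(".*");                              // wildcard: any run of characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));  // literal character
            }
        }
        if (!anchored) {
            regex.append(".*");                                  // unanchored rules are prefix matches
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // An empty Disallow value blocks nothing.
    static boolean isDisallowed(String rule, String path) {
        if (rule.isEmpty()) {
            return false;
        }
        return toPattern(rule).matcher(path).matches();
    }
}
```

A parser that only does plain prefix comparison would treat the `*` in a rule such as `/private*/tmp` as a literal character and so never match a real path, which is the behaviour described above.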

Re: support for robot rules that include a wild card

2009-11-19 Thread Ken Krugler
Hi Jason, I've been spending some time on an improved robots.txt parser, as part of my Bixo project. One aspect is support for Google wildcard extensions. I think this will be part of the proposed crawler-commons project where we'll put components that can/should be shared between Nutch

Re: Multilanguage support in Nutch 1.0

2009-09-30 Thread David Jashi
-httpclient, but be aware of possible intermittent problems with the  underlying commons-httpclient library.  /description /property From: da...@jashi.ge Date: Tue, 29 Sep 2009 18:59:52 +0400 Subject: Multilanguage support in Nutch 1.0 To: nutch-user@lucene.apache.org Hello, all. I've got

Re: Multilanguage support in Nutch 1.0

2009-09-30 Thread David Jashi
On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM mbel...@msn.com wrote: hi try to activate the language-identifier plugin you must add it in the nutch-site.xml file in the   nameplugin.includes/name section. Ooops. It IS activated. 2009-09-29 16:39:15,671 INFO plugin.PluginRepository -

RE: Multilanguage support in Nutch 1.0

2009-09-30 Thread BELLINI ADAM
, 30 Sep 2009 17:22:26 +0400 Subject: Re: Multilanguage support in Nutch 1.0 To: nutch-user@lucene.apache.org On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM mbel...@msn.com wrote: hi try to activate the language-identifier plugin you must add it in the nutch-site.xml file

Multilanguage support in Nutch 1.0

2009-09-29 Thread David Jashi
Hello, all. I've got a bit of a trouble with Nutch 1.0 and multilanguage support: I have fresh install of Nutch and two analysis plugins I'd like to turn on: analysis-de (German) and analysis-ge (Georgian) Here are the innards of my seed file: --- http://212.72.133.54/l

RE: Multilanguage support in Nutch 1.0

2009-09-29 Thread BELLINI ADAM
library. /description /property From: da...@jashi.ge Date: Tue, 29 Sep 2009 18:59:52 +0400 Subject: Multilanguage support in Nutch 1.0 To: nutch-user@lucene.apache.org Hello, all. I've got a bit of a trouble with Nutch 1.0 and multilanguage support: I have fresh install of Nutch and two

Development support

2009-07-28 Thread Koch Martina
Hi, we're looking for a Nutch developer to implement some plugins for us in the next few weeks. Substantial knowledge of Nutch, Java and databases is needed. If you're interested, please contact me (koch at huberverlag dot de) Thanks in advance, Martina

Re: Support needed

2009-07-28 Thread Sudhi Seshachala
As a long-time Nutch user and developer of plugins who has even implemented Nutch in some products, I could help you. I am based in Houston, Texas -- skype me on hooduku sudhi --- On Mon, 7/27/09, sf30098 sf30...@yahoo.com wrote: From: sf30098 sf30...@yahoo.com Subject: Support needed To: nutch

Support needed

2009-07-27 Thread sf30098
about implementing such system.. This includes: 1. replying questions and providing guidance in implementation 2. reviewing codes and providing suggestions as to how to improve. Please let me know if you're interested. -- View this message in context: http://www.nabble.com/Support-needed

Multi-Lingual Support in Nutch

2009-04-13 Thread Kunal Wku
Hello, I am using Nutch 0.9. I would like to enable multi-lingual support in our existing system. I read the article on Multi-Lingual Support in Nutch by Jérôme Charron. But it is about the previous versions of Nutch. I included the plugin in Nutch-Site.xml as analysis-es. What are the other

Professional Nutch Support and Distribution

2009-03-17 Thread Dennis Kubes
Wanted to gauge community interest in having a certified Nutch distribution with support? Similar to what Lucid Imagination is doing for Solr and Lucene and what Cloudera is providing for Hadoop. Anybody interested? Dennis

Re: Professional Nutch Support and Distribution

2009-03-17 Thread Marc Boucher
This sounds interesting. I might be interested in this. Marc Boucher http://hyperix.com On Tue, Mar 17, 2009 at 12:31 PM, Dennis Kubes ku...@apache.org wrote: Wanted to gauge community interest in having a certified Nutch distribution with support?  Similar to what Lucid Imagination is doing

Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread M S Ram
Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected functionality: If I ask it to search for (Post Graduate) OR (Masters

Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread Doğacan Güney
Hi, On Mon, Jan 19, 2009 at 4:02 PM, M S Ram ms...@cse.iitk.ac.in wrote: Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected

Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread M S Ram
: Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected functionality: If I ask it to search for (Post Graduate) OR (Masters), it should

Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread Lyndon Maydwell
Lucene has support for OR queries, so it should be possible to do it, but support for this in Nutch isn't available as far as I know. I'd also be interested if anyone has managed to implement this. On Tue, Jan 20, 2009 at 1:50 AM, M S Ram ms...@cse.iitk.ac.in wrote: Oh! That's sad! :( What

does nutch support crawling cold fusion pages?

2008-12-08 Thread Alex Basa
Hi, Does anyone know if there is a plugin for cold fusion pages or if it's supported? I'm trying to crawl http://www.knowitall.org/naturalstate Thanks in advance, Alex

What kind of searches does Nutch support?

2008-05-04 Thread Miao Liqiang NCS
What kind of searches does Nutch support?

Missing zh.ngp for zh locate support for language Identifier

2008-03-15 Thread Vinci
Hi all, I found that zh.ngp is missing for the zh locale. I have seen this file in a screenshot, but googling the filename returned nothing for me... Can anyone provide this file for me? Thank you -- View this message in context: http://www.nabble.com/Missing-zh.ngp-for-zh-locate-support

Support Hardware and OS for nutch and hadoop

2008-01-04 Thread Developer Developer
Hello Frens, I am gathering information on supported hardware and OS for nutch and hadoop. I did not find any conclusive information by going through the Nutch wiki. If I want to build a cluster of nodes using nutch/hadoop for crawling, then what are my options for H/W and OS?

Prefix Query in Nutch and Wildcard support.

2008-01-03 Thread Developer Developer
Hello Frens, Is there any way to do a prefix query in Nutch? E.g. query the content field for occurrences of abc*? I could do it in Lucene, but I want to do it in Nutch. Going through the mailing list, it appeared that Nutch does not support such queries. Is it true? Thanks!

Re: NUTCH-479 Support for OR queries - what is this about

2007-07-09 Thread Briggs
: * to avoid the need to support low-level index and searcher operations, which the Lucene API would require us to implement. * to keep the Nutch core largely independent of Lucene, so that it's possible to use Nutch with different back-end searcher implementations. This started to materialize only

Re: NUTCH-479 Support for OR queries - what is this about

2007-07-07 Thread Briggs
[EMAIL PROTECTED] wrote: I've been reading up on NUTCH-479 Support for OR queries but I must be missing something obvious because I don't understand what the JIRA is about: https://issues.apache.org/jira/browse/NUTCH-479 Description: There have been many requests from users to extend

Re: NUTCH-479 Support for OR queries - what is this about

2007-07-07 Thread Andrzej Bialecki
actually almost nothing to do with the scoring filters (which were added much later). The decision to use a different query syntax than the one from Lucene was motivated by a few reasons: * to avoid the need to support low-level index and searcher operations, which the Lucene API would require us

NUTCH-479 Support for OR queries - what is this about

2007-07-06 Thread Kai_testing Middleton
I've been reading up on NUTCH-479 Support for OR queries but I must be missing something obvious because I don't understand what the JIRA is about: https://issues.apache.org/jira/browse/NUTCH-479 Description: There have been many requests from users to extend Nutch query syntax to add

How best to add sponsored link support..??

2006-12-19 Thread RP
Hi all, I've been tasked with looking into this and am not a coder - that said, Nutch is doing great and the bean counters have asked me to look into adding sponsored link results and I'm wondering how best to add this. It would be nice to utilize the Nutch engine to come up with the pages

Re: How best to add sponsored link support..??

2006-12-19 Thread RP
, 2006 10:52:56 AM Subject: How best to add sponsored link support..?? Hi all, I've been tasked with looking into this and am not a coder - that said, Nutch is doing great and the bean counters have asked me to look into adding sponsored link results and I'm wondering how best to add

Re: How best to add sponsored link support..??

2006-12-19 Thread Sami Siren
advertising such as Google Ads. Sean - Original Message From: RP [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Tuesday, December 19, 2006 10:52:56 AM Subject: How best to add sponsored link support..?? Hi all, I've been tasked with looking into this and am

Re: How best to add sponsored link support..??

2006-12-19 Thread RP
PROTECTED] To: nutch-user@lucene.apache.org Sent: Tuesday, December 19, 2006 10:52:56 AM Subject: How best to add sponsored link support..?? Hi all, I've been tasked with looking into this and am not a coder - that said, Nutch is doing great and the bean counters have asked me to look

Re: Lucene query support in Nutch

2006-10-10 Thread Stefan Neufeind
Cristina Belderrain wrote: On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote: This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need

Re: Lucene query support in Nutch

2006-10-10 Thread Tomi NA
2006/10/10, Cristina Belderrain [EMAIL PROTECTED]: On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote: This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional

Re: Lucene query support in Nutch

2006-10-10 Thread Bill Goffe
Tomi said: In conclusion, my position is pragmatic: I welcome the simplest solution to implement the or search. I just believe that it'd be easiest to do that extending the nutch Analyzer. This seems like a very reasonable approach. I too would very much like OR. It would also be nice if it

Re: Lucene query support in Nutch

2006-10-09 Thread Tomi NA
-syntax? As has just been pointed out: It This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need arises. t.n.a.

Re: Lucene query support in Nutch

2006-10-09 Thread Cristina Belderrain
On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote: This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need arises. t.n.a. Tomi, why would

Re: Lucene query support in Nutch

2006-10-07 Thread Cristina Belderrain
Hello, I just would like to confirm that the version of the search() method shown in the previous post works fine, at least regarding boolean queries. Anyway, I see no reason why it wouldn't work with any other Lucene query (fuzzy, proximity, etc.). Now, please be warned that the inclusion of

Re: Lucene query support in Nutch

2006-10-07 Thread Björn Wilmsmann
Hi, On 07.10.2006 at 17:40, Cristina Belderrain wrote: Let me remind you that all this must be done just to provide something that's already there: Nutch is built on top of Lucene, after all. If it's hard to understand why Lucene's capabilities

Re: Lucene query support in Nutch

2006-10-07 Thread Sami Siren
Nevertheless, I agree that there should be an option to choose the Lucene query engine instead of the Nutch flavour one because Nutch has been proven to be equally suitable for areas which do not require as efficient queries (like intranet crawling for instance) as an all-out web indexing

Re: Lucene query support in Nutch

2006-10-07 Thread Stefan Neufeind
Björn Wilmsmann wrote: On 07.10.2006 at 17:40, Cristina Belderrain wrote: Let me remind you that all this must be done just to provide something that's already there: Nutch is built on top of Lucene, after all. If it's hard to understand why Lucene's capabilities were simply neutralized

Re: Lucene query support in Nutch

2006-10-05 Thread Stefan Neufeind
Hi, yes, I guess having the full strength of Lucene-based queries would be nice. That would as well solve the boolean queries-question I had a few days ago :-) Ravi, doesn't Lucene also allow querying of other fields? Is there any possibility to add that feature to your proposal? In general:

Re: Lucene query support in Nutch

2006-10-05 Thread Björn Wilmsmann
Hi everybody, On 05/10/2006 05:44 Ravi Chintakunta wrote: public Hits search(String queryString, int numHits, String dedupField, String sortField, boolean reverse) throws IOException {

Re: Lucene query support in Nutch

2006-10-05 Thread Cristina Belderrain
Hi Björn, yes, the error you point out will happen indeed... A possible workaround would be: public Hits search(String queryString, int numHits, String dedupField, String sortField, boolean reverse) throws IOException { org.apache.lucene.queryParser.QueryParser parser =

OpenOffice Support?

2006-07-11 Thread Matthew Holt
Just wondering, has anyone done any work on a plugin (or aware of a plugin) that supports the indexing of open office documents? Thanks. Matt

Re: OpenOffice Support?

2006-07-11 Thread Lourival Júnior
Using to advantage your question, anyone knows if the version 0.7.2 of nutch supports the zip plugin? If so, where can I find it? Lourival Junior On 7/11/06, Matthew Holt [EMAIL PROTECTED] wrote: Just wondering, has anyone done any work on a plugin (or aware of a plugin) that supports the

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Andrzej Bialecki
Renaud Richardet wrote: Hello Nutch, My name is Renaud Richardet and I am the COO of Wyona LLC. We are offering Nutch and Lucene support (http://wyona.com/lucene.html), and I was wondering if I could add our company to http://wiki.apache.org/nutch/Support. That would be great. Certainly

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Insurance Squared Inc.
obey the nofollow tags? g. Andrzej Bialecki wrote: Renaud Richardet wrote: Hello Nutch, My name is Renaud Richardet and I am the COO of Wyona LLC. We are offering Nutch and Lucene support (http://wyona.com/lucene.html), and I was wondering if I could add our company to http

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Andrzej Bialecki
Insurance Squared Inc. wrote: The funny thing about that wiki page (and some others in that area) is that they apparently use the nofollow tags. Given the topic of that wiki, isn't that a bit odd? I personally dislike the nofollow tag and think it should be used only in extreme circumstances

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Insurance Squared Inc.
Well, so much for knee-jerk suspicions as to intent. No need to look for conspiracy theories when default settings are more likely to be the cause. That should probably be a corollary to Occam's razor or something :). Andrzej Bialecki wrote: Insurance Squared Inc. wrote: The funny thing

Re: Full fledged Lucene Query Syntax support in Nutch

2006-05-04 Thread Ravi Chintakunta
. Is there a reason that Nutch does not support the entire Lucene query syntax by default? Thanks in advance, Ravi Chintakunta

Full fledged Lucene Query Syntax support in Nutch

2006-05-02 Thread Ravi Chintakunta
. We have to modify the analyzer and add more plugins to Nutch to use the Lucene's query syntax. Or we have to directly use Lucene's Query Parser. I tried the second approach by modifying org.apache.nutch.searcher.IndexSearcher and that seems to work. Is there a reason that Nutch does not support

Re: Full fledged Lucene Query Syntax support in Nutch

2006-05-02 Thread Herman Hardenbol
Sorry, I am on holiday until the 8th of May. Please contact the [EMAIL PROTECTED] for urgent matters. Kind regards, Herman.

HTTPS support?

2006-03-06 Thread David Odmark
Hi, Does Nutch 0.8 support https fetches? If not, are there any active efforts to support it? TIA, David Odmark

Re: HTTPS support?

2006-03-06 Thread Andrzej Bialecki
David Odmark wrote: Hi, Does Nutch 0.8 support https fetches? If not, are there any active efforts to support it? It does, using protocol-httpclient plugin. -- Best regards, Andrzej Bialecki

Nutch doesn't support Korean?

2006-03-03 Thread Teruhiko Kurosaka
I was browsing NutchAnalysis.jj and found that Hangul Syllables (U+AC00 ... U+D7AF; U+ means a Unicode character of the hex value ) are not part of LETTER or CJK class. It seems to me that Nutch cannot handle Korean documents at all. Is anybody successfully using Nutch for Korean?
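As a quick way to see whether a document would be affected, the following sketch checks text for code points in the Hangul Syllables block (U+AC00 ... U+D7AF) using only the Java standard library. It does not touch NutchAnalysis.jj, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Nutch: detects characters from the
// Hangul Syllables Unicode block using java.lang.Character.UnicodeBlock.
final class HangulCheckSketch {

    // True if the code point lies in the Hangul Syllables block.
    static boolean isHangulSyllable(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // True if any code point of the text is a Hangul syllable.
    static boolean containsHangul(String text) {
        return text.codePoints().anyMatch(HangulCheckSketch::isHangulSyllable);
    }
}
```

Running `containsHangul` over a sample of fetched pages would show how much content depends on these characters being accepted by the tokenizer.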

Re: Nutch doesn't support Korean?

2006-03-03 Thread Cheolgoo Kang
Hello, There was a similar issue with Lucene's StandardTokenizer.jj. http://issues.apache.org/jira/browse/LUCENE-444 and http://issues.apache.org/jira/browse/LUCENE-461 I have almost no experience with Nutch, but you can handle it like those issues above. On 3/4/06, Teruhiko Kurosaka [EMAIL

xquery support for nutch

2006-02-20 Thread Raghavendra Prabhu
Hi It would be great if we provide xquery support to nutch where expressions like 3 + 4=7 would be evaluated. http://www.xml.com/pub/a/2002/10/16/xquery.html It is just an idea and probably would make it a universal tool Rgds Prabhu

Single NutchBean and multiple indices support

2006-02-15 Thread Jack Tang
opened. Then I call closeSegments() after each search. I realise that NutchBean really wasn't designed to support being instantiated once per search, but I don't care. It works well, and performance is not an issue. Regards, David. Date: Mon, 6 Feb 2006 20:59:34 -0500 From: Ravi

Re: Single NutchBean and multiple indices support

2006-02-15 Thread Ravi Chintakunta
to FetchSegments.Segment in my installation, to close all the readers. I added a closeSegments() method to NutchBean, to call close() on each segment that's been opened. Then I call closeSegments() after each search. I realise that NutchBean really wasn't designed to support being instantiated

Re: Which version of rss does parse-rss plugin support?

2006-02-10 Thread Chris Mattmann
- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris How do I change the plugin.xml? For example, if I want to crawl rss files end with xml, just add a new

Re: Which version of rss does parse-rss plugin support?

2006-02-10 Thread Elwin
of either NASA, JPL, or the California Institute of Technology. -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris How do

opensearch support

2006-02-07 Thread Geraint Williams
Is OpenSearch being developed? I am using nutch 0.7 and it seems to have some opensearch support. However, I failed to get either a python or perl opensearch client library (admittedly these are also in early development). The perl library seemed to choke at not finding

RE: Which version of rss does parse-rss plugin support?

2006-02-05 Thread Chris Mattmann
those of either NASA, JPL, or the California Institute of Technology. -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris

Re: Which version of rss does parse-rss plugin support?

2006-02-05 Thread 盖世豪侠
, or the California Institute of Technology. -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris How do I change the plugin.xml

Re: Which version of rss does parse-rss plugin support?

2006-02-04 Thread 盖世豪侠
locally. For web-based crawls, you need to make sure that the content type being returned for your RSS content matches the content type specified in the plugin.xml file that parse-rss claims to support. Note that you might not have * a lot * of success with being able to control the content

Which version of rss does parse-rss plugin support?

2006-02-03 Thread 盖世豪侠
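I see the test file is of version 0.91. Does the plugin support higher versions like 1.0 or 2.0? -- 《盖世豪侠》 drew rave reviews and kept TVB's ratings high, yet for all its delight TVB still did not give him major roles. Stephen Chow was never one to stay in a small pond: once his comedic talent had shown, he would not accept being sidelined, so he moved into film to make his mark on the big screen. TVB gained a champion steed and then lost it, and naturally regretted it too late.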

Re: Which version of rss does parse-rss plugin support?

2006-02-03 Thread Chris Mattmann
1.0 modules capability... Hope that helps. Thanks, Chris On 2/3/06 6:46 AM, 盖世豪侠 [EMAIL PROTECTED] wrote: I see the test file is of version 0.91. Does the plugin support higher versions like 1.0 or 2.0?

Re: Which version of rss does parse-rss plugin support?

2006-02-03 Thread 盖世豪侠
file is of version 0.91. Does the plugin support higher versions like 1.0 or 2.0?

Multi CPU support

2006-01-09 Thread Teruhiko Kurosaka
Can I use MapReduce to run Nutch on a multi CPU system? I want to run the index job on two (or four) CPUs on a single system. I'm not trying to distribute the job over multiple systems. If the MapReduce is the way to go, do I just specify config parameters like these:

Re: Multi CPU support

2006-01-09 Thread Doug Cutting
Teruhiko Kurosaka wrote: Can I use MapReduce to run Nutch on a multi CPU system? Yes. I want to run the index job on two (or four) CPUs on a single system. I'm not trying to distribute the job over multiple systems. If the MapReduce is the way to go, do I just specify config parameters

multibyte character support status

2005-12-27 Thread Teruhiko Kurosaka
What is the current state and plan for multibyte character support by Nutch? As far as I can tell... The PDF plugin uses PDFBox (www.pdfbox.org) which does not work with Japanese and probably other multibyte characters and code sets. The Word plugin uses POI (http://jakarta.apache.org/poi

Re: PDF indexing support?

2005-11-16 Thread Håvard W. Kongsgård
Thanks, it worked. Jérôme Charron wrote: The value you specified is bigger than the maximum int value, so it throws an exception, and then the default value is used. As mentioned in the property's description, use a negative value (-1) for no truncation at all (or a value less than
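The failure mode described here can be reproduced with plain Java. The sketch below is not Nutch's configuration code: the class, the method names and the default of 65536 are assumptions for illustration. It shows that a limit such as 542256565536 does not fit in an int, so parsing fails and the default is used, while -1 parses fine and can be treated as "no truncation".

```java
import java.util.Arrays;

// Hypothetical illustration, not Nutch code: reading an int-valued content
// limit with a fallback, then applying it to fetched content.
final class ContentLimitSketch {

    // Returns the parsed limit, or defaultValue if the text is not a valid int
    // (for example because it is larger than Integer.MAX_VALUE).
    static int readLimit(String raw, int defaultValue) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // A negative limit means "do not truncate".
    static byte[] truncate(byte[] content, int limit) {
        if (limit < 0 || content.length <= limit) {
            return content;
        }
        return Arrays.copyOf(content, limit);
    }
}
```

With these assumptions, `readLimit("542256565536", 65536)` falls back to 65536, which is why large PDFs were still being cut off, whereas `readLimit("-1", 65536)` yields -1 and `truncate` leaves the content whole.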

Re: PDF indexing support?

2005-11-16 Thread Hasan Diwan
On Nov 15, 2005, at 2:46 PM, Håvard W. Kongsgård wrote: Don't have a conf/nutch-site.xml Create it and put the overrides in there, per the nutch tutorial. Cheers, Hasan Diwan [EMAIL PROTECTED]

Re: PDF indexing support?

2005-11-15 Thread Stefan Groschupf
PDF indexing support? Simply by activating the parse-pdf plugin in nutch-default.xml or nutch-site.xml (take a look at the plugin.includes property) Jérôme -- http://motrech.free.fr/ http://www.frutch.org/
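For readers unsure what "activating" means here: the plugin.includes value is a regular expression over plugin names, so parse-pdf is picked up only if the expression matches it. The sketch below illustrates that idea with the Java regex API. It is not Nutch's plugin loader. The two property values are shortened, hypothetical examples, and whole-name matching is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not Nutch code: checks whether a plugin name
// is selected by a plugin.includes-style regular expression.
final class PluginIncludesSketch {

    // Assumes the whole plugin name must match the expression.
    static boolean isEnabled(String pluginIncludes, String pluginName) {
        return Pattern.compile(pluginIncludes).matcher(pluginName).matches();
    }
}
```

With a value like `protocol-http|parse-(text|html)`, `isEnabled(value, "parse-pdf")` is false. After extending the group to `parse-(text|html|pdf)`, it becomes true, which is the kind of change the reply is pointing at.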

Re: PDF indexing support?

2005-11-15 Thread Håvard W. Kongsgård
conf/nutch-default Jérôme Charron wrote: http.content.limit=542256565536 and file.content.limit=4541165536 still the same error: where do you specify these values? in nutch-default or nutch-site? Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

Re: PDF indexing support?

2005-11-15 Thread Jérôme Charron
conf/nutch-default Check that they are not overridden in conf/nutch-site. If not, sorry, no more ideas for now :-( Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

Re: PDF indexing support?

2005-11-15 Thread Håvard W. Kongsgård
Don't have a conf/nutch-site.xml Jérôme Charron wrote: conf/nutch-default Check that they are not overridden in conf/nutch-site. If not, sorry, no more ideas for now :-( Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

PDF indexing support?

2005-11-14 Thread Håvard W. Kongsgård
Hello, I am new to Nutch. How do I enable PDF indexing support?

PDF support? Does crawl parse p

2005-08-31 Thread Diane Palla
expect to support application/pdf types and have such parsing of pdf files available? Diane Palla Web Services Developer Seton Hall University 973 313-6199 [EMAIL PROTECTED] Bryan Woliner [EMAIL PROTECTED] 08/23/2005 05:22 PM Please respond to nutch-user@lucene.apache.org To nutch-user

Re: PDF support? Does crawl parse p

2005-08-31 Thread Piotr Kosiorowski
developers expect to support application/pdf types and have such parsing of pdf files available? Diane Palla Web Services Developer Seton Hall University 973 313-6199 [EMAIL PROTECTED] Bryan Woliner [EMAIL PROTECTED] 08/23/2005 05:22 PM Please respond to nutch-user@lucene.apache.org

Re: metadata support in WebDB (Stefan's NUTCH-59 patch)

2005-07-19 Thread Stefan Groschupf
Hi Otis, http://issues.apache.org/jira/browse/NUTCH-59 This patch looks interesting for my Nutch needs, So please vote for the patch if you like it. :-) I can't look at the code, but looking at your diff, it looks like this metadata would be stored somewhere inside Nutch's WebDB, and that