subject:"OR support"

Re: need your support

2010-01-20 Thread Mattmann, Chris A (388J)

Hi Sahar, Can you post your: 1. crawl-urlfilter 2. nutch-site.xml Also how are you running this program below? I'm CC'ing nutch-user@ so the community can benefit from this thread. Cheers, Chris On 1/20/10 1:42 PM, sahar elkazaz saharelka...@hotmail.com wrote: Dear/ sirur I have

Re: OR support

2009-12-14 Thread BrunoWL

Nobody? Please, any answer would be good. -- View this message in context: http://old.nabble.com/OR-support-tp26680899p26779229.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: OR support

2009-12-14 Thread Andrzej Bialecki

On 2009-12-14 16:05, BrunoWL wrote: Nobody? Please, any answer would be good. Please check this issue: https://issues.apache.org/jira/browse/NUTCH-479 That's the current status, i.e. this functionality is available only as a patch. -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _

OR support

2009-12-07 Thread BrunoWL

Hi! Has anybody successfully added search with the OR operator to Nutch 1.0? I found a patch for the 0.9 version, but it doesn't work. Thanks. -- View this message in context: http://old.nabble.com/OR-support-tp26680899p26680899.html Sent from the Nutch - User mailing list archive

support for robot rules that include a wild card

2009-11-19 Thread J.G.Konrad

I'm using nutch-1.0 and have noticed after running some tests that the robot rules parser does not support wildcard (a.k.a globbing) in rules. This means the rule will not work like it was expected to by the person who wrote the robots.txt file. For example User-Agent: * Disallow: /somepath
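A minimal sketch of what wildcard-aware rule matching could look like, assuming the Google-style extensions (`*` matches any run of characters, a trailing `$` anchors the end of the path). The class name, method names, and sample rules are hypothetical and only for illustration; this is not the Nutch robot rules parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: matches a URL path against a
// robots.txt Disallow rule that may contain '*' and a trailing '$'.
class RobotsWildcardSketch {

    // Translate a rule into a regex: literal pieces are quoted, '*' becomes ".*".
    static Pattern compile(String rule) {
        boolean anchored = rule.endsWith("$");
        String body = anchored ? rule.substring(0, rule.length() - 1) : rule;
        String[] parts = body.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        if (!anchored) {
            regex.append(".*"); // unanchored rules are prefix matches
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean matches(String rule, String path) {
        return compile(rule).matcher(path).matches();
    }

    public static void main(String[] args) {
        // Sample rules and paths are made up for this sketch.
        System.out.println(matches("/private*/", "/private-area/index.html"));
        System.out.println(matches("/*.pdf$", "/docs/a.pdf?x=1"));
    }
}
```

A parser that treats `*` as a literal character would never match the first sample path, which is the kind of mismatch the message describes.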

Re: support for robot rules that include a wild card

2009-11-19 Thread Ken Krugler

Hi Jason, I've been spending some time on an improved robots.txt parser, as part of my Bixo project. One aspect is support for Google wildcard extensions. I think this will be part of the proposed crawler-commons project where we'll put components that can/should be shared between Nutch

Re: Multilanguage support in Nutch 1.0

2009-09-30 Thread David Jashi

-httpclient, but be aware of possible intermittent problems with the underlying commons-httpclient library. </description> </property> From: da...@jashi.ge Date: Tue, 29 Sep 2009 18:59:52 +0400 Subject: Multilanguage support in Nutch 1.0 To: nutch-user@lucene.apache.org Hello, all. I've got

Re: Multilanguage support in Nutch 1.0

2009-09-30 Thread David Jashi

On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM mbel...@msn.com wrote: hi try to activate the language-identifier plugin you must add it in the nutch-site.xml file in the <name>plugin.includes</name> section. Ooops. It IS activated. 2009-09-29 16:39:15,671 INFO plugin.PluginRepository -

RE: Multilanguage support in Nutch 1.0

2009-09-30 Thread BELLINI ADAM

, 30 Sep 2009 17:22:26 +0400 Subject: Re: Multilanguage support in Nutch 1.0 To: nutch-user@lucene.apache.org On Wed, Sep 30, 2009 at 01:12, BELLINI ADAM mbel...@msn.com wrote: hi try to activate the language-identifier plugin you must add it in the nutch-site.xml file

Multilanguage support in Nutch 1.0

2009-09-29 Thread David Jashi

Hello, all. I've got a bit of trouble with Nutch 1.0 and multilanguage support: I have a fresh install of Nutch and two analysis plugins I'd like to turn on: analysis-de (German) and analysis-ge (Georgian). Here are the innards of my seed file: --- http://212.72.133.54/l

RE: Multilanguage support in Nutch 1.0

2009-09-29 Thread BELLINI ADAM

library. </description> </property> From: da...@jashi.ge Date: Tue, 29 Sep 2009 18:59:52 +0400 Subject: Multilanguage support in Nutch 1.0 To: nutch-user@lucene.apache.org Hello, all. I've got a bit of trouble with Nutch 1.0 and multilanguage support: I have a fresh install of Nutch and two

Development support

2009-07-28 Thread Koch Martina

Hi, we're looking for a Nutch developer to implement some plugins for us in the next few weeks. Substantial knowledge of Nutch, Java and databases is needed. If you're interested, please contact me (koch at huberverlag dot de). Thanks in advance, Martina

Re: Support needed

2009-07-28 Thread Sudhi Seshachala

As a long-time Nutch user and plugin developer who has even implemented Nutch in some products, I could help you. I am based in Houston, Texas -- skype me on hooduku sudhi --- On Mon, 7/27/09, sf30098 sf30...@yahoo.com wrote: From: sf30098 sf30...@yahoo.com Subject: Support needed To: nutch

Support needed

2009-07-27 Thread sf30098

about implementing such system.. This includes: 1. replying questions and providing guidance in implementation 2. reviewing codes and providing suggestions as to how to improve. Please let me know if you're interested. -- View this message in context: http://www.nabble.com/Support-needed

Multi-Lingual Support in Nutch

2009-04-13 Thread Kunal Wku

Hello, I am using Nutch 0.9. I would like to enable multi-lingual support in our existing system. I read the article on Multi-Lingual Support in Nutch by Jérôme Charron. But it is about the previous versions of Nutch. I included the plugin in Nutch-Site.xml as analysis-es. What are the other

Professional Nutch Support and Distribution

2009-03-17 Thread Dennis Kubes

Wanted to gauge community interest in having a certified Nutch distribution with support? Similar to what Lucid Imagination is doing for Solr and Lucene and what Cloudera is providing for Hadoop. Anybody interested? Dennis

Re: Professional Nutch Support and Distribution

2009-03-17 Thread Marc Boucher

This sounds interesting. I might be interested in this. Marc Boucher http://hyperix.com On Tue, Mar 17, 2009 at 12:31 PM, Dennis Kubes ku...@apache.org wrote: Wanted to gauge community interest in having a certified Nutch distribution with support? Similar to what Lucid Imagination is doing

Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread M S Ram

Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected functionality: If I ask it to search for (Post Graduate) OR (Masters

Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread Doğacan Güney

Hi, On Mon, Jan 19, 2009 at 4:02 PM, M S Ram ms...@cse.iitk.ac.in wrote: Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected

Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread M S Ram

: Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected functionality: If I ask it to search for (Post Graduate) OR (Masters), it should

Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread Lyndon Maydwell

Lucene has support for OR queries, so it should be possible to do it, but support for this in Nutch isn't available as far as I know. I'd also be interested if anyone has managed to implement this. On Tue, Jan 20, 2009 at 1:50 AM, M S Ram ms...@cse.iitk.ac.in wrote: Oh! That's sad! :( What

does nutch support crawling cold fusion pages?

2008-12-08 Thread Alex Basa

Hi, Does anyone know if there is a plugin for cold fusion pages or if it's supported? I'm trying to crawl http://www.knowitall.org/naturalstate Thanks in advance, Alex

What kind of searches does Nutch support?

2008-05-04 Thread Miao Liqiang NCS

What kind of searches does Nutch support?

Missing zh.ngp for zh locate support for language Identifier

2008-03-15 Thread Vinci

Hi all, I found that zh.ngp is missing for the zh locale. I have seen this file in a screenshot, but googling the filename returned nothing for me... Can anyone provide this file? Thank you -- View this message in context: http://www.nabble.com/Missing-zh.ngp-for-zh-locate-support

Support Hardware and OS for nutch and hadoop

2008-01-04 Thread Developer Developer

Hello friends, I am gathering information on supported hardware and operating systems for Nutch and Hadoop. I did not find any conclusive information in the Nutch wiki. If I want to build a cluster of nodes using Nutch/Hadoop for crawling, what are my options for hardware and OS?

Prefix Query in Nutch and Wildcard support.

2008-01-03 Thread Developer Developer

Hello friends, Is there any way to do a prefix query in Nutch? E.g., query the content field for occurrences of abc*? I can do it in Lucene, but I want to do it in Nutch. From the mailing list it appears that Nutch does not support such queries. Is that true? Thanks!

Re: NUTCH-479 Support for OR queries - what is this about

2007-07-09 Thread Briggs

: * to avoid the need to support low-level index and searcher operations, which the Lucene API would require us to implement. * to keep the Nutch core largely independent of Lucene, so that it's possible to use Nutch with different back-end searcher implementations. This started to materialize only

Re: NUTCH-479 Support for OR queries - what is this about

2007-07-07 Thread Briggs

[EMAIL PROTECTED] wrote: I've been reading up on NUTCH-479 Support for OR queries but I must be missing something obvious because I don't understand what the JIRA is about: https://issues.apache.org/jira/browse/NUTCH-479 Description: There have been many requests from users to extend

Re: NUTCH-479 Support for OR queries - what is this about

2007-07-07 Thread Andrzej Bialecki

actually almost nothing to do with the scoring filters (which were added much later). The decision to use a different query syntax than the one from Lucene was motivated by a few reasons: * to avoid the need to support low-level index and searcher operations, which the Lucene API would require us

NUTCH-479 Support for OR queries - what is this about

2007-07-06 Thread Kai_testing Middleton

I've been reading up on NUTCH-479 Support for OR queries but I must be missing something obvious because I don't understand what the JIRA is about: https://issues.apache.org/jira/browse/NUTCH-479 Description: There have been many requests from users to extend Nutch query syntax to add

How best to add sponsored link support..??

2006-12-19 Thread RP

Hi all, I've been tasked with looking into this and am not a coder - that said, Nutch is doing great and the bean counters have asked me to look into adding sponsored link results and I'm wondering how best to add this. It would be nice to utilize the Nutch engine to come up with the pages

Re: How best to add sponsored link support..??

2006-12-19 Thread RP

, 2006 10:52:56 AM Subject: How best to add sponsored link support..?? Hi all, I've been tasked with looking into this and am not a coder - that said, Nutch is doing great and the bean counters have asked me to look into adding sponsored link results and I'm wondering how best to add

Re: How best to add sponsored link support..??

2006-12-19 Thread Sami Siren

advertising such as Google Ads. Sean - Original Message From: RP [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Tuesday, December 19, 2006 10:52:56 AM Subject: How best to add sponsored link support..?? Hi all, I've been tasked with looking into this and am

Re: How best to add sponsored link support..??

2006-12-19 Thread RP

PROTECTED] To: nutch-user@lucene.apache.org Sent: Tuesday, December 19, 2006 10:52:56 AM Subject: How best to add sponsored link support..?? Hi all, I've been tasked with looking into this and am not a coder - that said, Nutch is doing great and the bean counters have asked me to look

Re: Lucene query support in Nutch

2006-10-10 Thread Stefan Neufeind

Cristina Belderrain wrote: On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote: This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need

Re: Lucene query support in Nutch

2006-10-10 Thread Tomi NA

2006/10/10, Cristina Belderrain [EMAIL PROTECTED]: On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote: This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional

Re: Lucene query support in Nutch

2006-10-10 Thread Bill Goffe

Tomi said: In conclusion, my position is pragmatic: I welcome the simplest solution to implement the or search. I just believe that it'd be easiest to do that extending the nutch Analyzer. This seems like a very reasonable approach. I too would very much like OR. It would also be nice if it

Re: Lucene query support in Nutch

2006-10-09 Thread Tomi NA

-syntax? As has just been pointed out: It This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need arises. t.n.a.

Re: Lucene query support in Nutch

2006-10-09 Thread Cristina Belderrain

On 10/9/06, Tomi NA [EMAIL PROTECTED] wrote: This is *exactly* what I was thinking. Like Stefan, I believe the nutch analyzer is a good foundation and should therefore be extended to support the or operator, and possibly additional capabilities when the need arises. t.n.a. Tomi, why would

Re: Lucene query support in Nutch

2006-10-07 Thread Cristina Belderrain

Hello, I just would like to confirm that the version of the search() method shown in the previous post works fine, at least regarding boolean queries. Anyway, I see no reason why it wouldn't work with any other Lucene query (fuzzy, proximity, etc.). Now, please be warned that the inclusion of

Re: Lucene query support in Nutch

2006-10-07 Thread Björn Wilmsmann

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, Am 07.10.2006 um 17:40 schrieb Cristina Belderrain: Let me remind you that all this must be done just to provide something that's already there: Nutch is built on top of Lucene, after all. If it's hard to understand why Lucene's capabilities

Re: Lucene query support in Nutch

2006-10-07 Thread Sami Siren

Nevertheless, I agree that there should be an option to choose the Lucene query engine instead of the Nutch flavour one because Nutch has been proven to be equally suitable for areas which do not require as efficient queries (like intranet crawling for instance) as an all-out web indexing

Re: Lucene query support in Nutch

2006-10-07 Thread Stefan Neufeind

Björn Wilmsmann wrote: Am 07.10.2006 um 17:40 schrieb Cristina Belderrain: Let me remind you that all this must be done just to provide something that's already there: Nutch is built on top of Lucene, after all. If it's hard to understand why Lucene's capabilities were simply neutralized

Re: Lucene query support in Nutch

2006-10-05 Thread Stefan Neufeind

Hi, yes, I guess having the full strength of Lucene-based queries would be nice. That would as well solve the boolean queries-question I had a few days ago :-) Ravi, doesn't Lucene also allow querying of other fields? Is there any possibility to add that feature to your proposal? In general:

Re: Lucene query support in Nutch

2006-10-05 Thread Björn Wilmsmann

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi everybody, On 05/10/2006 05:44 Ravi Chintakunta wrote: public Hits search(String queryString, int numHits, String dedupField, String sortField, boolean reverse) throws IOException {

Re: Lucene query support in Nutch

2006-10-05 Thread Cristina Belderrain

Hi Björn, yes, the error you point out will happen indeed... A possible workaround would be: public Hits search(String queryString, int numHits, String dedupField, String sortField, boolean reverse) throws IOException { org.apache.lucene.queryParser.QueryParser parser =

OpenOffice Support?

2006-07-11 Thread Matthew Holt

Just wondering, has anyone done any work on a plugin (or aware of a plugin) that supports the indexing of open office documents? Thanks. Matt

Re: OpenOffice Support?

2006-07-11 Thread Lourival Júnior

Taking advantage of your question: does anyone know whether version 0.7.2 of Nutch supports the zip plugin? If so, where can I find it? Lourival Junior On 7/11/06, Matthew Holt [EMAIL PROTECTED] wrote: Just wondering, has anyone done any work on a plugin (or aware of a plugin) that supports the

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Andrzej Bialecki

Renaud Richardet wrote: Hello Nutch, My name is Renaud Richardet and I am the COO of Wyona LLC. We are offering Nutch and Lucene support (http://wyona.com/lucene.html), and I was wondering if I could add our company to http://wiki.apache.org/nutch/Support. That would be great. Certainly

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Insurance Squared Inc.

obey the nofollow tags? g. Andrzej Bialecki wrote: Renaud Richardet wrote: Hello Nutch, My name is Renaud Richardet and I am the COO of Wyona LLC. We are offering Nutch and Lucene support (http://wyona.com/lucene.html), and I was wondering if I could add our company to http

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Andrzej Bialecki

Insurance Squared Inc. wrote: The funny thing about that wiki page (and some others in that area) is that they apparently use the nofollow tags. Given the topic of that wiki, isn't that a bit odd? I personally dislike the nofollow tag and think it should be used only in extreme circumstances

Re: Add Wyona to the wiki support page?

2006-06-21 Thread Insurance Squared Inc.

Well, so much for knee-jerk suspicions as to intent. No need to look for conspiracy theories when default settings are more likely to be the cause. That should probably be a corollary to Occam's razor or something :). Andrzej Bialecki wrote: Insurance Squared Inc. wrote: The funny thing

Re: Full fledged Lucene Query Syntax support in Nutch

2006-05-04 Thread Ravi Chintakunta

. Is there a reason that Nutch does not support the entire Lucene query syntax by default? Thanks in advance, Ravi Chintakunta

Full fledged Lucene Query Syntax support in Nutch

2006-05-02 Thread Ravi Chintakunta

. We have to modify the analyzer and add more plugins to Nutch to use the Lucene's query syntax. Or we have to directly use Lucene's Query Parser. I tried the second approach by modifying org.apache.nutch.searcher.IndexSearcher and that seems to work. Is there a reason that Nutch does not support

Re: Full fledged Lucene Query Syntax support in Nutch

2006-05-02 Thread Herman Hardenbol

Sorry, I am on holiday until the 8th of May. Please contact the [EMAIL PROTECTED] for urgent matters. Kind regards, Herman.

HTTPS support?

2006-03-06 Thread David Odmark

Hi, Does Nutch 0.8 support https fetches? If not, are there any active efforts to support it? TIA, David Odmark

Re: HTTPS support?

2006-03-06 Thread Andrzej Bialecki

David Odmark wrote: Hi, Does Nutch 0.8 support https fetches? If not, are there any active efforts to support it? It does, using protocol-httpclient plugin. -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information

Nutch doesn't support Korean?

2006-03-03 Thread Teruhiko Kurosaka

I was browsing NutchAnalysis.jj and found that Hangul Syllables (U+AC00 ... U+D7AF; U+ means a Unicode character of the hex value ) are not part of the LETTER or CJK class. It seems to me that Nutch cannot handle Korean documents at all. Is anybody successfully using Nutch for Korean?
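The range quoted in the message can be checked against the JDK's own Unicode tables. The sketch below is only an illustration using `java.lang.Character`. It does not touch NutchAnalysis.jj, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Nutch: reports whether a code point falls
// in the Hangul Syllables block (U+AC00 ... U+D7AF) that the message says the
// tokenizer grammar does not cover.
class HangulCheckSketch {

    static boolean isHangulSyllable(int codePoint) {
        return Character.UnicodeBlock.of(codePoint)
                == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    public static void main(String[] args) {
        // Three Hangul syllables (written as escapes) followed by ASCII text.
        String sample = "\uD55C\uAD6D\uC5B4 Nutch";
        sample.codePoints().forEach(cp ->
                System.out.printf("U+%04X hangul=%b letter=%b%n",
                        cp, isHangulSyllable(cp), Character.isLetter(cp)));
    }
}
```

Java itself classifies these code points as letters, so a grammar whose LETTER and CJK classes leave the block out would have to list the range explicitly to tokenize Korean text.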

Re: Nutch doesn't support Korean?

2006-03-03 Thread Cheolgoo Kang

Hello, There was a similar issue with Lucene's StandardTokenizer.jj. http://issues.apache.org/jira/browse/LUCENE-444 and http://issues.apache.org/jira/browse/LUCENE-461 I have almost no experience with Nutch, but you can handle it like those issues above. On 3/4/06, Teruhiko Kurosaka [EMAIL

xquery support for nutch

2006-02-20 Thread Raghavendra Prabhu

Hi It would be great if we provide xquery support to nutch where expressions like 3 + 4=7 would be evaluated. http://www.xml.com/pub/a/2002/10/16/xquery.html It is just an idea and probably would make it a universal tool Rgds Prabhu

Single NutchBean and multiple indices support

2006-02-15 Thread Jack Tang

opened. Then I call closeSegments() after each search. I realise that NutchBean really wasn't designed to support being instantiated once per search, but I don't care. It works well, and performance is not an issue. Regards, David. Date: Mon, 6 Feb 2006 20:59:34 -0500 From: Ravi

Re: Single NutchBean and multiple indices support

2006-02-15 Thread Ravi Chintakunta

to FetchSegments.Segment in my installation, to close all the readers. I added a closeSegments() method to NutchBean, to call close() on each segment that's been opened. Then I call closeSegments() after each search. I realise that NutchBean really wasn't designed to support being instantiated

Re: Which version of rss does parse-rss plugin support?

2006-02-10 Thread Chris Mattmann

- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris How do I change the plugin.xml? For example, if I want to crawl rss files end with xml, just add a new

Re: Which version of rss does parse-rss plugin support?

2006-02-10 Thread Elwin

of either NASA, JPL, or the California Institute of Technology. -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris How do

opensearch support

2006-02-07 Thread Geraint Williams

Is OpenSearch being developed? I am using nutch 0.7 and it seems to have some opensearch support. However, I failed to get either a python or perl opensearch client library (admittedly these are also in early development). The perl library seemed to choke at not finding

RE: Which version of rss does parse-rss plugin support?

2006-02-05 Thread Chris Mattmann

those of either NASA, JPL, or the California Institute of Technology. -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris

Re: Which version of rss does parse-rss plugin support?

2006-02-05 Thread 盖世豪侠

, or the California Institute of Technology. -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris How do I change the plugin.xml

Re: Which version of rss does parse-rss plugin support?

2006-02-04 Thread 盖世豪侠

locally. For web-based crawls, you need to make sure that the content type being returned for your RSS content matches the content type specified in the plugin.xml file that parse-rss claims to support. Note that you might not have * a lot * of success with being able to control the content

Which version of rss does parse-rss plugin support?

2006-02-03 Thread 盖世豪侠

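I see the test file is of version 0.91. Does the plugin support higher versions like 1.0 or 2.0? -- 《盖世豪侠》 won rave reviews and kept TVB's ratings high; TVB was pleased, yet still gave him no major roles. Stephen Chow was never one to be held back: with his comic talent already showing, he was naturally unwilling to be sidelined, so he moved into film and displayed his talents on the big screen. TVB, having gained a prize horse and then lost it, naturally came to regret it.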

Re: Which version of rss does parse-rss plugin support?

2006-02-03 Thread Chris Mattmann

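1.0 modules capability... Hope that helps. Thanks, Chris On 2/3/06 6:46 AM, 盖世豪侠 [EMAIL PROTECTED] wrote: I see the test file is of version 0.91. Does the plugin support higher versions like 1.0 or 2.0? -- 《盖世豪侠》 won rave reviews and kept TVB's ratings high; TVB was pleased, yet still gave him no major roles. Stephen Chow was never one to be held back: with his comic talent already showing, he was naturally unwilling to be sidelined, so he moved into film and displayed his talents on the big screen. TVB, having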

Re: Which version of rss does parse-rss plugin support?

2006-02-03 Thread 盖世豪侠

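file is of version 0.91. Does the plugin support higher versions like 1.0 or 2.0? -- 《盖世豪侠》 won rave reviews and kept TVB's ratings high; TVB was pleased, yet still gave him no major roles. Stephen Chow was never one to be held back: with his comic talent already showing, he was naturally unwilling to be sidelined, so he moved into film and displayed his talents on the big screen. TVB, having gained a prize horse and then lost it, naturally came to regret it. -- 《盖世豪侠》 won rave reviews and kept TVB's ratings high; TVB was pleased, yet still gave him no major roles. Stephen Chow was never one to be held back: with his comic talent already showing, he was naturally unwilling to be sidelined, so he moved into film and displayed his talents on the big screen. TVB, having gained a prize horse and then lost it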

Multi CPU support

2006-01-09 Thread Teruhiko Kurosaka

Can I use MapReduce to run Nutch on a multi CPU system? I want to run the index job on two (or four) CPUs on a single system. I'm not trying to distribute the job over multiple systems. If the MapReduce is the way to go, do I just specify config parameters like these:

Re: Multi CPU support

2006-01-09 Thread Doug Cutting

Teruhiko Kurosaka wrote: Can I use MapReduce to run Nutch on a multi CPU system? Yes. I want to run the index job on two (or four) CPUs on a single system. I'm not trying to distribute the job over multiple systems. If the MapReduce is the way to go, do I just specify config parameters

multibyte character support status

2005-12-27 Thread Teruhiko Kurosaka

What is the current state and plan for multibyte character support by Nutch? As far as I can tell... The PDF plugin uses PDFBox (www.pdfbox.org) which does not work with Japanese and probably other multibyte characters and code sets. The Word plugin uses POI (http://jakarta.apache.org/poi

Re: PDF indexing support?

2005-11-16 Thread Håvard W. Kongsgård

Thanks, it worked. Jérôme Charron wrote: The value you specified is bigger than the maximum int value, so it throws an exception, and then the default value is used. As mentioned in the property's description, use a negative value (-1) for no truncation at all (or a value less than
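The behaviour described in the reply can be reproduced with plain Java integer parsing. This is a minimal sketch, assuming the configuration layer parses the property as an `int` and falls back to a default when parsing fails. The class name, method name, and the 65536 default used here are assumptions for illustration, not Nutch's actual code.

```java
// Hypothetical helper, not part of Nutch: parses a content-limit property the
// way the reply describes, falling back to a default on overflow.
class ContentLimitSketch {

    static int parseLimit(String raw, int defaultValue) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Values above Integer.MAX_VALUE (2147483647) end up here.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        int assumedDefault = 65536; // assumed default, for illustration only
        // Too large for an int, so the default is silently used.
        System.out.println(parseLimit("542256565536", assumedDefault));
        // A negative value means "no truncation" per the property description.
        System.out.println(parseLimit("-1", assumedDefault));
    }
}
```

The oversized value does not raise a visible error in this sketch. It quietly becomes the default, which would explain why increasing the number seemed to have no effect.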

Re: PDF indexing support?

2005-11-16 Thread Hasan Diwan

On Nov 15, 2005, at 2:46 PM, Håvard W. Kongsgård wrote: Don't have a conf/nutch-site.xml Create it and put the overrides in there, per the nutch tutorial. Cheers, Hasan Diwan [EMAIL PROTECTED] PGP.sig Description: This is a digitally signed message part

Re: PDF indexing support?

2005-11-15 Thread Stefan Groschupf

PDF indexing support? Simply by activating the parse-pdf plugin in nutch-default.xml or nutch-site.xml (take a look at the plugin.includes property) Jérôme -- http://motrech.free.fr/ http://www.frutch.org/ - --- No virus

Re: PDF indexing support?

2005-11-15 Thread Håvard W. Kongsgård

conf/nutch-default Jérôme Charron wrote: http.content.limit=542256565536 and file.content.limit=4541165536 still the same error: where do you specify these values? in nutch-default or nutch-site? Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

Re: PDF indexing support?

2005-11-15 Thread Jérôme Charron

conf/nutch-default Check that they are not overridden in conf/nutch-site. If not, sorry, no more ideas for now :-( Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

Re: PDF indexing support?

2005-11-15 Thread Håvard W. Kongsgård

Don't have a conf/nutch-site.xml Jérôme Charron wrote: conf/nutch-default Check that they are not overridden in conf/nutch-site. If not, sorry, no more ideas for now :-( Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

PDF indexing support?

2005-11-14 Thread Håvard W. Kongsgård

Hello, I am new to Nutch. How do I enable PDF indexing support?

PDF support? Does crawl parse p

2005-08-31 Thread Diane Palla

expect to support application/pdf types and have such parsing of pdf files available? Diane Palla Web Services Developer Seton Hall University 973 313-6199 [EMAIL PROTECTED] Bryan Woliner [EMAIL PROTECTED] 08/23/2005 05:22 PM Please respond to nutch-user@lucene.apache.org To nutch-user

Re: PDF support? Does crawl parse p

2005-08-31 Thread Piotr Kosiorowski

developers expect to support application/pdf types and have such parsing of pdf files available? Diane Palla Web Services Developer Seton Hall University 973 313-6199 [EMAIL PROTECTED] Bryan Woliner [EMAIL PROTECTED] 08/23/2005 05:22 PM Please respond to nutch-user@lucene.apache.org

Re: metadata support in WebDB (Stefan's NUTCH-59 patch)

2005-07-19 Thread Stefan Groschupf

Hi Otis, http://issues.apache.org/jira/browse/NUTCH-59 This patch looks interesting for my Nutch needs, So please vote for the patch if you like it. :-) I can't look at the code, but looking at your diff, it looks like this metadata would be stored somewhere inside Nutch's WebDB, and that

84 matches
