date:20050831

Re: null lang bug? and patch?

2005-08-31 Thread Jérôme Charron

I did a little digging and it appears that lang ends up being null (couldn't quite track down where lang should have been set). Not sure if it is a proper fix, but changing doc.getField(lang).stringValue() to doc.get(lang) makes my little crawl complete. lang is null because you don't have
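To illustrate why the change quoted above avoids the crash: a field lookup that returns null cannot be dereferenced, while a string lookup can simply return null. The sketch below does not use Lucene itself; Doc and Field are hypothetical stand-ins that only mimic the getField/get behaviour described in the message, and langOrDefault is an invented helper.

```java
import java.util.HashMap;
import java.util.Map;

public class NullFieldDemo {
    // Hypothetical stand-in for an indexed field; not the Lucene class.
    static final class Field {
        private final String value;
        Field(String value) { this.value = value; }
        String stringValue() { return value; }
    }

    // Hypothetical stand-in for a document holding named fields.
    static final class Doc {
        private final Map<String, Field> fields = new HashMap<>();

        void add(String name, String value) { fields.put(name, new Field(value)); }

        // Returns null when the field was never set, so chaining
        // .stringValue() onto the result throws NullPointerException.
        Field getField(String name) { return fields.get(name); }

        // Null-safe variant: returns null instead of throwing.
        String get(String name) {
            Field f = fields.get(name);
            return f == null ? null : f.stringValue();
        }
    }

    // Invented helper: read "lang" and fall back when it is missing.
    static String langOrDefault(Doc doc, String fallback) {
        String lang = doc.get("lang");
        return lang != null ? lang : fallback;
    }
}
```

With an empty document, langOrDefault(doc, "unknown") returns the fallback, whereas doc.getField("lang").stringValue() would throw.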

Re: Language identifier plugin questions

2005-08-31 Thread Jérôme Charron

I agree it is important to have the NGramProfile.getSimilarity() method. However, I think it is also important that it is consistent with the scoring that LanguageIdentifier uses, even if LanguageIdentifier optimises the implementation. Looking at the code I see that the two scoring

Re: Language identifier plugin questions

2005-08-31 Thread Jérôme Charron

Tom, I have created the NUTCH-86 issue to report the needed changes in the LanguageIdentifier we discussed in this thread. The issue is available at http://issues.apache.org/jira/browse/NUTCH-86 Regards Jérôme

Re[2]: NDFS question

2005-08-31 Thread Egor Chernodarov

Hello, Doug! I tried with the mapred branch, but I still get errors like this: $./nutch ndfs -put ./test.txt /test.txt = 050831 055936 Client connection to 192.168.0.170:9000: starting 050831 060245 Waiting to find target node = On namenode I see : 050831 055936

Re: [Nutch-cvs] svn commit: r240359 - in /lucene/nutch/trunk/src: java/org/apache/nutch/analysis/ java/org/apache/nutch/indexer/ plugin/nutch-extensionpoints/

2005-08-31 Thread Jérôme Charron

I see several instances of 'analySer' in comments/javadoc and some variables. That should probably be changed to american version - analyzer, for consistency's sake. Corrected/Committed (http://svn.apache.org/viewcvs.cgi?rev=265020&view=rev) Regards Jérôme -- http://motrech.free.fr/

Re: [jira] Commented: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-08-31 Thread Michael Nebel

Some more errors (short selection from my logfile). Do we really have to handle them all separately or are there any functions/tools for this kind of problem? ...can't parse erroneous date: 12.06.2005 22:02:54 GMT ...can't parse erroneous date: 14.07.2005 GMT ...can't parse erroneous date:

Re: [Nutch Wiki] Update of Committer's Rules by AndrzejBialecki

2005-08-31 Thread Doug Cutting

Apache Wiki wrote: 1. The SVN repository consists of the following areas: a. '''trunk''' [ ... ] a. '''Release-x.x''' branches [ ... ] This should also mention tags, fixed versions of the code where no development occurs. I also would prefer that tag names and branch names are distinct,

Re: [Nutch Wiki] Update of Committer's Rules by AndrzejBialecki

2005-08-31 Thread Piotr Kosiorowski

Doug Cutting wrote: Glancing at other Apache projects in subversion, I see that httpd uses branch names like 2.2.x and tag names like 2.2.4. That's a little cryptic. I propose that we use branch names like branch-2.4 and tag names like release-2.4.1. What do folks think? +1 In fact I

Re: merge mapred to trunk

2005-08-31 Thread Piotr Kosiorowski

Doug Cutting wrote: Currently we have three versions of nutch: trunk, 0.7 and mapred. This increases the chances for conflicts. I would thus like to merge the mapred branch into trunk soon. The soonest I could actually start this is next week. Are there any objections? Doug +1 P.

Re: [Nutch Wiki] Update of Committer's Rules by AndrzejBialecki

2005-08-31 Thread Jérôme Charron

Glancing at other Apache projects in subversion, I see that httpd uses branch names like 2.2.x and tag names like 2.2.4. That's a little cryptic. I propose that we use branch names like branch-2.4 and tag names like release-2.4.1. What do folks think? +1 Jérôme -- http://motrech.free.fr/

Re: null lang bug? and patch?

2005-08-31 Thread Jérôme Charron

I am a bit lost but just a quick check - shouldn't it also be committed in Release-0.7 branch? No, the analyzer extension-point is committed only in trunk. It's a new feature, so I follow Committer's Rules ( http://wiki.apache.org/nutch/Committer's_Rules) ;-) Regards Jérôme --

Re: Automating workflow using ndfs

2005-08-31 Thread Doug Cutting

I assume that in most NDFS-based configurations the production search system will not run out of NDFS. Rather, indexes will be created offline for a deployment (i.e., merging things to create an index per search node), then copied out of NDFS to the local filesystem on a production search

Re: null lang bug? and patch?

2005-08-31 Thread Piotr Kosiorowski

Great - I just thought that it would be better if you look at it - instead of me digging into the code. I wanted to be on the safe side with 0.7.1 release. Regards Piotr Jérôme Charron wrote: I am a bit lost but just a quick check - shouldn't it also be committed in Release-0.7 branch? No,

Fw: PDF support? Does crawl parse pdf files? How do I get it work?

2005-08-31 Thread Diane Palla

Does Nutch have a way to parse pdf files, that is, application/pdf content type files? I noticed a plugin variable setting in default.properties: plugin.pdf=org.apache.nutch.parse.pdf* I never changed this file. Is that the right value? I am using Nutch 0.7. What do I have to do to make parse

[jira] Commented: (NUTCH-21) parser plugin for MS PowerPoint slides

2005-08-31 Thread Jerome Charron (JIRA)

[ http://issues.apache.org/jira/browse/NUTCH-21?page=comments#action_12320717 ] Jerome Charron commented on NUTCH-21: Want to commit it, but unit tests failed. parser plugin for MS PowerPoint slides

Re: [jira] Commented: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-08-31 Thread Jérôme Charron

Michael, the solution is perhaps to use Jakarta Commons DateUtils.parseDate method: http://jakarta.apache.org/commons/lang/api/org/apache/commons/lang/time/DateUtils.html#parseDate(java.lang.String,%20java.lang.String[]) It will give something like: Date parsedDate =
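The idea behind the suggestion above is to try a list of date patterns in turn until one matches the whole string. A minimal sketch of that approach using only java.text follows, so it does not depend on commons-lang; it is not the code from the thread. The class name and the pattern list are assumptions, chosen to cover the two date shapes quoted from the logfile.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class MultiPatternDateParser {
    // Assumed patterns, matching "12.06.2005 22:02:54 GMT" and "14.07.2005 GMT".
    private static final String[] PATTERNS = {
        "dd.MM.yyyy HH:mm:ss z",
        "dd.MM.yyyy z",
    };

    // Tries each pattern in order and returns the first date that
    // consumes the entire input; throws if no pattern fits.
    public static Date parse(String text) throws ParseException {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setLenient(false);
            format.setTimeZone(TimeZone.getTimeZone("GMT"));
            ParsePosition position = new ParsePosition(0);
            // parse(String, ParsePosition) returns null on failure instead of throwing.
            Date parsed = format.parse(text, position);
            if (parsed != null && position.getIndex() == text.length()) {
                return parsed;
            }
        }
        throw new ParseException("Unable to parse the date: " + text, 0);
    }
}
```

Adding another date shape then only means adding one more entry to the pattern array, rather than writing a separate handler for each erroneous date.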

Re: merge mapred to trunk

2005-08-31 Thread ogjunk-nutch

Currently we have three versions of nutch: trunk, 0.7 and mapred. This increases the chances for conflicts. I would thus like to merge the mapred branch into trunk soon. The soonest I could actually start this is next week. Are there any objections? I, too, am looking forward to this,

Re: merge mapred to trunk

2005-08-31 Thread Doug Cutting

[EMAIL PROTECTED] wrote: I, too, am looking forward to this, but I am wondering what that will do to Kelvin Tan's recent contribution, especially since I saw that both MapReduce and Kelvin's code change how FetchListEntry works. If merging mapred to trunk means losing Kelvin's changes, then I

Re: merge mapred to trunk

2005-08-31 Thread ogjunk-nutch

--- Doug Cutting [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: I, too, am looking forward to this, but I am wondering what that will do to Kelvin Tan's recent contribution, especially since I saw that both MapReduce and Kelvin's code change how FetchListEntry works. If merging

Re: merge mapred to trunk

2005-08-31 Thread Kelvin Tan

On Wed, 31 Aug 2005 14:37:54 -0700, Doug Cutting wrote: [EMAIL PROTECTED] wrote: I, too, am looking forward to this, but I am wondering what that will do to Kelvin Tan's recent contribution, especially since I saw that both MapReduce and Kelvin's code change how FetchListEntry works. If

Re: [jira] Commented: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-08-31 Thread Michael Nebel

Hi Jérôme, it works great (see the new function below). But we'll have to add commons-lang (http://jakarta.apache.org/commons/lang/) to the libraries. Are there any objections? What is the procedure for adding it? I'm trying my changes right now (I think, it will take the rest of the night to


