Re: [jira] Updated: (NUTCH-627) Minimize host address lookup

2008-04-10 Thread Chris Mattmann
it will be positive, and we will have a clean situation from the formal POV. Ok? +1 +1, as well. Cheers, Chris __ Chris Mattmann, Ph.D. [EMAIL PROTECTED] Cognizant Development Engineer Early Detection Research Network Project

Re: End-Of-Life status for 0.7.x?

2008-01-17 Thread Chris Mattmann
releases or apply patches that sit in JIRA? My opinion is that we should mark it EOL, and close all JIRA issues that are relevant only to 0.7.x, with the status Won't Fix. __ Chris Mattmann, Ph.D. [EMAIL PROTECTED] Cognizant Development Engineer

Re: Student contributions

2008-01-02 Thread Chris Mattmann
be appropriate for a small group of undergrad, upperclass CS students? I'm looking for ideas for improving Nutch that they could accomplish in a few weeks' time. Thanks, __ Chris Mattmann, Ph.D. [EMAIL PROTECTED] Cognizant Development Engineer Early Detection

Re: Commit Times for Issues

2007-11-16 Thread Chris Mattmann
are just some suggestions would love to hear from others in the community. What I think would be best is to come to a consensus on this and then have a wiki page describing this and other processes for committers. Dennis Kubes __ Chris Mattmann

Re: JIRA, Resolving and Closing Issues

2007-10-18 Thread Chris Mattmann
Kubes [EMAIL PROTECTED] wrote: Quick question about Jira. When we commit, are we supposed to first resolve and then close the issue? What is the process on this? Dennis Kubes __ Chris Mattmann, Ph.D. [EMAIL PROTECTED] Cognizant Development Engineer

Re: writing a new parse-exe plugin

2007-10-17 Thread Chris Mattmann
() { return this.conf; } __ Chris Mattmann, Ph.D. [EMAIL PROTECTED] Cognizant Development Engineer Early Detection Research Network Project _ Jet Propulsion Laboratory Pasadena, CA Office

Re: [jira] Closed: (NUTCH-562) Port mime type framework to use Tika mime detection framework

2007-10-10 Thread Chris Mattmann
there are some similarities here. -Chris __ Chris Mattmann, Ph.D. [EMAIL PROTECTED] Cognizant Development Engineer Early Detection Research Network Project _ Jet Propulsion Laboratory Pasadena, CA

Re: svn commit: r550669 - in /lucene/nutch/trunk/src: java/org/apache/nutch/util/ plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/ plugin/parse-html/src/java/org/apache/nutch/parse/h

2007-06-25 Thread Chris Mattmann
No problemo! Thanks! Cheers, Chris On 6/25/07 9:45 PM, Dennis Kubes [EMAIL PROTECTED] wrote: ooops, gotta remember to do that. Done. Dennis Chris Mattmann wrote: On 6/25/07 8:34 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Author: kubes Date: Mon Jun 25 20:33:59 2007 New

Re: Build failed in Hudson: Nutch-Nightly #123

2007-06-20 Thread Chris Mattmann
Doğacan, This is strange indeed. I noticed this during my testing of parse-feed, however, thought it was an anomaly. I got this same strange cryptic unit test error message, and then after some frustration figuring it out, I did ant clean, then ant compile-core test, and miraculously the error

Re: Build failed in Hudson: Nutch-Nightly #123

2007-06-20 Thread Chris Mattmann
On 6/20/07 8:17 AM, Doğacan Güney [EMAIL PROTECTED] wrote: Since you are doing compile-core, no plugins get compiled (say, urlfilter-prefix), then when you do a ant test in feed only protocol-file gets compiled. So, no urlfilter-prefix, no problem :). I have to say that I am certain that I

Re: Welcome Doğacan as Nutch committer

2007-06-12 Thread Chris Mattmann
+1 Welcome to the team, Doğacan! Cheers, Chris On 6/12/07 9:43 AM, Sami Siren [EMAIL PROTECTED] wrote: Doğacan Güney wrote: Hi all, I hope that together we will make nutch rock even harder. By looking at your earlier efforts there should be no doubt. Welcome!

Committer

2007-05-30 Thread Chris Mattmann
Hi Folks, I'd just like to throw out my +1 for Doğacan Güney's committer status. I've been impressed by several of his contributions and the guy just keeps them coming and coming. I'm not a member of the Lucene PMC, so I don't have official voting rights, however, I would like to express my

Re: Nutch Release 0.9 - Waiting for release to propagate to mirrors

2007-04-05 Thread Chris Mattmann
announcing the completion of the release. Thanks! Cheers, Chris On 4/4/07 7:21 PM, Chris Mattmann [EMAIL PROTECTED] wrote: Hi Guys, I've just moved forward with step 13 in the release process (waiting for release to propagate to mirrors). Should I just go ahead and do the other

Nutch 0.9 officially released!

2007-04-05 Thread Chris Mattmann
Hi Folks, After some hard work from all folks involved, we've managed to push out Apache Nutch, release 0.9. This is the second release of Nutch based entirely on the underlying Hadoop platform. This release includes several critical bug fixes, as well as key speedups described in more detail at

Re: [VOTE] Release Apache Nutch 0.9

2007-04-04 Thread Chris Mattmann
wrapped up tonight! :-) Cheers, Chris On 4/4/07 8:04 AM, Sami Siren [EMAIL PROTECTED] wrote: Chris Mattmann wrote: Hi Folks, I have posted a candidate for the Apache Nutch 0.9 release at http://people.apache.org/~mattmann/nutch_0.9/rc2/ Please vote on releasing these packages as Apache

Nutch Release 0.9 - Waiting for release to propagate to mirrors

2007-04-04 Thread Chris Mattmann
Hi Guys, I've just moved forward with step 13 in the release process (waiting for release to propagate to mirrors). Should I just go ahead and do the other steps (update Nutch site, update Lucene site, Update javadoc, create version in JIRA, etc.)? It seems that I could do these without the

Re: [VOTE] Release Apache Nutch 0.9

2007-04-02 Thread Chris Mattmann
Hi Guys, I think we're discussing the same thing (improving the process), I just don't think 0.9 is out yet :) But to wrap it up for me: +1 for creating 0.9 branch after fixing the bug (and removing the tag), creating new rc and starting a vote. +1. +1. So, that's 3

Re: svn commit: r524932 - in /lucene/nutch/trunk/src/java/org/apache/nutch/segment: SegmentMerger.java SegmentReader.java

2007-04-02 Thread Chris Mattmann
Hi Dennis, Thanks for taking care of this. :-) Could you update CHANGES.txt as well? Once you take care of that, in about 2 hrs (when I get home), I'll begin the release process again. Thanks! Cheers, Chris On 4/2/07 2:40 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Author: kubes

Re: svn commit: r524932 - in /lucene/nutch/trunk/src/java/org/apache/nutch/segment: SegmentMerger.java SegmentReader.java

2007-04-02 Thread Chris Mattmann
to it sooner. Dennis Kubes Chris Mattmann wrote: Hi Dennis, Thanks for taking care of this. :-) Could you update CHANGES.txt as well? Once you take care of that, in about 2 hrs (when I get home), I'll begin the release process again. Thanks! Cheers, Chris On 4/2/07 2:40

[VOTE] Release Apache Nutch 0.9

2007-04-02 Thread Chris Mattmann
Hi Folks, I have posted a candidate for the Apache Nutch 0.9 release at http://people.apache.org/~mattmann/nutch_0.9/rc2/ See the included CHANGES-0.9.txt file for details on release contents and latest changes. The release was made from the 0.9-dev trunk, including the recent patch applied

Re: [VOTE] Release Apache Nutch 0.9

2007-04-02 Thread Chris Mattmann
Folks, As an FYI, here is a link to the log of the steps that I followed to get to this point in the release: http://people.apache.org/~mattmann/NUTCH_0.9_release_log_v2.doc Cheers, Chris On 4/2/07 10:52 PM, Chris Mattmann [EMAIL PROTECTED] wrote: Hi Folks, I have posted a candidate

Re: [VOTE] Release Apache Nutch 0.9

2007-03-28 Thread Chris Mattmann
Well, it's just going to add more work for me, but in the end, it's probably something that needs to be in there. I could go either way on this though, as in, if we don't commit it, 0.9.1 shouldn't be far off. Here's my +1 for going ahead and committing it... On 3/28/07 10:21 AM, Dennis Kubes

Re: Next release - 0.10.0 or 1.0.0 ?

2007-03-28 Thread Chris Mattmann
My +1 for 1.0.0. I already changed it to 0.10.0, but this can be easily reverted, and was probably something that I should have brought to the attention of the dev list before I did that (sorry about that). In any case, I think 1.0.0 makes a lot of sense, politically, and software wise. Nutch is

Re: [VOTE] Release Apache Nutch 0.9

2007-03-27 Thread Chris Mattmann
/, using the same convention as the others. To get the header, I did a gpg --list-keys. Thanks! Cheers, Chris On 3/27/07 8:14 AM, Chris Mattmann [EMAIL PROTECTED] wrote: Hi Sami, A very limited acid test shows that I can do crawling and searching through web app so that part is ok. Great

Re: [VOTE] Release Apache Nutch 0.9

2007-03-27 Thread Chris Mattmann
Hey Sami, Well the sum itself is obviously the same :) The point in this is to use same conventions in Lucene family, not strictly required, but still IMO it just looks better. Okey dok -- I will run the md5sum command, and generate a .md5 for the nutch release that matches that. I will
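For readers without the md5sum command at hand, the following is a minimal Java sketch of how a line in the usual `<hex digest>  <file name>` layout of a .md5 file could be produced. It is not the procedure used for the Nutch release (the message above refers to the md5sum command itself); the class name `Md5SumSketch` and its helper methods are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch or its release tooling.
final class Md5SumSketch {

    /** Streams the input through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats one checksum line: digest, two spaces, file name. */
    static String md5Line(String hex, String fileName) {
        return hex + "  " + fileName;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Md5SumSketch <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        try (InputStream in = Files.newInputStream(file)) {
            System.out.println(md5Line(md5Hex(in), file.getFileName().toString()));
        }
    }
}
```

Redirecting the printed line into a file named after the artifact with a .md5 suffix would give a checksum file that `md5sum -c` can read.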

Initiation of 0.9 release process

2007-03-26 Thread Chris Mattmann
Hi Folks, As your friendly neighborhood 0.9 release manager, I just wanted to give you all a heads up that I'd like to begin the release process today. If I hear no objections by 00:00:00 UTC time, I will begin the release process then. I will notify the list as soon as I'm done. Thanks!

Re: Initiation of 0.9 release process

2007-03-26 Thread Chris Mattmann
smoothly, I can probably get it done on my own. Thanks for the offer: I'll be sure to call on you if I get stuck. :-) Cheers, Chris On 3/26/07 10:06 AM, Dennis Kubes [EMAIL PROTECTED] wrote: Let me know if I can help in any way? Dennis Kubes Chris Mattmann wrote: Hi Folks

Nutch 0.9 release progress update

2007-03-26 Thread Chris Mattmann
Hi Folks, Just to update everyone on progress. I've made it to Step 13 (waiting for release to appear on mirrors) in the Release Process: http://wiki.apache.org/nutch/Release_HOWTO You can view a full log of the fun that I've been having by going to:

Re: Nutch 0.9 release progress update

2007-03-26 Thread Chris Mattmann
!) Thanks! Cheers, Chris On 3/26/07 10:22 PM, Sami Siren [EMAIL PROTECTED] wrote: Chris Mattmann wrote: Hi Folks, Just to update everyone on progress. I've made it to Step 13 (waiting for release to appear on mirrors) in the Release Process: Chris, thanks for your work so far

[VOTE] Release Apache Nutch 0.9

2007-03-26 Thread Chris Mattmann
Hi Folks, I have posted a candidate for the Apache Nutch 0.9 release at http://people.apache.org/~mattmann/nutch_0.9/ See the included CHANGES-0.9.txt file for details on release contents and latest changes. The release was made from the 0.9-dev trunk. Please vote on releasing these packages

Re: svn commit: r516759 - /lucene/nutch/trunk/CHANGES.txt

2007-03-10 Thread Chris Mattmann
Hi Dennis, Not to nit-pick, but the place where you inserted your change isn't at the end (where they typically should be placed). You inserted in the middle of the file, throwing off the numbering (there are now 2 sets of 18, and 19 in the unreleased changes section). Could you please append

Re: svn commit: r516759 - /lucene/nutch/trunk/CHANGES.txt

2007-03-10 Thread Chris Mattmann
Dennis, No probs. Thanks, a lot! Cheers, Chris On 3/10/07 5:35 PM, Dennis Kubes [EMAIL PROTECTED] wrote: Chris Mattmann wrote: Hi Dennis, Not to nit-pick, but the place where you inserted your change isn't at the end (where they typically should be placed). You inserted

Re: [jira] Commented: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-08 Thread Chris Mattmann
Hi Andrzej, Yep, +1. I also want to make a small update, where instead of creating a new NutchConf object, to just pass it through (maybe via the protocol layer?). Does this make sense? Cheers, Chris On 3/8/07 1:47 PM, Andrzej Bialecki (JIRA) [EMAIL PROTECTED] wrote: [

Re: [jira] Commented: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-08 Thread Chris Mattmann
On 3/8/07 1:55 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Chris Mattmann wrote: Hi Andrzej, Yep, +1. I also want to make a small update, where instead of creating a new NutchConf object, to just pass it through (maybe via the protocol layer?). Does this make sense? I'm not sure

0.9 release

2007-03-07 Thread Chris Mattmann
Hi Folks, As suggested by Sami, I'm moving this discussion to the nutch-dev list. Seems like I am the guy that is going to do the Nutch 0.9 release :-) However, it seems also that there are some issues that need to be sorted out first. I'd like to follow up to Andrzej's email about loose ends

Re: Issues pending before 0.9 release

2007-03-05 Thread Chris Mattmann
Hi Guys, Blocker * NUTCH-400 (Update add missing license headers) - I believe this is fixed and should be closed +1, thanks to Sami for closing it. * NUTCH-353 (pages that serverside forwards will be refetched every time) - this was partially fixed in NUTCH-273, but a more

Re: Welcome Dennis Kubes as Nutch committer

2007-02-28 Thread Chris Mattmann
Dennis, I take my coffee black: with a single creamer ;) Okay, okay, sorry: I thought we were talking about *real* hazing ;) Cheers, Chris On 2/28/07 12:31 PM, Dennis Kubes [EMAIL PROTECTED] wrote: Hi All, Thank you Andrzej for your kind words. I am looking forward to working

Re: log guards

2007-02-13 Thread Chris Mattmann
Hi Doug, and Jerome, Ah, yes, the log guard conversation. I remember this from a while back. Hmmm, do you guys know what issue this was recorded as in JIRA? I have some free time recently, so I will be able to add this to my list of Nutch stuff to work on, and would be happy to take the lead

Re: RSS-fecter and index individul-how can i realize this function

2007-02-08 Thread Chris Mattmann
, and contacting the folks who've begun work on this issue. Thanks! Cheers, Chris On 2/7/07 1:31 PM, Doug Cutting [EMAIL PROTECTED] wrote: Chris Mattmann wrote: Got it. So, the logic behind this is, why bother waiting until the following fetch to parse (and create ParseData objects from

Re: RSS-fecter and index individul-how can i realize this function

2007-02-07 Thread Chris Mattmann
Guys, Sorry to be so thick-headed, but could someone explain to me in really simple language what this change is requesting that is different from the current Nutch API? I still don't get it, sorry... Cheers, Chris On 2/7/07 9:58 AM, Doug Cutting [EMAIL PROTECTED] wrote: Renaud Richardet

Re: RSS-fecter and index individul-how can i realize this function

2007-02-06 Thread Chris Mattmann
Hi Doug, Since the target of the link must still be indexed separately from the item itself, how much use is all this? If the RSS document is considered a single page that changes frequently, and item's links are considered ordinary outlinks, isn't much the same effect achieved? IMHO, yes.

Re: RSS-fecter and index individul-how can i realize this function

2007-02-01 Thread Chris Mattmann
and parsing it in the next fetch phase. Maybe adding a new flag to CrawlDatum, that would flag the URL as parsable not fetchable? Just my two cents... Gal. -Original Message- From: Chris Mattmann [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 31, 2007 8:44 AM To: nutch-dev

Re: RSS-fecter and index individul-how can i realize this function

2007-01-30 Thread Chris Mattmann
Hi there, I could most likely be of assistance, if you gave me some more information. For instance: I'm wondering if the use case you describe below is already supported by the current RSS parse plugin? The current RSS parser, parse-rss, does in fact index individual items that are pointed to

Re: RSS-fecter and index individul-how can i realize this function

2007-01-30 Thread Chris Mattmann
you mention asynchronous above, are you talking about the protocol for fetching the different RSS documents? Thanks! Cheers, Chris Thanks -Original Message- From: Chris Mattmann [EMAIL PROTECTED] Date: Tue, 30 Jan 2007 18:16:44 To:nutch-dev@lucene.apache.org Subject: Re

Re: RSS-fecter and index individul-how can i realize this function

2007-01-30 Thread Chris Mattmann
://lucene.apache.org/nutch</link> <category>news </category> <author>kauu</author> On 1/31/07, Chris Mattmann [EMAIL PROTECTED] wrote: Hi there, I could most likely be of assistance, if you gave me some more information. For instance: I'm wondering if the use case you describe below

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
Hi Doug, So, does this render the patch that I wrote obsolete? Cheers, Chris On 1/25/07 10:08 AM, Doug Cutting [EMAIL PROTECTED] wrote: Scott Ganyo (JIRA) wrote: ... since Hadoop hijacks and reassigns all log formatters (also a bad practice!) in the org.apache.hadoop.util.LogFormatter

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-25 Thread Chris Mattmann
It's at least out-of-date and perhaps obsolete. A quick read of Fetcher.java looks like there might be a case where a fatal error is logged but the fetcher doesn't exit, in FetcherThread#output(). So this raises an interesting question: People (such as Scott G.) out there -- are you folks

Re: Reviving Nutch 0.7

2007-01-22 Thread Chris Mattmann
Before doubling (or after 0.9.0 tripling?) the maintenance/development work please consider the following: One option would be refactoring the code in a way that the parts that are usable to other projects like protocols?, parsers (this actually was proposed by Jukka Zitting some time

Re: How to Become a Nutch Developer

2007-01-21 Thread Chris Mattmann
Hi Dennis, On 1/21/07 11:47 AM, Dennis Kubes [EMAIL PROTECTED] wrote: All, I am working on a How to Become a Nutch Developer document for the wiki and I need some input. I need an overview of how the process for JIRA works? If I am a developer new to Nutch and just starting to look at

Re: Next Nutch release

2007-01-16 Thread Chris Mattmann
Folks, When would you like to make the release? I've been working on NUTCH-185, but got a bit bogged down with other work. If there is interest in having NUTCH-185 included in the release, I could make a push to get out a patch by week's end... As for the rest, my +1 for NUTCH-61 being

Re: svn commit: r485076 - in /lucene/nutch/trunk/src: java/org/apache/nutch/metadata/SpellCheckedMetadata.java test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java

2006-12-09 Thread Chris Mattmann
Hi Sami, On 12/9/06 2:27 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Author: siren Date: Sat Dec 9 14:27:07 2006 New Revision: 485076 URL: http://svn.apache.org/viewvc?view=rev&rev=485076 Log: Optimize SpellCheckedMetadata further by taking into account the fact that it is used only

Re: svn commit: r485076 - in /lucene/nutch/trunk/src: java/org/apache/nutch/metadata/SpellCheckedMetadata.java test/org/apache/nutch/metadata/TestSpellCheckedMetadata.java

2006-12-09 Thread Chris Mattmann
in org.apache.nutch.metadata that aggregates all the met key fields from HttpHeaders, and it would be the place that the met key fields for FileHeaders, etc. could go into. Let me know what you think, and thanks! Cheers, Chris On 12/9/06 3:53 PM, Sami Siren [EMAIL PROTECTED] wrote: Chris Mattmann wrote: Hi Sami

Re: [jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-14 Thread Chris Mattmann
Hi Guys, Can we disable the selection of released versions within JIRA for issues so that people like me don't continue to get confused? Thanks! Cheers, Chris On 10/13/06 9:32 AM, Sami Siren (JIRA) [EMAIL PROTECTED] wrote: [ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]

Nutch requires JDK 1.5 now?

2006-10-03 Thread Chris Mattmann
Hi Folks, I noticed that Nutch now requires JDK 5 in order to compile, due to recent changes to the PluginRepository and some other classes. I think that this is a good move, however, I wasn't sure that I had seen any official announcement that Nutch now requires 1.5... Cheers, Chris

Re: Nutch requires JDK 1.5 now?

2006-10-03 Thread Chris Mattmann
The switch to 1.5 format was also logged on jira issue http://issues.apache.org/jira/browse/NUTCH-360 -- Sami Siren Ahh, I didn't see this. Way to go Sami, I love it when people actually keep records of changes! ;) Cheers, Chris __ Chris A.

Re: Nutch requires JDK 1.5 now?

2006-10-03 Thread Chris Mattmann
the email address for JIRA to not use the Apache incubator one anymore, and to use to Lucene one. Sound good? If so, could someone with permissions please take care of it? :-) Cheers, Chris On 10/3/06 9:04 AM, Sami Siren [EMAIL PROTECTED] wrote: Andrzej Bialecki wrote: Chris Mattmann wrote

Re: Patch Available status?

2006-08-31 Thread Chris Mattmann
Hi Doug, But the nutch-developers Jira group pretty closely corresponds to Nutch's committers, so perhaps all committers should be permitted to close, although this should be exercised with caution, only at releases, since closes cannot be undone in this workflow. Another alternative

Re: Patch Available status?

2006-08-30 Thread Chris Mattmann
Hi Doug and Andrzej, +1. I think that workflow makes a lot of sense. Currently users in the nutch-developers group can close and resolve issues. In the Hadoop workflow, would this continue to be the case? Cheers, Chris On 8/30/06 3:14 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Doug

Re: 0.8 not loading plugins

2006-08-17 Thread Chris Mattmann
Hi Chris, It seems from your email message that your plugin is located in $NUTCH_HOME/build/custom-meta? Is this where your plugin * code * is currently stored? If so, this is the wrong location and the most likely reason that your plugin isn't being loaded. Plugin code should live in

Re: Tika update

2006-08-16 Thread Chris Mattmann
Hi Jukka, Thanks for your email. Indeed, there was discussion on the Lucene PMC email list, about the Tika project. It was decided by the powers that be to discuss it more on the Nutch mailing list before moving forward with any vote on making Tika a sub-project of Apache Lucene. With regards to

Re: Any plans to move to build Nutch using Maven?

2006-08-16 Thread Chris Mattmann
Hi Steven, On 8/16/06 7:36 AM, steven shingler [EMAIL PROTECTED] wrote: (This thread moved from the User List.) OK Lukas, lets open it up to the dev list! :) Particularly, does the group feel moving to Maven would be _a good thing_ ? +1 I suggested this (however did not make any

Patch Available status?

2006-08-15 Thread Chris Mattmann
Hi Guys, I've seen on the Hadoop mailing list recently that there was a new status added for issues in JIRA called Patch Available to let committers know that a patch is ready for review to commit. How about we add this to the Nutch jira instance as well? I tried doing this, but I don't think I

Re: parse-plugins.xml

2006-08-03 Thread Chris Mattmann
Hi Marko, Thanks for your question. Basically it was set up as a sort of last resort of getting at least * some * information from the PDF file, albeit littered with garbage. If indeed the parse-text does not really make sense in terms of a backup parser to handle PDF files and get at least

Re: parse-plugins.xml

2006-08-03 Thread Chris Mattmann
Hey Andrzej, On 8/3/06 8:19 AM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Chris Mattmann wrote: Hi Marko, Thanks for your question. Basically it was set up as a sort of last resort of getting at least * some * information from the PDF file, albeit littered with garbage. If indeed

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris Mattmann
Folks, Before I (or someone else) reopens the issue, I think it's important to understand the implications: 1) Having a *side-effect* of the entire system stop processing after merely logging a message at a certain event level is a poor practice. I'm not sure that the Fetcher quitting is a *

Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris Mattmann
Hi Andrzej, The main problem, as Scott observed, is that the static flag affects all instances of the task executing inside the same JVM. If there are several Fetcher tasks (or any other tasks that check for SEVERE flag!), belonging to different jobs, all of them will quit. This is
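The effect described above (one task logging at SEVERE causing every task in the same JVM to quit) can be illustrated with a small Java sketch. This is not the Nutch or Hadoop code under discussion; the class names `SharedFlagTask` and `PerTaskFlagTask` are invented here purely to contrast a static flag with a per-instance one.

```java
// Hypothetical illustration only; none of these classes exist in Nutch or Hadoop.
final class SevereFlagSketch {

    /** Every instance reads and writes the same JVM-wide flag. */
    static final class SharedFlagTask {
        private static volatile boolean severeLogged = false;

        void logSevere() {
            severeLogged = true;
        }

        boolean shouldQuit() {
            return severeLogged;
        }
    }

    /** Each instance keeps its own flag, so tasks stay isolated. */
    static final class PerTaskFlagTask {
        private volatile boolean severeLogged = false;

        void logSevere() {
            severeLogged = true;
        }

        boolean shouldQuit() {
            return severeLogged;
        }
    }

    public static void main(String[] args) {
        SharedFlagTask jobA = new SharedFlagTask();
        SharedFlagTask jobB = new SharedFlagTask();
        jobA.logSevere();
        System.out.println("shared flag, other task quits: " + jobB.shouldQuit());

        PerTaskFlagTask taskA = new PerTaskFlagTask();
        PerTaskFlagTask taskB = new PerTaskFlagTask();
        taskA.logSevere();
        System.out.println("per-task flag, other task quits: " + taskB.shouldQuit());
    }
}
```

With the static field, a SEVERE event in one task is visible to a task from an unrelated job; with the instance field, only the task that logged it is affected.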

Re: Nutch Parser Bug

2006-04-25 Thread Chris Mattmann
Hi Alex, I also noticed this issue a while back. It's described here: http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200510.mbox/%3c435 [EMAIL PROTECTED] Cheers, Chris On 4/25/06 2:41 PM, Alex [EMAIL PROTECTED] wrote: Hi there, I'm fairly new to nutch and in working on the

0.8 release?

2006-04-12 Thread Chris Mattmann
Hi Guys, Any progress on the 0.8 release? Was there any resolution about which JIRA issues to complete before the 0.8 release? We had a bit of conversation there and some ideas, but no definitive answer... Thanks for your help, and sorry to pester ;) Cheers, Chris

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Chris Mattmann
+1 On 4/7/06 10:20 AM, Doug Cutting [EMAIL PROTECTED] wrote: Chris Mattmann wrote: +1 for a release sooner rather than later. I think this is a good plan. There's no reason we can't do another release in a month. If it is back-compatible we can call it 0.8.x and if it's incompatible

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Chris Mattmann
Hi Andrzej, On 4/7/06 12:18 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Do you guys have any additional insights / suggestions whether NUTCH-240 and/or NUTCH-61 should be included in this release? Looking at the JIRA popular issues pane for Nutch (

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-06 Thread Chris Mattmann
+1 for a release sooner rather than later. Several interesting features contributed since the 0.7 branch I believe are now tested and production-worthy, at least in my environment. Hats off to the folks who were able to split the MapReduce and NDFS into Hadoop -- I'm going to be experimenting with

Null Pointer exception in AnalyzerFactory?

2006-03-13 Thread Chris Mattmann
Hi Folks, I updated to the latest SVN revision (385691) today, and I am now seeing a Null Pointer exception in the AnalyzerFactory.java class. It seems that in some cases, the method: private Extension getExtension(String lang) { Extension extension = (Extension)

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
Hi Stefan, after a short time I already had 1602 time this lines in my tasktracker log files. 060307 022707 task_m_2bu9o4 found resource parse-plugins.xml at file:/home/joa/nutch/conf/parse-plugins.xml Sounds like this file is loaded 1602 (after lets say 3 minutes) I guess that wasn't

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
RuntimeException("x point " + Parser.X_POINT_ID + " not found."); Cheers, Chris Cheers, Stefan Am 07.03.2006 um 04:38 schrieb Chris Mattmann: Hi Stefan, after a short time I already had 1602 time this lines in my tasktracker log files. 060307 022707 task_m_2bu9o4 found resource parse

RE: found resource parse-plugins.xm?

2006-03-06 Thread Chris Mattmann
) { throw new RuntimeException("x point " + Parser.X_POINT_ID + " not found."); -Original Message- From: Chris Mattmann [mailto:[EMAIL PROTECTED] Sent: Monday, March 06, 2006 7:51 PM To: 'nutch-dev@lucene.apache.org' Subject: RE: found resource parse-plugins.xm? Hi Stefan, Hi Chris

Re: ignore eclipse .project and .classpath

2006-02-09 Thread Chris Mattmann
and .classpath +1 Am 08.02.2006 um 06:16 schrieb Chris Mattmann: Hi Folks, Just wondering if someone could add to the svn:ignore property for Nutch the files: .classpath .project I happen to use eclipse to do Nutch development and always ignore these files in my other

ignore eclipse .project and .classpath

2006-02-07 Thread Chris Mattmann
Hi Folks, Just wondering if someone could add to the svn:ignore property for Nutch the files: .classpath .project I happen to use eclipse to do Nutch development and always ignore these files in my other eclipse projects as well. Cheers, Chris

RE: [jira] Updated: (NUTCH-179) Proposition: Enable Nutch to use a parser plugin not just based on content type

2006-01-15 Thread Chris Mattmann
Hi Gail, Check out: http://wiki.apache.org/nutch/ParserFactoryImprovementProposal/ That's the way that the parser factory currently works. Also added, but not described in that proposal is the ability to call a parser by its id, which is a method present in ParseUtil.java. G'luck! Cheers,

RE: [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris Mattmann
Guys, My apologies for the spamming comments -- I tried to submit my comment through JIRA one time and it kept giving me service unavailable. So I resubmitted like 5 times, on the fifth time it finally went through -- but I guess the other comments went through too. I'll try and remove them

Standard metadata property names in the ParseData metadata

2005-12-13 Thread Chris Mattmann
Hi Folks, I was just thinking about the ParseData java.util.Properties metadata object and thinking about the way that we store names in there. Currently, people are free to name their string-based properties anything that they want, such as having names of Content-type, content-TyPe,
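The naming problem raised above can be sketched in Java: if each property name is declared once as a constant and lookups ignore case, spellings such as Content-type and content-TyPe land on the same entry. This is only an illustration under assumed names (`MetadataNamesSketch`, `CONTENT_TYPE`, `newMetadata`); it is not the design that was eventually proposed or committed for Nutch.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the names below are assumptions, not Nutch API.
final class MetadataNamesSketch {

    /** One agreed spelling for the property name. */
    static final String CONTENT_TYPE = "Content-Type";

    /** A map whose keys compare without regard to case. */
    static Map<String, String> newMetadata() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Map<String, String> meta = newMetadata();
        meta.put("Content-type", "text/html");
        meta.put("content-TyPe", "text/plain"); // same key under the comparator
        System.out.println(meta.size());
        System.out.println(meta.get(CONTENT_TYPE));
    }
}
```

A case-insensitive comparator tolerates differences in capitalization only; genuinely different spellings would still need an agreed list of constants.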

Re: Standard metadata property names in the ParseData metadata

2005-12-13 Thread Chris Mattmann
insensitive? Stefan Am 13.12.2005 um 18:07 schrieb Chris Mattmann: Hi Folks, I was just thinking about the ParseData java.util.Properties metadata object and thinking about the way that we store names in there. Currently, people are free to name their string-based properties anything

Idea about aliases in the parse-plugins.xml file

2005-12-13 Thread Chris Mattmann
Hi Folks, Jerome and I have been talking about an idea to address the current issue raised by Stefan G. about having a mapping of mimeType->list of pluginIds rather than mimeType->list of extensionIds in the parse-plugins.xml file. We've come up with the following proposed update that would

Re: Standard metadata property names in the ParseData metadata

2005-12-13 Thread Chris Mattmann
Hi Guys, Okay, that makes sense then. I will create an issue in JIRA later today describing the update, and then begin working on this over the next few days. Thanks for your responses and reviews. Cheers, Chris On 12/13/05 12:45 PM, Jérôme Charron [EMAIL PROTECTED] wrote: I agree, too.

NUTCH-112: Link in cached.jsp page to cached content is an absolute link

2005-12-06 Thread Chris Mattmann
Hi Guys, Just wondering if any of the committers checked out http://issues.apache.org/jira/browse/NUTCH-112. Turns out the link to the cached.jsp page to the cached content contains an absolute link which makes the link mess up when you don't deploy the nutch webapp in the root context. I've

Re: Urlfilter Patch

2005-12-01 Thread Chris Mattmann
Jerome, I think that this is a great idea and ensures that there isn't replication of so-called management information across the system. It could be easily implemented as a utility method because we have utility java classes that represent the ParsePluginList, that you could get the mimeTypes

RE: Urlfilter Patch

2005-12-01 Thread Chris Mattmann
Hi Doug, Chris Mattmann wrote: In principle, the mimeType system should give us some guidance on determining the appropriate mimeType for the content, regardless of whether it ends in .foo, .bar or the like. Right, but the URL filters run long before we know the mime type

RE: [proposal] Generic Markup Language Parser

2005-11-24 Thread Chris Mattmann
generic forms of XML markup content. Cheers, Chris Mattmann Am 24.11.2005 um 00:01 schrieb Jérôme Charron: Hi, We (Chris Mattmann, François Martelet, Sébastien Le Callonnec and me) just add a new proposal on the nutch Wiki: http://wiki.apache.org/nutch

RE: [proposal] Generic Markup Language Parser

2005-11-24 Thread Chris Mattmann
Hi Stefan, and Jerome, A mail archive is an amazing source of information, isn't it?! :-) To answer your question, just ask yourself how many pages per second you plan to fetch and parse and how many queries per second a lucene index is able to handle - and you can deliver in the ui. I

Re: developing a parse-/index-/query- plugin set

2005-10-17 Thread Chris Mattmann
Hi Doug, On 10/17/05 11:38 AM, Doug Cutting [EMAIL PROTECTED] wrote: Chris Mattmann wrote: So, one thing it seems is that fields to be indexed, and used in a field query must be fully lowercase to work? Additionally, it seems that they can't have symbols in them, such as _, is that correct

Re: developing a parse-/index-/query- plugin set

2005-10-17 Thread Chris Mattmann
Hi Doug, Thanks, that worked. Cheers, Chris On 10/17/05 11:56 AM, Doug Cutting [EMAIL PROTECTED] wrote: Chris Mattmann wrote: So, my question to you then is, what type of QueryFilter should I develop in order to get my query for contactemail:email address to work as a standalone query

RE: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-12 Thread Chris Mattmann
Hi, I'm not an XML expert by any means, but wouldn't it be simpler to just wrap any text where illegal chars are possible with a <![CDATA[ ]]> tag? That way, the offending characters won't be dropped and the process won't be lossy, no? If the CDATA method won't work, and there's no other way
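For context on the alternative to CDATA wrapping: the XML 1.0 grammar defines CDATA content in terms of the same Char production as ordinary text, so characters outside that production (most control characters, for example) are not legal inside a CDATA section either. The Java sketch below shows the lossy approach of dropping such characters before output. It is an illustration only, not the NUTCH-110 patch; the class and method names are made up.

```java
// Hypothetical helper, not the code attached to NUTCH-110.
final class XmlCharsSketch {

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point outside Char removed. */
    static String stripIllegalXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "title" + (char) 0x1 + " with" + (char) 0x8 + " control chars";
        System.out.println(stripIllegalXmlChars(dirty));
    }
}
```

Characters that are legal but have markup meaning, such as `<` and `&`, are a separate concern and still need escaping or a CDATA section.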

Re: failing of org.apache.nutch.tools.TestSegmentMergeTool?

2005-09-27 Thread Chris Mattmann
You know what the crazy thing is: Seemingly, all tests pass now. And I didn't change a thing. Honest. I swear. Very strange, indeed, but I'm happy because at least the tests are passing! :-) Cheers, Chris On 9/27/05 12:29 PM, Paul Baclace [EMAIL PROTECTED] wrote: Chris Mattmann wrote

failing of org.apache.nutch.tools.TestSegmentMergeTool?

2005-09-26 Thread Chris Mattmann
Hi there, I just noticed after checking out the latest SVN of Nutch that I am currently failing the TestSegmentMergeTool Junit test when I type ant test for Nutch. Is anyone experiencing the same problem? Here is the relevant information which I captured out of the

Re: [Nutch-cvs] [Nutch Wiki] Update of ParserFactoryImprovementProposal by ChrisMattmann

2005-09-15 Thread Chris Mattmann
Hi Otis, Point taken. In actuality since both convey the same information I think that it's okay to support both, but by default say we could code the initial plugins specified in parse-plugins.xml without the order= attribute. Fair enough? Cheers, Chris On 9/15/05 3:23 PM, [EMAIL

RE: [jira] Commented: (NUTCH-30) rss feed parser

2005-07-30 Thread Chris Mattmann
73005. The patch and source distro are zipped up in the file: parse-rss-73005.zip. Here is a direct link: http://issues.apache.org/jira/secure/attachment/12311475/parse-rss-73005.zip Thanks! Cheers, Chris Mattmann __ Chris A. Mattmann [EMAIL PROTECTED