[jira] Updated: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2008-12-15 Thread Sean Dean (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Dean updated NUTCH-673: Priority: Minor (was: Major) Priority has been changed to "minor". > Upgrade the Carro

[jira] Created: (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2008-12-15 Thread Sean Dean (JIRA)
Versions: 0.9.0 Environment: All Nutch deployments. Reporter: Sean Dean Fix For: 1.0.0 Version 3.0 of the Carrot2 plug-in was released recently. We currently have version 2.1 in the source tree, and upgrading it to the latest version before the 1.0 release might make

Re: NUTCH-92

2008-11-25 Thread Sean Dean
to work with trunk (and the future 1.0 release). I would personally like to see NUTCH-92 (or some form of it) included in trunk for a legitimate evaluation before the next release. Sean Dean From: Andrzej Bialecki <[EMAIL PROTECTED]> To: nut

NUTCH-92 - DistributedSearch incorrectly scores results

2008-11-21 Thread Sean Dean
Folks, I was wondering if anyone could shed some light on the status of this issue heading into a potential 1.0 (or 0.x) release over the next few months? I realize many upgrades have been made to Hadoop and Lucene, along with bug fixes in just about every element of the system, but does

Re: [VOTE] Release Apache Nutch 0.9

2007-03-27 Thread Sean Dean
+1 for my official non-binding vote :) You might want to correct the word "confiquration" at "1." in CHANGES-0.9.txt, and CHANGES.txt inside the package. Everything else looks great and more importantly, runs! Good work guys. - Original Message From: Chris Mattmann <[EMAIL PROTECTED

Re: Hadoop 0.11.2 vs. 0.12.1

2007-03-11 Thread Sean Dean
It looks like we might want to at least give it a try then, with the worst possible case of Nutch users having to keep speculative execution disabled if it causes grief again. If other problems arise, then we can just revert back to 0.11.2 which seems to be stable in terms of all the Nutch opera

Re: 0.9 release

2007-03-11 Thread Sean Dean
thing in performance as the Java processes didn't lock despite the lowering of total threads. - Original Message ---- From: Sean Dean <[EMAIL PROTECTED]> To: nutch-dev@lucene.apache.org Sent: Wednesday, March 7, 2007 6:52:05 PM Subject: Re: 0.9 release Great, thanks a lot.

Re: 0.9 release

2007-03-07 Thread Sean Dean
With NUTCH-233 the issue is independent of Hadoop and lies with the regex-urlfilter. The last solution posted in JIRA gives you more room to work with; it allowed me to fetch a segment over 1-2 million, but I ran into the same issue when the segment approached 10 million in size. Unless you

Re: 0.9 release

2007-03-07 Thread Sean Dean
. All this testing will be based off revision 515791 in trunk. - Original Message From: Andrzej Bialecki <[EMAIL PROTECTED]> To: nutch-dev@lucene.apache.org Sent: Wednesday, March 7, 2007 5:04:21 PM Subject: Re: 0.9 release Sean Dean wrote: > As it stands now with what's in tr

Re: 0.9 release

2007-03-07 Thread Sean Dean
As it stands now with what's in trunk under 0.9-dev, one of the biggest problems is the version of Hadoop we have included. It fails on anything above 200k URLs, and should be considered a "blocker" issue. It's my understanding that Andrzej has a newer Hadoop JAR with some custom patches applied

Re: Issues pending before 0.9 release

2007-03-04 Thread Sean Dean
sion hang reduce process for ever) - I > propose to apply the fix provided by Sean Dean and close this issue for > now. yes that was the resolution also last time :) > * NUTCH-427 (protocol-smb). This relies on an LGPL library, and it's > certainly not critical (as this is an opt

Re: NPE while fetching

2007-02-07 Thread Sean Dean
This was corrected in Hadoop as per issue HADOOP-917, but I'm thinking some code in Nutch might have to be changed also. I reported this issue (via mailing list) a while ago and I'm glad it was fixed, but I have been purposely staying with revision 495214 of trunk which seems to provide the best

Issue with trunk (rev 496535)

2007-01-16 Thread Sean Dean
I have had a common error come up now on two separate fetches, both using the new Hadoop 0.10.1. The first error came up on my regular fetch using my large Nutch DB, but to rule out any problems with that (possibly related to the new fetch statuses) I created a brand new DB using the standard DM

[jira] Commented: (NUTCH-417) After upgrade to hadoop-0.9.1, parsing and indexing doesn't work.

2006-12-16 Thread Sean Dean (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-417?page=comments#action_12459073 ] Sean Dean commented on NUTCH-417: - Speculative execution is now off by default with Hadoop 0.9.2 as per issue HADOOP-827. Since there were only two other fixes with

[jira] Commented: (NUTCH-224) Nutch doesn't handle Korean text at all

2006-12-01 Thread Sean Dean (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-224?page=comments#action_12455065 ] Sean Dean commented on NUTCH-224: - Just a note on my comment above: it seems JIRA can't display (or won't display) Korean text after I accept the comment. If your

[jira] Commented: (NUTCH-224) Nutch doesn't handle Korean text at all

2006-12-01 Thread Sean Dean (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-224?page=comments#action_12455064 ] Sean Dean commented on NUTCH-224: - I just tested this today using 0.9-dev and it seems the changes made back in 0.7.2 to Lucene didn't fix the issue. At some point

[jira] Commented: (NUTCH-233) wrong regular expression hang reduce process for ever

2006-11-28 Thread Sean Dean (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-233?page=comments#action_12453919 ] Sean Dean commented on NUTCH-233: - Could I suggest that this change, from ".*(/.+?)/.*?\1/.*?\1/" to ".*(/[^/]+)/[^/]+\1/[^/]+\1/" be committ
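
The two patterns quoted in the NUTCH-233 comment above can be compared with a small sketch. It assumes Python's re module rather than the Java regex engine Nutch runs on, and the three URLs are hypothetical examples, so it only illustrates which URLs each pattern matches; it does not reproduce the hang reported in the issue.

```python
import re

# The two patterns are copied from the NUTCH-233 comment.
# Everything else below (names, URLs) is illustrative only.
OLD = re.compile(r".*(/.+?)/.*?\1/.*?\1/")
NEW = re.compile(r".*(/[^/]+)/[^/]+\1/[^/]+\1/")

# Hypothetical URLs, not taken from the mailing list.
LOOPING = "http://example.com/a/b/a/c/a/"    # "/a" repeats with one segment between
SPREAD = "http://example.com/a/x/y/a/z/a/"   # "/a" repeats with two segments between
NORMAL = "http://example.com/a/b/c/d/"       # no repeated segment


def matches(pattern, url):
    """Return True if the pattern matches from the start of the URL."""
    return pattern.match(url) is not None


for url in (LOOPING, SPREAD, NORMAL):
    print(url, "old:", matches(OLD, url), "new:", matches(NEW, url))
```

In the proposed pattern, `[^/]+` cannot span a slash, so each gap between repeats is exactly one path segment. That gives the engine fewer ways to backtrack, at the cost of no longer matching URLs such as the second one, where the repeats are further apart.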

[jira] Commented: (NUTCH-224) Nutch doesn't handle Korean text at all

2006-06-13 Thread Sean Dean (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-224?page=comments#action_12416108 ] Sean Dean commented on NUTCH-224: - I'm still using 0.7.1 and also see this problem. In the Nutch 0.7.2 release they upgraded to Lucene 1.9.1, which included the above fixes