Re: 1.0 Release?

2008-11-23 Thread Doğacan Güney
I agree with this list and have nothing new to add.

(Except, I guess people also want NUTCH-92 to be fixed.)

On Thu, Nov 20, 2008 at 6:51 PM, Andrzej Bialecki wrote:
 Dennis Kubes wrote:

 What does everybody think of trying to do a Nutch 1.0 release in the next
 couple of weeks?  I have 8 different patches that are ready to be committed
 including:

 1) NUTCH-647: Resolve URLs tool
 2) NUTCH-635: LinkAnalysis Tool for Nutch
 3) NUTCH-646: New Indexing framework for Nutch
 4) NUTCH-594: Serve Nutch search results in XML and JSON
 5) Custom fields on index and plugins
 6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
 7) Upgrade Nutch to the most recent Lucene version (2.4).
 8) Analysis plugins and improvements to analyzer factory for multiple
 languages per analysis plugin.  Language identifier.

 I am going to try to get those posted in the next couple of days and
 committed in the next week.  Are there other major improvements we want to
 put in before trying to do a 1.0 release for Nutch?  Thoughts and
 suggestions?

 A few recently opened ones that should be easy to fix:

 NUTCH-661  errors when the uri contains space characters
 NUTCH-657  Estonian N-gram profile has wrong name
 NUTCH-652  AdaptiveFetchSchedule#setFetchSchedule doesn't calculate
            fetch interval correctly
 NUTCH-644  RTF parser doesn't compile anymore
 NUTCH-643  ClassCastException in PdfParser on encrypted PDF with empty
            password
 NUTCH-636  Http client plug-in https doesn't work on IBM JRE
 NUTCH-631  MoreIndexingFilter fails with NoSuchElementException
 NUTCH-626  fetcher2 breaks out the domain with
            db.ignore.external.links set at cross domain redirects
 NUTCH-566  Sun's URL class has bug in creation of relative query URLs
 NUTCH-542  Null Pointer Exception on getSummary when segment no longer
            exists
 NUTCH-531  Pages with no ContentType cause a Null Pointer exception

 And of course this one:

 NUTCH-442  Integrate Solr/Nutch


 We should also review all other open issues marked as Blocker / Major,
 especially those with patches, and take some action - either fix them, or
 won't fix 'em, or postpone to the next release (the single Blocker issue
 should be fixed).


 --
 Best regards,
 Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com





-- 
Doğacan Güney


1.0 Release?

2008-11-20 Thread Dennis Kubes
What does everybody think of trying to do a Nutch 1.0 release in the 
next couple of weeks?  I have 8 different patches that are ready to be
committed including:


1) NUTCH-647: Resolve URLs tool
2) NUTCH-635: LinkAnalysis Tool for Nutch
3) NUTCH-646: New Indexing framework for Nutch
4) NUTCH-594: Serve Nutch search results in XML and JSON
5) Custom fields on index and plugins
6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
7) Upgrade Nutch to the most recent Lucene version (2.4).
8) Analysis plugins and improvements to analyzer factory for multiple
languages per analysis plugin.  Language identifier.


I am going to try to get those posted in the next couple of days and 
committed in the next week.  Are there other major improvements we want 
to put in before trying to do a 1.0 release for Nutch?  Thoughts and 
suggestions?


Dennis


Re: 1.0 Release?

2008-11-20 Thread Andrzej Bialecki

Dennis Kubes wrote:
What does everybody think of trying to do a Nutch 1.0 release in the 
next couple of weeks?  I have 8 different patches that are ready to be
committed including:


1) NUTCH-647: Resolve URLs tool
2) NUTCH-635: LinkAnalysis Tool for Nutch
3) NUTCH-646: New Indexing framework for Nutch
4) NUTCH-594: Serve Nutch search results in XML and JSON
5) Custom fields on index and plugins
6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
7) Upgrade Nutch to the most recent Lucene version (2.4).
8) Analysis plugins and improvements to analyzer factory for multiple
languages per analysis plugin.  Language identifier.


I am going to try to get those posted in the next couple of days and 
committed in the next week.  Are there other major improvements we want 
to put in before trying to do a 1.0 release for Nutch?  Thoughts and 
suggestions?


A few recently opened ones that should be easy to fix:

NUTCH-661  errors when the uri contains space characters
NUTCH-657  Estonian N-gram profile has wrong name
NUTCH-652  AdaptiveFetchSchedule#setFetchSchedule doesn't calculate
           fetch interval correctly
NUTCH-644  RTF parser doesn't compile anymore
NUTCH-643  ClassCastException in PdfParser on encrypted PDF with
           empty password
NUTCH-636  Http client plug-in https doesn't work on IBM JRE
NUTCH-631  MoreIndexingFilter fails with NoSuchElementException
NUTCH-626  fetcher2 breaks out the domain with
           db.ignore.external.links set at cross domain redirects
NUTCH-566  Sun's URL class has bug in creation of relative query URLs
NUTCH-542  Null Pointer Exception on getSummary when segment no
           longer exists
NUTCH-531  Pages with no ContentType cause a Null Pointer exception

And of course this one:

NUTCH-442  Integrate Solr/Nutch


We should also review all other open issues marked as Blocker / Major, 
especially those with patches, and take some action - either fix them, 
or won't fix 'em, or postpone to the next release (the single Blocker 
issue should be fixed).



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: 1.0 Release?

2008-11-20 Thread Marc Boucher
Thank you Dennis for this work. I will have a look and provide  
feedback as soon as possible.


Marc

On 20-Nov-08, at 6:54 AM, Dennis Kubes wrote:

What does everybody think of trying to do a Nutch 1.0 release in the  
next couple of weeks?  I have 8 different patches that are ready to
be committed including:


1) NUTCH-647: Resolve URLs tool
2) NUTCH-635: LinkAnalysis Tool for Nutch
3) NUTCH-646: New Indexing framework for Nutch
4) NUTCH-594: Serve Nutch search results in XML and JSON
5) Custom fields on index and plugins
6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
7) Upgrade Nutch to the most recent Lucene version (2.4).
8) Analysis plugins and improvements to analyzer factory for multiple
languages per analysis plugin.  Language identifier.


I am going to try to get those posted in the next couple of days and  
committed in the next week.  Are there other major improvements we  
want to put in before trying to do a 1.0 release for Nutch?   
Thoughts and suggestions?


Dennis