Re: Issues pending before 0.9 release

2007-03-05 Thread Sami Siren




P.S. I am going to contact Piotr and coordinate with him: I'd like to be the
release manager for this Nutch release.




It would be more beneficial to everybody if the discussions (related to
the release or to Nutch) were held in public (hey, this is open source!).
The off-list stuff IMO smells.

--
Sami Siren


Re: Issues pending before 0.9 release

2007-03-05 Thread Andrzej Bialecki

Chris Mattmann wrote:

P.S. I am going to contact Piotr and coordinate with him: I'd like to be the
release manager for this Nutch release.
  


Everyone heard that? :) That's cool, thanks!

--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: Issues pending before 0.9 release

2007-03-05 Thread Dennis Kubes



Chris Mattmann wrote:

P.S. I am going to contact Piotr and coordinate with him: I'd like to be the
release manager for this Nutch release.


I would like to help with this as well, even if it is just watching how 
the process works this time.


Dennis






Re: Issues pending before 0.9 release

2007-03-05 Thread Chris Mattmann

Hi Guys,

> Blocker
> =======
> 
> * NUTCH-400 (Update & add missing license headers) - I believe this is
> fixed and should be closed

+1, thanks to Sami for closing it.

> 
> * NUTCH-353 (pages that serverside forwards will be refetched every
> time) - this was partially fixed in NUTCH-273, but a more complete
> solution would require significant changes to LinkDb. As there are no
> patches implementing this, I left it open, but it's no longer as
> critical as it was before. I propose to move it to "Major" and address
> it in the next release.

+1

> 
> * NUTCH-233 (wrong regular expression hang reduce process for ever) - I
> propose to apply the fix provided by Sean Dean and close this issue for now.

+1

> 
> Critical
> ========
> 
> * NUTCH-436 (Incorrect handling of relative paths when the embedded URL
> path is empty). There is no patch available yet. If someone could
> contribute a patch I'd like to see this fixed before the release.

Looks like Dennis is on this one

> 
> * NUTCH-427 (protocol-smb). This relies on a LGPL library, and it's
> certainly not critical (as this is an optional new feature). I propose
> to change it to Major, and make a decision - do we want another plugin
> like parse-mp3 or parse-rtf, or not.

Let's hold off on this: it's not necessary for 0.9, and I don't think
there's been a bunch of traffic on the list identifying this as critical to
get into the sources for the release.

> 
> * NUTCH-381 (Ignore external link not work as expected) - I'll try to
> reproduce it, and if I find an easy fix I'd like to apply it before the
> release.

+1

> 
> * NUTCH-277 (Fetcher dies because of "max. redirects") - I wasn't able
> to reproduce it. If there is no updated information on this I propose to
> close it with "Can't reproduce".

+1, I had to do something similar with NUTCH-258

> 
> * NUTCH-167 (Observation of ) -
> there's a patch which I tested in a limited production env. If there are
> no objections I'd like to apply it before the release.

+1

> 
> Major
> =====
> There are 84 major issues, but some of them are either invalid, or
> should be "minor", or no longer apply and should be closed. Please
> review them if you can and provide some comments or recommendations if
> you think you have some new information.

I will spend some time going through JIRA today and see if there are any
issues that:

1. Have a patch already
2. Sound like something quick, easy, and not so far-reaching across the
entire Nutch API

> 
> 
> One decision also that we need to make is which version of Hadoop should
> be included in the release. Current trunk uses 0.10.1, I have a set of
> production-tested patches that use 0.11.2, and today the Hadoop team
> released 0.12.0 (to be followed shortly by a 0.12.1, most likely in time
> before our release). The most conservative option is to stay with
> 0.10.1, but by the time people start using Nutch this will be a fairly
> old version already. I propose to upgrade to 0.11.2. We could use 0.12.1
> - but in this case with the expectation that we release a less than stable
> version of Nutch, to be soon followed by a minor stable release ...

I'd agree with the upgrade to 0.11.2, +1


Cheers,
  Chris

P.S. I am going to contact Piotr and coordinate with him: I'd like to be the
release manager for this Nutch release.





Re: java.io.FileNotFoundException: / (Is a directory)

2007-03-05 Thread Dennis Kubes

That is a problem with the hadoop.log.dir value not being set.  log4j is
trying to point the DRFA appender at a file and can't find the log directory.
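
A minimal sketch of why the path ends up as "/", assuming the DRFA file is
configured as ${hadoop.log.dir}/${hadoop.log.file} (an assumption about the
log4j.properties in use) and that log4j 1.2 replaces an unset property with
an empty string. The class LogPathCheck and its methods are hypothetical
helpers for illustration, not part of Nutch or Hadoop.

```java
import java.io.File;

// Hypothetical helper for illustration only; not part of Nutch or Hadoop.
class LogPathCheck {

    // Expands "${hadoop.log.dir}/${hadoop.log.file}" under the assumption
    // that an unset property is substituted with an empty string.
    static String expand(String logDir, String logFile) {
        String dir = (logDir == null) ? "" : logDir;
        String file = (logFile == null) ? "" : logFile;
        return dir + "/" + file;
    }

    // True when the path names an existing directory, which cannot be
    // opened for appending as a log file.
    static boolean pointsAtDirectory(String path) {
        return new File(path).isDirectory();
    }

    public static void main(String[] args) {
        String path = expand(System.getProperty("hadoop.log.dir"),
                             System.getProperty("hadoop.log.file"));
        System.out.println("DRFA file resolves to: " + path);
        if (pointsAtDirectory(path)) {
            System.out.println("That is a directory; hadoop.log.dir and "
                    + "hadoop.log.file are probably not set.");
        }
    }
}
```

Run without any -D options, both properties are unset and the path collapses
to "/". Passing -Dhadoop.log.dir and -Dhadoop.log.file to the JVM that runs
the task should give the appender a real file to open.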


Dennis

Gal Nitzan wrote:


Just installed latest from trunk.

I ran mergesegs and got the following error in all task log files (I use
the default log4j.properties):

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a directory)
    at java.io.FileOutputStream.openAppend(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
    at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
    at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
    at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
    at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
    at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
    at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
    at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
    at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
    at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
    at org.apache.log4j.Logger.getLogger(Logger.java:104)
    at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
    at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
    at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
    at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:59)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1346)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].




SSL & Nutch (SecureProtocolSocketFactory)

2007-03-05 Thread Gavino Marras

Why does the DummySSLProtocolSocketFactory class in the httpclient plugin
implement ProtocolSocketFactory and not SecureProtocolSocketFactory?
Please help me.




SSL & Nutch (SecureProtocolSocketFactory)

2007-03-05 Thread g . marras



- Forwarded message from [EMAIL PROTECTED] -
    Date: Mon, 05 Mar 2007 12:02:54 +0100
    From: [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
 Subject: Fwd: SSL & Nutch (SecureProtocolSocketFactory)
      To: nutch-dev@lucene.apache.org


Why does the DummySSLProtocolSocketFactory class in the httpclient plugin
implement ProtocolSocketFactory and not SecureProtocolSocketFactory?
Please help me.





- End of forwarded message -


