Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Doug Cutting

Andrzej Bialecki wrote:
> Please also don't forget that the trunk/ will soon be invaded by the
> code from mapred, I guess some time around the middle of January (Doug?)


Thinking about this more, perhaps we should do it sooner.  There's 
already a branch for 0.7.x releases, so what point is there in not 
merging mapred to trunk now?  We'd have fewer branches to maintain, and 
start getting nightly builds of mapred.  Folks who require 0.7.x 
compatibility can continue to use (and patch) the 0.7.x branch.  Objections?


Doug


Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Piotr Kosiorowski

Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> Please also don't forget that the trunk/ will soon be invaded by the
>> code from mapred, I guess some time around the middle of January (Doug?)
>
> Thinking about this more, perhaps we should do it sooner.  There's
> already a branch for 0.7.x releases, so what point is there in not
> merging mapred to trunk now?  We'd have fewer branches to maintain, and
> start getting nightly builds of mapred.  Folks who require 0.7.x
> compatibility can continue to use (and patch) the 0.7.x branch.
> Objections?
>
> Doug

+1. Judging by the questions on the mailing lists, I do not think many
people use trunk now.


Piotr


Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Doug Cutting

Andrzej Bialecki wrote:
> I agree. I just thought that we would prepare the release based on the
> code in trunk/, and in that case we would want to hold off on the merge
> until after the release.


My definition of trunk is that it should be where the majority of 
development happens.  It is what we should build nightly, etc.


Major versions should be branched from trunk, and point releases created 
as tags from the version branches.


A development branch (e.g., mapred) should be used when a few developers 
need to make radical changes and do not want to disrupt other developers.


So if most developers are now comfortable working on mapred, then we no 
longer need to keep it in a branch.  And we already have a version 
branch for 0.7, so we don't need to reserve trunk for that.


Does this analysis sound right?

Doug


Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Andrzej Bialecki

Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> I agree. I just thought that we would prepare the release based on the
>> code in trunk/, and in that case we would want to hold off on the
>> merge until after the release.
>
> My definition of trunk is that it should be where the majority of
> development happens.  It is what we should build nightly, etc.
>
> Major versions should be branched from trunk, and point releases
> created as tags from the version branches.
>
> A development branch (e.g., mapred) should be used when a few
> developers need to make radical changes and do not want to disrupt
> other developers.
>
> So if most developers are now comfortable working on mapred, then we
> no longer need to keep it in a branch.  And we already have a version
> branch for 0.7, so we don't need to reserve trunk for that.
>
> Does this analysis sound right?



Yes, we just need to make sure that all important bits from trunk are on 
the 0.7 branch, before we start.


--
Best regards,
Andrzej Bialecki 
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Doug Cutting

Andrzej Bialecki wrote:
> Yes, we just need to make sure that all important bits from trunk are on
> the 0.7 branch, before we start.


I will sync mapred with the trunk prior to the merge, so we should still 
be able to get anything we need after mapred is merged back to trunk.


BTW, we're pretty closely following the recommendations in:

http://svnbook.red-bean.com/en/1.1/ch04s04.html#svn-ch-4-sect-4.4

The mapred branch is a 'feature' branch.  At the end of this section 
they describe how to merge a feature branch back into the trunk.


Doug


Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Andrzej Bialecki

Zaheed Haque wrote:
> what about the following:
>
> http://issues.apache.org/jira/browse/NUTCH-125

On its way ... ;-) I'll add it this week.

--
Best regards,
Andrzej Bialecki 




Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Jérôme Charron
> What do people think about collecting a list of issues and holding a
> round of voting?

+1


Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Stefan Groschupf
This has been fixed in the mapred branch, but that patch is not in  
0.7.1.  This alone might be a reason to make a 0.7.2 release.


Maybe we can also get some more parser-selection issues fixed in the
next few days and include them in a 0.7.2 release.
I would be happy to see more parser-selection problems fixed, and it
looks like Jerome is working hard on them too, so maybe we can wait
for that.


Stefan 


Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Jérôme Charron
+1 for a 0.7.2 release.
Here are the issues/revisions I can merge to the 0.7 branch.
They mainly concern the parser-factory changes (NUTCH-88):

http://issues.apache.org/jira/browse/NUTCH-112
http://issues.apache.org/jira/browse/NUTCH-135
http://svn.apache.org/viewcvs.cgi?rev=356532&view=rev
http://svn.apache.org/viewcvs.cgi?rev=355809&view=rev
http://svn.apache.org/viewcvs.cgi?rev=354398&view=rev
http://svn.apache.org/viewcvs.cgi?rev=326889&view=rev
http://svn.apache.org/viewcvs.cgi?rev=321250&view=rev
http://svn.apache.org/viewcvs.cgi?rev=321231&view=rev
http://svn.apache.org/viewcvs.cgi?rev=306808&view=rev
http://svn.apache.org/viewcvs.cgi?rev=293370&view=rev
http://svn.apache.org/viewcvs.cgi?rev=292865&view=rev
http://svn.apache.org/viewcvs.cgi?rev=292035&view=rev

Piotr, what about the Italian translation?
0.7.2 could be a good candidate for committing it, no?

> This has been fixed in the mapred branch, but that patch is not in
> 0.7.1.  This alone might be a reason to make a 0.7.2 release.

http://svn.apache.org/viewcvs.cgi?view=rev&rev=348533

> I would be happy to see more parser-selection problems fixed, and it
> looks like Jerome is working hard on them too, so maybe we can wait
> for that.

I think we can wait for the enhancement proposed by Chris today: adding an
alias in the parse-plugin.xml file and using a content-type/extension-id
mapping instead of content-type/plugin-id.
For further improvements, the new mime-type repository based on the
freedesktop mime-type database will be needed.
I cannot reasonably include this in 0.7.2, but I think it will be in trunk
by the end of the year.
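
To make the alias idea a bit more concrete, here is a rough sketch in Python
of a content-type/extension-id lookup that goes through aliases. It is only an
illustration under assumptions: the alias names, the extension ids, the table
layout, and the function name are all hypothetical and are not taken from
Nutch or from the parse-plugin.xml file.

```python
# Hypothetical sketch: none of these names or ids come from Nutch itself.

# Alias table: short alias -> extension id (both invented for illustration).
ALIASES = {
    "parse-html": "org.example.parse.html.HtmlParser",
    "parse-text": "org.example.parse.text.TextParser",
}

# Content type -> ordered list of aliases to try, most preferred first.
CONTENT_TYPE_TO_ALIASES = {
    "text/html": ["parse-html", "parse-text"],
    "text/plain": ["parse-text"],
}


def extension_ids_for(content_type):
    """Return the extension ids to try for a content type, in order.

    Unknown content types and aliases with no entry in ALIASES are
    skipped rather than raising an error.
    """
    aliases = CONTENT_TYPE_TO_ALIASES.get(content_type, [])
    return [ALIASES[alias] for alias in aliases if alias in ALIASES]
```

The point of the indirection is that the mapping refers to a specific
extension rather than to a whole plugin, so a plugin that provides several
parsers is no longer ambiguous.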

What reasonable target date can we plan for 0.7.2?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Andrzej Bialecki

Jérôme Charron wrote:
> +1 for a 0.7.2 release.

+1.

Things are going well on the mapred branch, all basic tools are almost 
in place, so after this release we will probably start merging... so, 
this looks like the last release of the 0.7.x line (from the code in 
trunk/ - I'm sure there will be maintenance releases afterwards).



> I think we can wait for the enhancement proposed by Chris today: adding an
> alias in the parse-plugin.xml file and using a content-type/extension-id
> mapping instead of content-type/plugin-id.
 



IMHO, this needs to be really well tested before going into a release 
... possibilities for confusion are great.



> For further improvements, the new mime-type repository based on the
> freedesktop mime-type database will be needed.
> I cannot reasonably include this in 0.7.2, but I think it will be in trunk
> by the end of the year.

 



Please also don't forget that the trunk/ will soon be invaded by the 
code from mapred, I guess some time around the middle of January (Doug?) ...


--
Best regards,
Andrzej Bialecki 