[jira] Updated: (NUTCH-169) remove static NutchConf

2006-01-30 Thread Marko Bauhardt (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Marko Bauhardt updated NUTCH-169:
-

Attachment: NutchConf.370854.patch

 * a general comment: plugins now implement NutchConfigurable, which means 
 that you had to add two new methods, 
 looking exactly the same, to many classes. That's why the NutchConfigured 
 class was created. 
 I suggest replacing implements NutchConfigurable with extends 
 NutchConfigured where appropriate. 

In either case I have to implement two methods: on the one hand setConf and 
getConf, on the other hand a constructor and an overridden setConf. 
Since in many cases a constructor wouldn't be helpful, I decided to use the 
interface. 
In general, it might make sense to have an interface or an abstract class 
that has just a configure method and nothing more. 
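The trade-off can be sketched in Java as below. `SimpleConf`, `Configurable`, `Configured`, `InterfacePlugin` and `BaseClassPlugin` are hypothetical stand-ins modelled on the NutchConfigurable/NutchConfigured pair discussed above, not the classes in the patch. With the interface, every plugin writes `setConf`/`getConf` itself; the base class writes them once, but a plugin that reads its own settings still has to override `setConf` (and would need a pass-through constructor to accept the configuration at construction time).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf: a plain key/value holder.
class SimpleConf {
  private final Map<String, String> props = new HashMap<>();
  String get(String key, String def) { return props.getOrDefault(key, def); }
  void set(String key, String value) { props.put(key, value); }
}

// Interface style: each implementor writes both methods itself.
interface Configurable {
  void setConf(SimpleConf conf);
  SimpleConf getConf();
}

// Base-class style: the two methods are written once and inherited.
abstract class Configured implements Configurable {
  private SimpleConf conf;
  public void setConf(SimpleConf conf) { this.conf = conf; }
  public SimpleConf getConf() { return conf; }
}

// Plugin implementing the interface directly (the approach chosen here).
class InterfacePlugin implements Configurable {
  private SimpleConf conf;
  private String agent;
  public void setConf(SimpleConf conf) {
    this.conf = conf;
    this.agent = conf.get("http.agent.name", "unknown"); // read settings when the conf arrives
  }
  public SimpleConf getConf() { return conf; }
  String agent() { return agent; }
}

// Plugin extending the base class: getConf comes for free, but reading its
// own settings still requires overriding setConf.
class BaseClassPlugin extends Configured {
  private String agent;
  @Override
  public void setConf(SimpleConf conf) {
    super.setConf(conf);
    this.agent = conf.get("http.agent.name", "unknown");
  }
  String agent() { return agent; }
}
```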

 * 1094: I think a better place to set the current config on a protocol 
 instance is inside the ProtocolFactory.getProtocol()
 because now the factory itself is instantiated with an instance of nutchConf, 
 so it keeps a reference to that config. 

Done.
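A minimal sketch of this arrangement, assuming hypothetical `Conf`, `Protocol`, `HttpProtocol` and `ProtocolFactory` types. In Nutch the scheme lookup goes through the plugin repository; here it is a plain map. The factory keeps the configuration it was created with and sets it on each protocol instance inside `getProtocol()`, so callers such as the Fetcher no longer have to.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal configuration type.
class Conf {
  private final Map<String, String> props = new HashMap<>();
  String get(String key, String def) { return props.getOrDefault(key, def); }
  void set(String key, String value) { props.put(key, value); }
}

interface Protocol {
  void setConf(Conf conf);
  Conf getConf();
}

class HttpProtocol implements Protocol {
  private Conf conf;
  public void setConf(Conf conf) { this.conf = conf; }
  public Conf getConf() { return conf; }
}

// The factory is created with one configuration and keeps a reference to it.
class ProtocolFactory {
  private final Conf conf;
  private final Map<String, Supplier<Protocol>> byScheme = new HashMap<>();

  ProtocolFactory(Conf conf) {
    this.conf = conf;
    byScheme.put("http", HttpProtocol::new); // hypothetical registration
  }

  Protocol getProtocol(String url) {
    int colon = url.indexOf(':');
    if (colon < 0) throw new IllegalArgumentException("no scheme in " + url);
    Supplier<Protocol> supplier = byScheme.get(url.substring(0, colon));
    if (supplier == null) throw new IllegalArgumentException("no protocol for " + url);
    Protocol p = supplier.get();
    p.setConf(conf); // configure the instance before handing it out
    return p;
  }
}
```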

 * 1256: what is this constructor for? I think only the public constructor is 
 used. 

My mistake; fixed.


 * 1311: please replace getExtentens() with getExtensions() 
 * 1346, 1375: these classes should be static, I think 
 * 1542, 1570: should be static 

Done.

 * 1903,8476,10136: I wonder, shouldn't we cache these in nutchConf? 

1903: Done.
8476,10136: we cache the PluginRepository, not the extensions themselves. In 
general, from my point of view, caching or object recycling belongs in the 
tools that use the extensions/objects, rather than caching the objects 
themselves in nutchConf. 
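The division of labour proposed here, sketched with hypothetical `UrlFilter`, `FilterSource` and `UrlFilteringTool` classes: the source (standing in for the plugin repository) hands out fresh filter instances and caches nothing, while the tool that uses the filters builds its chain once and reuses it for every URL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical extension point: returns the URL, or null to reject it.
interface UrlFilter {
  String filter(String url);
}

// Stand-in for the plugin repository: creates new instances on every call.
class FilterSource {
  int created = 0; // number of times a filter chain was built
  List<UrlFilter> newFilters() {
    created++;
    List<UrlFilter> filters = new ArrayList<>();
    filters.add(url -> url.startsWith("http") ? url : null);
    filters.add(url -> url.endsWith(".jpg") ? null : url);
    return filters;
  }
}

// The tool owns the caching decision: it builds the chain once and keeps it.
class UrlFilteringTool {
  private final List<UrlFilter> filters;

  UrlFilteringTool(FilterSource source) {
    this.filters = Collections.unmodifiableList(source.newFilters());
  }

  String accept(String url) {
    for (UrlFilter f : filters) {
      url = f.filter(url);
      if (url == null) return null; // rejected by this filter
    }
    return url;
  }
}
```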

 * 3154, 3650: what's the point of this line? it was already determined that 
 there is nothing useful there... this line exists also in other similar 
 facades. 

If the cache is empty, I fill it inside the if block at line 3150. This line 
then reads the freshly cached values and assigns them to the field.
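The pattern described here, as a sketch with a hypothetical `CachingConf` (a configuration that can also store arbitrary objects) and a hypothetical `Filters` facade in the style of the patch's filter facades. The second `getObject` call is the line in question: after the `if` block has filled an empty cache it reads the value just stored, and when the cache was already filled it reads the value stored by an earlier instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical configuration that doubles as an object cache.
class CachingConf {
  private final Map<String, Object> objects = new HashMap<>();
  Object getObject(String key) { return objects.get(key); }
  void setObject(String key, Object value) { objects.put(key, value); }
}

// Hypothetical facade: builds its filters once per configuration.
class Filters {
  private final List<String> filters;

  @SuppressWarnings("unchecked")
  Filters(CachingConf conf) {
    String key = Filters.class.getName();
    if (conf.getObject(key) == null) {
      // Cache is empty: do the expensive work once and store the result.
      List<String> built = new ArrayList<>();
      built.add("basic");
      built.add("site");
      conf.setObject(key, built);
    }
    // Always read back from the cache, whichever branch was taken.
    this.filters = (List<String>) conf.getObject(key);
  }

  List<String> filters() { return filters; }
}
```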


 * 3627: typo, should be indexingFilters. 
Done.



 * 3718, 6514: I think it would be better to create filters once, and keep 
 them around. 
 * 5020: is this an intentional change?? 
 * 6467: I think this change is an error. 
 * 6651: I don't understand this comment... 
 * 6777: should be static 
 * 7045: shouldn't we store these filters too, like all other filters, in 
 nutchConf? 
 * 7132: I think we can cache CLIENT in nutchConf too. 

Fixed.

 * 7782: either we should remove this, or use caching in nutchConf. 

Done, I removed this.

 * 10638: local variable shadows a superclass variable. 

Done.


 * 1337 and following, inside CommonGrams.java: spurious whitespace, bad 
 formatting 
 * 1510-1539, 1748-1772, 1796, 1896-1907, 2556-2582, 2880, 3124-3160, 3207, 
 3211-3217, 4295, 4405, 4657, 4848, 5343, 5493, 6566, 6806-6822, 6872, 7295, 
 7404, 7441, 7503, 7540, 7644, 7680, 7720, 7859, 7896, 7964, 7977, 8011, 8049, 
 8214, 8226, 8244, 8280, 8456, 8471, 9045, 9162, 9227, 9261, 9323, 9342, 9380, 
 9403, 9580, 9627, 9677, 9702, 9779, 9816, 9820, 9863, 9871, 9944, 9961, 10045, 
 10130, 10394, 10415, 11079, 11129: inconsistent indenting, should be 2 spaces. 
 Some missing whitespace. 
 * 1613, 1629, 1691, 2262, 2515, 2687, 2861, 3244, 3510, 3774, 3929, 4010, 
 4157, 4273, 4491, 6578, 6831, 6840, 6867, 6900, 6932, 6956, 6972, 6981, 7045, 
 7065, 7084, 7140, 7169, 7357, 7477, 7882, 9171, 9195, 9204, 9765, 9910: 
 whitespace 
 * 1659, 2905, 7404, 7503, 7540, 7644, 7683, 7728, 7890, 7972, 8015, 8053, 8151, 
 8393, 8471, 8614, 8741, 8799, 9005, 9049, 9221, 9342, 9403, 9589, 9627, 9702, 
 9779, 9820, 9871, 9961, 10130, 10410, 10672, 10765, 11129: generated 
 "(non-Javadoc)" comments should be removed 
 * 2241, 2498, 2534-2538, 4244, 6176, 6798, 7096, 7256-7264: junk 

I have done my best to get all of this fixed.


I also fixed some other problems besides the ones you mentioned. The test 
suite, the crawl process and the search all run successfully for me, both 
locally and on NDFS.
Still, it is a really big change, so please test it again.

Thanks, Marko


 remove static NutchConf
 ---

  Key: NUTCH-169
  URL: http://issues.apache.org/jira/browse/NUTCH-169
  Project: Nutch
 Type: Improvement
 Reporter: Stefan Groschupf
 Priority: Critical
  Fix For: 0.8-dev
  Attachments: NutchConf.367837.patch, NutchConf.370854.patch, 
 NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, 
 NutchConf.RegexURLFilter.060111.patch, nutchConf.patch

 Removing the static NutchConf.get is required for a set of improvements and 
 new features.
 + it allows better integration of Nutch into J2EE or other systems.
 + it allows Nutch to be managed from a web-based GUI (a kind of Nutch 
 appliance), which will improve usability and also increase user 
 acceptance of Nutch.
 + it makes it possible to change configuration properties at runtime.
 + it makes it possible to implement NutchConf as an abstract class or 
 interface to provide configuration value sources other than XML files. 
 (community request)
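A sketch of what the first and last points mean in code, using hypothetical `Configuration`, `MapConfiguration` and `Crawler` classes rather than Nutch's actual API. Components receive a configuration instance instead of calling a static getter, so two independently configured components can live in one JVM (e.g. inside an application server), values can change at runtime, and the abstract base class can be backed by sources other than XML files. The property name `fetcher.threads.fetch` and its default of 10 are used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Abstract configuration: callers depend only on this type.
abstract class Configuration {
  abstract String get(String name);
  abstract void set(String name, String value);

  int getInt(String name, int defaultValue) {
    String v = get(name);
    if (v == null) return defaultValue;
    try {
      return Integer.parseInt(v.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}

// One possible value source: an in-memory map (e.g. filled by a web GUI).
class MapConfiguration extends Configuration {
  private final Map<String, String> values = new HashMap<>();
  String get(String name) { return values.get(name); }
  void set(String name, String value) { values.put(name, value); }
}

// A component receives its configuration instead of calling a static get().
class Crawler {
  private final Configuration conf;
  Crawler(Configuration conf) { this.conf = conf; }
  // Reads the value on each call, so runtime changes are picked up.
  int threads() { return conf.getInt("fetcher.threads.fetch", 10); }
}
```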

-- 

[jira] Updated: (NUTCH-169) remove static NutchConf

2006-01-13 Thread Stefan Groschupf (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Stefan Groschupf updated NUTCH-169:
---

Attachment: NutchConf.367837.patch

Next preview of the nutchConf patch. This patch still needs a lot of cleanup, 
but we would like to share it already, in the hope of getting more feedback 
and more people to test it. 
We now use nutchConf as a cache, e.g. for QueryFilters.
We also patched all JobConf-related code in a similar manner to NutchConf. 
There are some cases where I'm not that happy with the result, but we tried to 
change as little as possible; we may do some more refactoring later. 
We tested this patch and it looks like everything is running, but since the 
change is so complex we may need to run some more tests. Please help with that.


 remove static NutchConf
 ---

  Key: NUTCH-169
  URL: http://issues.apache.org/jira/browse/NUTCH-169
  Project: Nutch
 Type: Improvement
 Reporter: Stefan Groschupf
 Priority: Critical
  Fix For: 0.8-dev
  Attachments: NutchConf.367837.patch, NutchConf.Fetcher.060111.patch, 
 NutchConf.Http.060111.patch, NutchConf.RegexURLFilter.060111.patch, 
 nutchConf.patch


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



[jira] Updated: (NUTCH-169) remove static NutchConf

2006-01-10 Thread Stefan Groschupf (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Stefan Groschupf updated NUTCH-169:
---

Attachment: nutchConf.patch

The patch was created by Marko Bauhardt with some help from me, so full 
credit goes to Marko! 
It removes all access to nutchConf via the static method 'get'. To do this, 
some APIs were changed to pass an instance of NutchConf down the call stack so 
that it is available in all objects that need it. 
For performance reasons, the PluginRepository is now cached in the nutchConf. 
The repository is not serialized; it is re-instantiated as soon as it is 
requested.  
The complete test suite passes with this patch; only Jerome's new HTTP 
protocol code still needs to be ported to the new NutchConf API. 
Jerome mentioned that he will do this. (Thanks)  
It would be great if we could get this patch into svn soon, since this is 
just the first step toward a Nutch GUI.
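The caching behaviour described above can be sketched as follows. `Repository` and `CachedConf` are hypothetical stand-ins for PluginRepository and NutchConf: the repository is held in a `transient` field, so Java serialization skips it, and it is rebuilt lazily the first time it is requested from a (possibly deserialized) configuration. The property name `plugin.folders` is used only for illustration.

```java
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical stand-in for the expensive-to-build PluginRepository.
class Repository {
  Repository() {
    // A real repository would scan plugin folders and parse descriptors here.
  }
}

class CachedConf implements Serializable {
  private static final long serialVersionUID = 1L;
  private final HashMap<String, String> props = new HashMap<>();
  // transient: not written by Java serialization, so null after deserializing.
  private transient Repository repository;

  String get(String key) { return props.get(key); }
  void set(String key, String value) { props.put(key, value); }

  // Built on first request, then reused for the lifetime of this instance.
  synchronized Repository getRepository() {
    if (repository == null) {
      repository = new Repository();
    }
    return repository;
  }
}
```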
 

 remove static NutchConf
 ---

  Key: NUTCH-169
  URL: http://issues.apache.org/jira/browse/NUTCH-169
  Project: Nutch
 Type: Improvement
 Reporter: Stefan Groschupf
 Priority: Critical
  Fix For: 0.8-dev
  Attachments: nutchConf.patch


-- 



[jira] Updated: (NUTCH-169) remove static NutchConf

2006-01-10 Thread Jerome Charron (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Jerome Charron updated NUTCH-169:
-

Attachment: NutchConf.Http.060111.patch

Attached is the patch for the HTTP-related classes (lib-http, protocol-http and 
protocol-httpclient).
Phew, Stefan, it was a huge amount of work, since a lot of code was static and 
used the static NutchConf!!!
;-)

But it is OK and it works (with a patch to the Fetcher that I will submit right 
after).
Please note that this is a raw version, and it probably needs a full review 
after commit.
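Roughly what porting static HTTP code involves, as a hedged sketch: settings that used to be read once into `static` fields are moved into instance fields filled in `setConf`. `HttpSettings` is a hypothetical class, the configuration is modelled as a plain `Map<String, String>`, and the property names and the 10000 ms default are assumptions used for illustration.

```java
import java.util.Map;

// Before (simplified): values read once into static fields, roughly
//   static final int TIMEOUT = NutchConf.get().getInt("http.timeout", 10000);
// so every instance in the JVM shares one value that cannot change.

// After: hypothetical settings holder whose values are instance fields,
// filled when a configuration is set, so each instance follows its own conf.
class HttpSettings {
  private int timeout;
  private String agent;

  void setConf(Map<String, String> conf) {
    this.timeout = parseInt(conf.get("http.timeout"), 10000);
    this.agent = conf.getOrDefault("http.agent.name", "");
  }

  int timeout() { return timeout; }
  String agent() { return agent; }

  // Pure helper: parses an int, falling back to a default.
  private static int parseInt(String v, int def) {
    if (v == null) return def;
    try {
      return Integer.parseInt(v.trim());
    } catch (NumberFormatException e) {
      return def;
    }
  }
}
```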

 remove static NutchConf
 ---

  Key: NUTCH-169
  URL: http://issues.apache.org/jira/browse/NUTCH-169
  Project: Nutch
 Type: Improvement
 Reporter: Stefan Groschupf
 Priority: Critical
  Fix For: 0.8-dev
  Attachments: NutchConf.Http.060111.patch, nutchConf.patch


-- 



[jira] Updated: (NUTCH-169) remove static NutchConf

2006-01-10 Thread Jerome Charron (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Jerome Charron updated NUTCH-169:
-

Attachment: NutchConf.Fetcher.060111.patch

Same as the one provided in Stefan's patch, plus the Fetcher sets the NutchConf 
on the protocol.
I'm not sure this is the right way: it might be better for the ProtocolFactory 
to set the NutchConf on the protocols.
???

 remove static NutchConf
 ---

  Key: NUTCH-169
  URL: http://issues.apache.org/jira/browse/NUTCH-169
  Project: Nutch
 Type: Improvement
 Reporter: Stefan Groschupf
 Priority: Critical
  Fix For: 0.8-dev
  Attachments: NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, 
 nutchConf.patch


-- 



[jira] Updated: (NUTCH-169) remove static NutchConf

2006-01-10 Thread Jerome Charron (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]

Jerome Charron updated NUTCH-169:
-

Attachment: NutchConf.RegexURLFilter.060111.patch

This patch merges the version provided in Stefan's patch with the latest 
changes committed by Doug (switching to JDK regular expressions).
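A sketch of a regex URL filter built on `java.util.regex`, assuming the `+pattern`/`-pattern` rule format of Nutch's regex-urlfilter.txt, where the first matching rule decides and a URL matching no rule is rejected. `RegexUrlFilter` is a hypothetical class; the rules text is passed to its constructor (it could come from the injected configuration) instead of being loaded through a static NutchConf.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical regex URL filter: "+regex" accepts, "-regex" rejects,
// '#' starts a comment line. The first rule found in the URL decides.
class RegexUrlFilter {
  private static final class Rule {
    final boolean accept;
    final Pattern pattern;
    Rule(boolean accept, Pattern pattern) { this.accept = accept; this.pattern = pattern; }
  }

  private final List<Rule> rules = new ArrayList<>();

  RegexUrlFilter(String rulesText) {
    for (String line : rulesText.split("\\R")) {
      line = line.trim();
      if (line.isEmpty() || line.startsWith("#")) continue;
      char sign = line.charAt(0);
      if (sign != '+' && sign != '-') {
        throw new IllegalArgumentException("bad rule: " + line);
      }
      rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
    }
  }

  // Returns the URL if accepted, or null if rejected.
  String filter(String url) {
    for (Rule r : rules) {
      if (r.pattern.matcher(url).find()) {
        return r.accept ? url : null;
      }
    }
    return null; // no rule matched: reject
  }
}
```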

 remove static NutchConf
 ---

  Key: NUTCH-169
  URL: http://issues.apache.org/jira/browse/NUTCH-169
  Project: Nutch
 Type: Improvement
 Reporter: Stefan Groschupf
 Priority: Critical
  Fix For: 0.8-dev
  Attachments: NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, 
 NutchConf.RegexURLFilter.060111.patch, nutchConf.patch


-- 