[jira] Updated: (NUTCH-169) remove static NutchConf
[ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]
Marko Bauhardt updated NUTCH-169:
-
Attachment: NutchConf.370854.patch

* a general comment: plugins now implement NutchConfigurable, which means that you had to add two new methods, looking exactly the same, to many classes. That's why the NutchConfigured class was created. I suggest replacing "implements NutchConfigurable" with "extends NutchConfigured" where appropriate.
  Either way I have to implement two methods: with the interface I implement setConf and getConf; with the base class I have to override a constructor and setConf. Since in many cases a constructor would not be helpful, I decided to use the interface. In general it might make sense to have an interface or an abstract class that has just a configure method and nothing more.
* 1094: I think a better place to set the current config on a protocol instance is inside ProtocolFactory.getProtocol(), because the factory itself is now instantiated with an instance of nutchConf, so it keeps a reference to that config.
  Done.
* 1256: what is this constructor for? I think only the public constructor is used.
  My mistake, fixed.
* 1311: please replace getExtentens() with getExtensions()
* 1346, 1375: these classes should be static, I think
* 1542, 1570: should be static
  Done.
* 1903, 8476, 10136: I wonder, shouldn't we cache these in nutchConf?
  1903: Done. 8476, 10136: we cache the PluginRepository, not the extensions themselves. In general, I think caching or object recycling belongs in the tools that use the extensions/objects, not in the objects themselves.
* 3154, 3650: what's the point of this line? It was already determined that there is nothing useful there...
  This line also exists in other, similar facades. If the cache is empty, I fill it inside the if condition at line 3150; this line then reads the freshly cached values and assigns them to the field.
* 3627: typo, should be indexingFilters.
  Done.
* 3718, 6514: I think it would be better to create filters once, and keep them around.
* 5020: is this an intentional change??
* 6467: I think this change is an error.
* 6651: I don't understand this comment...
* 6777: should be static
* 7045: shouldn't we store these filters too, like all other filters, in nutchConf?
* 7132: I think we can cache CLIENT in nutchConf too.
  Fixed.
* 7782: either we should remove this, or use caching in nutchConf.
  Done, I removed this.
* 10638: local variable overshadows a superclass variable.
  Done.
* 1337 and following, inside CommonGrams.java: spurious whitespace, bad formatting
* 1510-1539, 1748-1772, 1796, 1896-1907, 2556-2582, 2880, 3124-3160, 3207, 3211-3217, 4295, 4405, 4657, 4848, 5343, 5493, 6566, 6806-6822, 6872, 7295, 7404, 7441, 7503, 7540, 7644, 7680, 7720, 7859, 7896, 7964, 7977, 8011, 8049, 8214, 8226, 8244, 8280, 8456, 8471, 9045, 9162, 9227, 9261, 9323, 9342, 9380, 9403, 9580, 9627, 9677, 9702, 9779, 9816, 9820, 9863, 9871, 9944, 9961, 10045, 10130, 10394, 10415, 11079, 11129: inconsistent indenting, should be 2 spaces. Some missing whitespace.
* 1613, 1629, 1691, 2262, 2515, 2687, 2861, 3244, 3510, 3774, 3929, 4010, 4157, 4273, 4491, 6578, 6831, 6840, 6867, 6900, 6932, 6956, 6972, 6981, 7045, 7065, 7084, 7140, 7169, 7357, 7477, 7882, 9171, 9195, 9204, 9765, 9910: whitespace
* 1659, 2905, 7404, 7503, 7540, 7644, 7683, 7728, 7890, 7972, 8015, 8053, 8151, 8393, 8471, 8614, 8741, 8799, 9005, 9049, 9221, 9342, 9403, 9589, 9627, 9702, 9779, 9820, 9871, 9961, 10130, 10410, 10672, 10765, 11129: non-javadoc generated comments should be removed
* 2241, 2498, 2534-2538, 4244, 6176, 6798, 7096, 7256-7264: junk

I have done my best to fix all of these, and I also fixed some other problems besides the ones you mentioned. The test suite, the crawl process and search all run successfully for me, both locally and on NDFS. Still, it is a really big change, so please test it again.
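To make the interface-versus-base-class trade-off in the general comment concrete, here is a minimal sketch in the spirit of NutchConfigurable and NutchConfigured. It is not the code from the patch: the NutchConf stand-in is a hypothetical map-backed class, and ExampleFilter and the property example.filter.max.length are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf: a per-instance property holder with no static get().
class NutchConf {
  private final Map<String, String> props = new HashMap<>();

  public void set(String name, String value) { props.put(name, value); }

  public String get(String name, String defaultValue) {
    String value = props.get(name);
    return value != null ? value : defaultValue;
  }

  public int getInt(String name, int defaultValue) {
    String value = props.get(name);
    return value != null ? Integer.parseInt(value.trim()) : defaultValue;
  }
}

// Option 1: an interface; every implementing class writes the same two methods.
interface NutchConfigurable {
  void setConf(NutchConf conf);
  NutchConf getConf();
}

// Option 2: a base class that implements the two methods once.
abstract class NutchConfigured implements NutchConfigurable {
  private NutchConf conf;

  protected NutchConfigured() {}                  // conf injected later via setConf()

  protected NutchConfigured(NutchConf conf) { setConf(conf); }

  @Override
  public void setConf(NutchConf conf) { this.conf = conf; }

  @Override
  public NutchConf getConf() { return conf; }
}

// Hypothetical plugin: overrides setConf() to derive its settings from the new configuration.
class ExampleFilter extends NutchConfigured {
  private int maxLength;

  @Override
  public void setConf(NutchConf conf) {
    super.setConf(conf);
    maxLength = conf.getInt("example.filter.max.length", 100); // hypothetical property
  }

  public boolean accept(String url) { return url.length() <= maxLength; }
}
```

In this sketch a plugin created by reflection uses the no-argument constructor and then receives its configuration through setConf(), so overriding setConf() is enough and no extra constructor is needed.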
Thanks,
Marko

remove static NutchConf
---
Key: NUTCH-169
URL: http://issues.apache.org/jira/browse/NUTCH-169
Project: Nutch
Type: Improvement
Reporter: Stefan Groschupf
Priority: Critical
Fix For: 0.8-dev
Attachments: NutchConf.367837.patch, NutchConf.370854.patch, NutchConf.Fetcher.060111.patch, NutchConf.Http.060111.patch, NutchConf.RegexURLFilter.060111.patch, nutchConf.patch

Removing the static NutchConf.get is required for a set of improvements and new features.
+ it allows better integration of Nutch into J2EE or other systems.
+ it allows Nutch to be managed from a web-based GUI (a kind of Nutch appliance), which will improve usability and also increase user acceptance of Nutch.
+ it allows configuration properties to be changed at runtime.
+ it allows NutchConf to be implemented as an abstract class or interface, so that configuration values can come from sources other than XML files. (community request)
--
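The last two points of the description can be pictured with a small sketch: once configuration is an instance rather than a static singleton, lookups can be delegated to interchangeable sources, and one source can be changed while the program runs. All names here (ConfSource, PropertiesConfSource, MapConfSource, LayeredConf, example.threads) are hypothetical and are not part of the Nutch API; the override order loosely mirrors how nutch-site.xml overrides nutch-default.xml.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical abstraction: where configuration values come from.
interface ConfSource {
  /** Returns the value for name, or null if this source does not define it. */
  String get(String name);
}

// Values loaded up front, e.g. properties parsed from an XML file elsewhere.
class PropertiesConfSource implements ConfSource {
  private final Properties props;

  PropertiesConfSource(Properties props) { this.props = props; }

  @Override
  public String get(String name) { return props.getProperty(name); }
}

// Mutable in-memory values, e.g. edited through a web GUI at runtime.
class MapConfSource implements ConfSource {
  private final Map<String, String> values = new HashMap<>();

  synchronized void put(String name, String value) { values.put(name, value); }

  @Override
  public synchronized String get(String name) { return values.get(name); }
}

// Configuration instance that consults its sources newest-first,
// so sources added later override sources added earlier.
class LayeredConf {
  private final List<ConfSource> sources = new ArrayList<>();

  void addSource(ConfSource source) { sources.add(source); }

  String get(String name, String defaultValue) {
    for (int i = sources.size() - 1; i >= 0; i--) {
      String value = sources.get(i).get(name);
      if (value != null) return value;
    }
    return defaultValue;
  }
}
```

Because each component would hold a reference to its LayeredConf instance instead of calling a static getter, a value put into the GUI-backed source is visible on the next lookup without restarting anything.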
[jira] Updated: (NUTCH-169) remove static NutchConf
[ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]
Stefan Groschupf updated NUTCH-169:
---
Attachment: NutchConf.367837.patch

Next preview of the nutchConf patch. This patch still needs a lot of cleanup, but we would like to share it already, in the hope of getting more feedback and more people testing it.
We now use nutchConf as a cache, e.g. for QueryFilters. We also patched all JobConf-related code in a similar manner to NutchConf. There are some cases where I'm not that happy with the result, but we tried to change as little as possible; we may do some more refactoring later.
We tested this patch and everything appears to be running, but since the change is so complex we may need to run more tests, so please help with that.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
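Using nutchConf as a cache can be sketched as follows, assuming a configuration object that holds arbitrary objects alongside its string properties. The getObject/setObject methods, the cache key and the example.query.lowercase property are hypothetical. The point is the check-then-fill pattern Marko describes for the facades (fill the cache when it is empty, then read the cached value into a field), which also means each filter chain is built once per configuration and kept around.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical NutchConf stand-in with a per-instance object cache next to its properties.
class NutchConf {
  private final Map<String, String> props = new HashMap<>();
  private final Map<String, Object> objects = new HashMap<>();

  public synchronized void set(String name, String value) { props.put(name, value); }

  public synchronized String get(String name, String defaultValue) {
    String value = props.get(name);
    return value != null ? value : defaultValue;
  }

  public synchronized Object getObject(String key) { return objects.get(key); }

  public synchronized void setObject(String key, Object value) { objects.put(key, value); }
}

interface QueryFilter {
  String filter(String query);
}

// Facade that builds its filter chain once per NutchConf and reuses it afterwards.
class QueryFilters {
  private static final String CACHE_KEY = QueryFilter.class.getName();
  static int buildCount = 0; // only to make the caching observable in this sketch

  private final List<QueryFilter> filters;

  @SuppressWarnings("unchecked")
  QueryFilters(NutchConf conf) {
    synchronized (conf) {
      if (conf.getObject(CACHE_KEY) == null) {
        // Cache is empty: build the filters and store them in this conf.
        conf.setObject(CACHE_KEY, Collections.unmodifiableList(buildFilters(conf)));
      }
      // In both cases, read the (possibly freshly) cached list into the field.
      this.filters = (List<QueryFilter>) conf.getObject(CACHE_KEY);
    }
  }

  private static List<QueryFilter> buildFilters(NutchConf conf) {
    buildCount++;
    List<QueryFilter> list = new ArrayList<>();
    list.add(q -> q.trim());
    if (Boolean.parseBoolean(conf.get("example.query.lowercase", "true"))) { // hypothetical
      list.add(q -> q.toLowerCase());
    }
    return list;
  }

  String filter(String query) {
    for (QueryFilter f : filters) {
      query = f.filter(query);
    }
    return query;
  }
}
```

Two QueryFilters created from the same NutchConf share one filter list, while a different NutchConf gets its own, which is what keeps per-configuration caching compatible with several configurations in one JVM.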
[jira] Updated: (NUTCH-169) remove static NutchConf
[ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]
Stefan Groschupf updated NUTCH-169:
---
Attachment: nutchConf.patch

The patch was created by Marko Bauhardt with some help from me, so full credit goes to Marko! It removes all access to nutchConf via the static method 'get'. To do this, some APIs were changed to pass an instance of NutchConf down the call stack, so that it is available in every object that needs it.
For performance reasons the PluginRepository is now cached in the nutchConf. The repository is not serialized; it is re-instantiated as soon as it is requested.
The complete test suite passes with this patch; only Jerome's new HTTP protocol code still needs to be ported to the new NutchConf API. Jerome mentioned that he will do this (thanks). It would be great if we could get this patch into svn soon, since it is just the first step towards a Nutch GUI.
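A minimal sketch of the PluginRepository caching described above, assuming the repository is held in a transient field of a serializable configuration object. The classes are simplified stand-ins, not the Nutch implementation, and SerializationUtil is a hypothetical demo helper. A transient field is skipped by Java serialization, so a deserialized configuration (for example, one shipped to another process) starts without a repository and builds a new one on first request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical stand-in; a real repository would scan plugin directories, which is expensive.
class PluginRepository {
  static int instances = 0; // only to make re-instantiation observable in this sketch

  PluginRepository(NutchConf conf) {
    instances++;
  }
}

class NutchConf implements Serializable {
  private static final long serialVersionUID = 1L;

  private final HashMap<String, String> props = new HashMap<>();

  // Not serialized: rebuilt lazily on the first request after deserialization.
  private transient PluginRepository pluginRepository;

  public synchronized void set(String name, String value) { props.put(name, value); }

  public synchronized String get(String name, String defaultValue) {
    String value = props.get(name);
    return value != null ? value : defaultValue;
  }

  public synchronized PluginRepository getPluginRepository() {
    if (pluginRepository == null) {
      pluginRepository = new PluginRepository(this);
    }
    return pluginRepository;
  }
}

// Demo helper: copies an object through Java serialization.
final class SerializationUtil {
  private SerializationUtil() {}

  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The properties survive the round trip, while the copy builds its own repository the first time getPluginRepository() is called.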
[jira] Updated: (NUTCH-169) remove static NutchConf
[ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]
Jerome Charron updated NUTCH-169:
-
Attachment: NutchConf.Http.060111.patch

Attached is the patch for the HTTP-related classes (lib-http, protocol-http and protocol-httpclient). Phew, Stefan, it was a huge amount of work, since a lot of the code was static and used the static NutchConf!!! ;-) But it is done and it works (together with a patch to the Fetcher that I will submit right after this one). Please note that this is a raw version, and it probably needs a full review after commit.
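The kind of change Jerome describes, moving from static state to per-instance state, can be sketched as below. ExampleHttp is a hypothetical class, not lib-http code; the property names are modeled on Nutch's http.* settings, and the defaults are illustrative only. With static fields, two fetchers in one JVM could not use different timeouts; with instance fields they can.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal hypothetical NutchConf stand-in.
class NutchConf {
  private final Map<String, String> props = new HashMap<>();

  public void set(String name, String value) { props.put(name, value); }

  public String get(String name, String defaultValue) {
    String value = props.get(name);
    return value != null ? value : defaultValue;
  }

  public int getInt(String name, int defaultValue) {
    String value = props.get(name);
    return value != null ? Integer.parseInt(value.trim()) : defaultValue;
  }
}

// Before (pattern being removed): settings read once into static fields,
// shared by every instance in the JVM, e.g.
//
//   static int TIMEOUT = NutchConf.get().getInt("http.timeout", 10000);
//
// After: each protocol instance reads its own configuration when it is set.
class ExampleHttp {
  private NutchConf conf;
  private int timeout;
  private String agent;

  public void setConf(NutchConf conf) {
    this.conf = conf;
    this.timeout = conf.getInt("http.timeout", 10000);
    this.agent = conf.get("http.agent.name", "example-agent"); // illustrative default
  }

  public NutchConf getConf() { return conf; }

  public int getTimeout() { return timeout; }

  public String getAgent() { return agent; }
}
```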
[jira] Updated: (NUTCH-169) remove static NutchConf
[ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]
Jerome Charron updated NUTCH-169:
-
Attachment: NutchConf.Fetcher.060111.patch

Same as the one provided in Stefan's patch, plus the Fetcher sets the NutchConf on the protocol. I'm not sure this is the right way: it might be better for the ProtocolFactory to set the NutchConf on protocols. ???
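The alternative Jerome raises (which Marko's later patch adopts, per his reply to review comment 1094) can be sketched as a factory that keeps a reference to the NutchConf it was created with and sets it on each protocol it returns. The scheme registry, DummyHttp and the Supplier-based lookup are hypothetical simplifications; Nutch itself locates protocols through its plugin extension points.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class NutchConf {} // hypothetical placeholder for the configuration instance

interface Protocol {
  void setConf(NutchConf conf);
  NutchConf getConf();
}

// Hypothetical protocol implementation used only for this sketch.
class DummyHttp implements Protocol {
  private NutchConf conf;

  @Override
  public void setConf(NutchConf conf) { this.conf = conf; }

  @Override
  public NutchConf getConf() { return conf; }
}

// The factory is built with a NutchConf and hands it to every protocol it creates,
// so callers such as the Fetcher never have to call setConf themselves.
class ProtocolFactory {
  private final NutchConf conf;
  private final Map<String, Supplier<Protocol>> byScheme = new HashMap<>();

  ProtocolFactory(NutchConf conf) { this.conf = conf; }

  void register(String scheme, Supplier<Protocol> supplier) { byScheme.put(scheme, supplier); }

  Protocol getProtocol(String url) {
    String scheme = URI.create(url).getScheme();
    Supplier<Protocol> supplier = scheme == null ? null : byScheme.get(scheme);
    if (supplier == null) {
      throw new IllegalArgumentException("No protocol registered for: " + url);
    }
    Protocol protocol = supplier.get();
    protocol.setConf(conf); // configuration injected in one place
    return protocol;
  }
}
```

Placing the setConf call in the factory means every caller gets a configured protocol, whereas putting it in the Fetcher would have to be repeated by any other tool that fetches pages.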
[jira] Updated: (NUTCH-169) remove static NutchConf
[ http://issues.apache.org/jira/browse/NUTCH-169?page=all ]
Jerome Charron updated NUTCH-169:
-
Attachment: NutchConf.RegexURLFilter.060111.patch

This patch merges the version provided in Stefan's patch with the latest changes committed by Doug (use of JDK regexps).
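For reference, a first-match-wins regex URL filter built on java.util.regex might look like the sketch below. It assumes the usual rule format of one pattern per line prefixed with '+' (accept) or '-' (reject), with '#' comment lines, and that a URL matching no rule is rejected; it uses Matcher.find(), so a pattern may match anywhere in the URL. RegexUrlFilterSketch is a hypothetical name, and in the plugin the rules would be loaded from a file located through the instance's NutchConf rather than passed in directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexUrlFilterSketch {
  // One parsed rule: accept or reject URLs matching the pattern.
  private static final class Rule {
    final boolean accept;
    final Pattern pattern;

    Rule(boolean accept, Pattern pattern) {
      this.accept = accept;
      this.pattern = pattern;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  RegexUrlFilterSketch(List<String> ruleLines) {
    for (String raw : ruleLines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
      char sign = line.charAt(0);
      if (sign != '+' && sign != '-') {
        throw new IllegalArgumentException("Rule must start with + or -: " + line);
      }
      rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
    }
  }

  /** Returns the URL if the first matching rule accepts it, otherwise null. */
  String filter(String url) {
    for (Rule rule : rules) {
      if (rule.pattern.matcher(url).find()) {
        return rule.accept ? url : null;
      }
    }
    return null; // no rule matched
  }
}
```

With the rules `-\.(gif|jpg)$` followed by `+^http://([a-z0-9]*\.)*apache\.org/`, an apache.org page is accepted, an apache.org image is rejected by the first rule, and any other host is rejected because no rule matches.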