Implement a different caching mechanism for objects cached in configuration
[ https://issues.apache.org/jira/browse/NUTCH-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505807 ]

Andrzej Bialecki commented on NUTCH-501:
-----------------------------------------

ObjectCache should support caching objects that fall under the same key, but are differently configured. This situation occurs when running in "local" mode, and using Nutch tools to perform several workflows with different configs - in such cases there is a single instance of ObjectCache created within a JVM, and using this implementation of ObjectCache objects coming from different configuration contexts would be set/retrieved in wrong contexts.

This is very much similar to the issue in NUTCH-169. If we use ObjectCache the way you proposed we would revert to the situation before NUTCH-169.

I propose to modify ObjectCache to store multiple objects under the same key, additionally indexed by Configuration id - and to modify all ObjectCache methods to take a Configuration parameter.

Currently Configuration instances don't have a unique id (unless you count a job id available in mapred.job.id - but this becomes available only after you submit a job), and they don't implement any sensible hashCode(), so it's difficult to produce a key uniquely tied to a config instance. The way Nutch uses Configuration, it's always created either via NutchConfiguration.create() or new NutchJob(getConf()) - we could generate unique object.cache.id property there, and use it later on in ObjectCache to retrieve the right set of key/value pairs. Similarly, if ObjectCache gets a Configuration instance without a unique key, it could create one, stick it into Configuration, and use it from now on.

The problem with this approach is that over time the ObjectCache would accumulate values from past, no longer valid contexts.
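To make the proposal above concrete, here is a rough sketch of an ObjectCache indexed by a per-configuration id. It is only an illustration under stated assumptions, not the patch attached to the issue: the Conf class is a hypothetical stand-in for Hadoop's Configuration (so the example runs without Hadoop on the classpath), the method names getObject/setObject/contexts are made up for the sketch, and the use of a random UUID as the id is an assumption. Only the property name object.cache.id comes from the comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for Hadoop's Configuration: plain string properties only.
class Conf {
    private final Map<String, String> props = new HashMap<>();
    String get(String name) { return props.get(name); }
    void set(String name, String value) { props.put(name, value); }
}

// Sketch: cached objects are stored per configuration id, then per key.
class ObjectCache {
    // Property name suggested in the comment.
    static final String CACHE_ID = "object.cache.id";

    // configuration id -> (key -> cached object)
    private static final Map<String, Map<String, Object>> CACHES = new HashMap<>();

    // Returns the id stored in conf; if there is none, creates one
    // and sticks it into conf so later calls see the same id.
    private static String idOf(Conf conf) {
        String id = conf.get(CACHE_ID);
        if (id == null) {
            id = UUID.randomUUID().toString(); // assumption: any unique string would do
            conf.set(CACHE_ID, id);
        }
        return id;
    }

    static synchronized Object getObject(Conf conf, String key) {
        Map<String, Object> perConf = CACHES.get(idOf(conf));
        return perConf == null ? null : perConf.get(key);
    }

    static synchronized void setObject(Conf conf, String key, Object value) {
        CACHES.computeIfAbsent(idOf(conf), k -> new HashMap<>()).put(key, value);
    }

    // Number of configuration contexts held. Nothing in this sketch removes
    // entries, so this only grows.
    static synchronized int contexts() {
        return CACHES.size();
    }
}

class ObjectCacheDemo {
    public static void main(String[] args) {
        Conf first = new Conf();
        Conf second = new Conf();
        // Same key, different configuration contexts.
        ObjectCache.setObject(first, "filters", "filters-for-first");
        ObjectCache.setObject(second, "filters", "filters-for-second");
        System.out.println(ObjectCache.getObject(first, "filters"));
        System.out.println(ObjectCache.getObject(second, "filters"));
        System.out.println(ObjectCache.contexts());
    }
}
```

Each Conf gets its own id on first use, so the two values stored under "filters" do not overwrite each other. The sketch also shows the drawback mentioned above: the outer map keeps an entry for every configuration it has ever seen, including ones that are no longer in use.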
> Implement a different caching mechanism for objects cached in configuration
> ---------------------------------------------------------------------------
>
> Key: NUTCH-501
> URL: https://issues.apache.org/jira/browse/NUTCH-501
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Doğacan Güney
> Fix For: 1.0.0
>
> Attachments: NUTCH-501_draft.patch
>
>
> As per HADOOP-1343, Configuration.setObject and Configuration.getObject (which are used by Nutch to cache arbitrary objects) are deprecated and will be removed soon. We have to implement an alternative caching mechanism and replace all usages of Configuration.{getObject,setObject} with the new mechanism.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers