>On 09-Mar-09 at 22:29, Fergus McMenemie wrote:
>>> how would I implement entity-processor if I were able to get the list
>>> of recently changed documents of our sites?
>>
>> Hmmmm, this sounds like a job for my manifestEnityProcessor
>> see if you can find the thread titled:-
>>
>> "a new DIH manifestEnityProcessor"
>>
>> is your list of changed documents a list of additions and
>> updates only, or does it contain deletes as well?
>
>Fergus,
>
>I think you should then rename it... Manifest is not the right name to
>me (manifest refers to something such as the manifest of a jar or of
>an IMS-content-package, both are a metadata of the data).

It's all in the jargon, I guess. Our content repositories are changed by
update kits. Some of the kits come with manifests; in other cases we
capture the output from un-tar or un-zip commands, and we call these
manifests too. The name is up for grabs if a better suggestion comes
along; I would have used FileListEntityProcessor except the name was
taken ;-)

>I looked at your original description and I could not read anything
>about the changed files.
>The regex approach is a nice one for sure...

Yep, our "manifests" quite often include jpegs, avis etc. which we do
not want indexed. And if it's tar output it will contain directory
stubs as well.

>I think a useful DIH Entity-processor that would maintain its deltas
>well would have as parameters, url to a list of recently updated urls,
>url to a list of recently deleted urls. Is this yours?

URLs, huh! Never thought of that; I was just assuming it would be a
local file. However, I guess that could be added... so "manifestFileName"
would become "manifestURL"? In my use cases some of the "manifests" are
along the lines of

  ADD xxxx-checksum-xxx --pathname_1--
  DEL --pathname_b--

Hence "manifestAddRegex" and "manifestDelRegex". In other cases I have
separate files, one for adding and another for deleting. These I was
going to deal with as two separate DIH imports.

>I would have one for URLs with the list of recent things basically
>from an RSS; the transformer is custom in all cases.

The output from my manifestEnityProcessor is fed to an
XPathEntityProcessor.

>
>paul
>

Fergus.
-- 
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
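
P.S. To make the ADD/DEL idea concrete, here is a rough sketch in Python
(not the DIH code itself) of how two regexes could split a manifest into
additions and deletions while skipping jpegs, avis and directory stubs.
The patterns, the ".xml" filter, the sample lines and the function name
parse_manifest are all assumptions made up for illustration; they only
stand in for whatever "manifestAddRegex" and "manifestDelRegex" would be
set to. Pathnames containing spaces are not handled.

```python
import re

# Hypothetical stand-ins for the "manifestAddRegex" / "manifestDelRegex"
# settings. Assumed layout: "ADD <checksum> <pathname>" and "DEL <pathname>".
# Only pathnames ending in .xml are captured (an assumption for this sketch).
ADD_REGEX = re.compile(r"^ADD\s+\S+\s+(\S+\.xml)$")
DEL_REGEX = re.compile(r"^DEL\s+(\S+\.xml)$")


def parse_manifest(lines):
    """Split manifest lines into (paths_to_add, paths_to_delete)."""
    adds, deletes = [], []
    for raw in lines:
        line = raw.strip()
        m = ADD_REGEX.match(line)
        if m:
            adds.append(m.group(1))
            continue
        m = DEL_REGEX.match(line)
        if m:
            deletes.append(m.group(1))
        # Any other line (jpegs, avis, directory stubs) is skipped.
    return adds, deletes


# Made-up sample manifest, purely for illustration.
sample = [
    "ADD 3f2a9c docs/a.xml",
    "ADD 77b1e0 images/photo.jpeg",
    "DEL docs/old.xml",
    "docs/",
]
adds, deletes = parse_manifest(sample)
```

With separate add and delete files, the same function could be run once
per file with only the relevant regex doing any matching, which mirrors
the "two separate DIH imports" approach.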