>On 09-Mar-09 at 22:29, Fergus McMenemie wrote:
>>> how would I implement entity-processor if I were able to get the list
>>> of recently changed documents of our sites?
>>
>> Hmmmm, this sounds like a job for my manifestEnityProcessor
>> see if you can find the thread titled:-
>>
>>   "a new DIH manifestEnityProcessor"
>>
>> is your list of changed documents a list of additions and
>> updates only, or does it contain deletes as well?
>
>Fergus,
>
>I think you should then rename it... Manifest is not the right name to
>me (manifest refers to something such as the manifest of a jar or of
>an IMS-content-package; both are metadata about the data).

It's all in the jargon, I guess. Our content repositories are changed
by update kits; some of the kits come with manifests, and in other cases
we capture the output from un-tar or un-zip commands and call these
manifests. The name is up for grabs if a better suggestion comes along;
I would have used FileListEntityProcessor except the name was taken ;-)


>I looked at your original description and I could not read anything  
>about the changed files.
>The regex approach is a nice one for sure...

Yep, our "manifest"s quite often include JPEGs, AVIs, etc., which we
do not want indexed. And if it's tar output it will contain
directory stubs as well.
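
As a rough sketch of the filtering idea (Python purely for illustration;
the accept pattern, the function name and the sample paths are all made up
here and are not what the processor actually uses):

```python
import re

# Hypothetical accept pattern: keep only .xml and .html files.
# Directory stubs from tar output end in "/" and so never match,
# and media files such as .jpeg or .avi are skipped as well.
ACCEPT = re.compile(r".*\.(xml|html)$", re.IGNORECASE)

def wanted(paths):
    """Return only the manifest paths that match the accept pattern."""
    return [p for p in paths if ACCEPT.match(p.strip())]

sample = [
    "kit42/",
    "kit42/docs/report.xml",
    "kit42/images/photo.jpeg",
    "kit42/video/clip.avi",
    "kit42/docs/index.HTML",
]
print(wanted(sample))
```

Only the two document paths should survive; everything else in the
listing is dropped before it gets anywhere near the index.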

>I think a useful DIH Entity-processor that would maintain its deltas  
>well would have as parameters, url to a list of recently updated urls,  
>url to a list of recently deleted urls. Is this yours?

URLs, huh! Never thought of that; I was just assuming it would be a local
file. However, I guess that could be added... so "manifestFileName" would
become "manifestURL"? In my use cases some of the "manifests" are along
the lines of

   ADD xxxx-checksum-xxx  --pathname_1--
   DEL --pathname_b--

Hence "manifestAddRegex" and "manifestDelRegex". In other cases I
have separate files, one for adds and another for deletes; these
I was going to deal with as two separate DIH imports.
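
To make the ADD/DEL case concrete, here is a minimal sketch of how two
such regexes could split one manifest into adds and deletes. It is in
Python only for illustration; the two patterns are hypothetical stand-ins
for whatever would be supplied as "manifestAddRegex" and
"manifestDelRegex", and the sample lines are invented. It also assumes
pathnames contain no spaces.

```python
import re

# Hypothetical patterns; group 1 captures the pathname.
# ADD lines carry a checksum token before the pathname, DEL lines do not.
ADD_RE = re.compile(r"^\s*ADD\s+\S+\s+(\S+)\s*$")
DEL_RE = re.compile(r"^\s*DEL\s+(\S+)\s*$")

def split_manifest(lines):
    """Return (adds, dels): pathnames to index and pathnames to delete."""
    adds, dels = [], []
    for line in lines:
        m = ADD_RE.match(line)
        if m:
            adds.append(m.group(1))
            continue
        m = DEL_RE.match(line)
        if m:
            dels.append(m.group(1))
        # Lines matching neither pattern are silently ignored.
    return adds, dels

manifest = [
    "   ADD 9f2c1a docs/a.xml",
    "   DEL docs/old.xml",
    "   some unrelated line",
]
adds, dels = split_manifest(manifest)
print(adds, dels)
```

With separate add and delete files the same function would still work,
since each file would simply match only one of the two patterns.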

>I would have one for URLs with the list of recent things basically  
>from an RSS; the transformer is custom in all cases.

The output from my manifestEnityProcessor is fed to an
XPathEntityProcessor.

>
>paul
>
Fergus.
-- 

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
