ChangeSetEntityProcessor, I would jump on that with both feet.
paul

On 10 Mar 2009, at 05:40, Noble Paul നോബിള് नोब्ळ् wrote:
Hi Fergus,

Open a JIRA issue anyway. Put in your thoughts and we can refine the requirements as part of the discussion. Basically the requirements are:

1) read a file line by line
2) filter out lines (include or exclude) based on a regex
3) extract parts (named parts) from the line using another regex

Noble

On Tue, Mar 10, 2009 at 1:50 AM, Fergus McMenemie <fer...@twig.me.uk> wrote:

> Hi Fergus,
> The idea is that we have something generic which can be applicable to a large set of users. If the manifest is a text file it can be read in some standard way (say line by line). So we can have an EntityProcessor which reads a text file line by line and filters the lines by a regex, the way 'grep' works.

Yes. That is what I have written. It is just an alternate form of the FileListEntityProcessor, except that rather than walking the file system it reads from a file, line by line, and identifies the portion of the line containing the filename using a regexp.

On Mon, Mar 9, 2009 at 10:44 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:

> Manifest processing has a very limited usecase. Why can't it be processed using a PlainTextEntityProcessor, with a Transformer written to read lines using a regex?

Ehmmm Ok. The PlainTextEntityProcessor docs do not give me enough insight to see how this could be used to index each of the files listed by a 'tar xvf' report. Can you explain further?

About the limited usecase: Verity thought it was useful enough to have their own "bulk insert file" or bif file format that did the same and was far less flexible. In my experience we generally start off with some kind of file walker or crawler looking after file repositories. But these always proved slow and unreliable, and over time they were always replaced with some kind of manifest based control of the indexer. Where we could get a report of changes we always used it, and only relied on walkers or crawlers where we had to.
Fergus

--Noble

On Mon, Mar 9, 2009 at 8:30 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:

Hello,

I have almost finished a new DIH EntityProcessor which I am calling the ManifestEntityProcessor. It is designed around the idea that whatever daemon is used to maintain your set of a few 100,000 xml documents, it is likely to drop a report or log file explaining what has been changed within your content store. This assumes a file based content repository. The ManifestEntityProcessor is used as follows:

    <entity name="jc"
            processor="ManifestEntityProcessor"
            baseDir="/Volumes/Techmore/ts/aaa/schema/data"
            rootEntity="false"
            dataSource="null"
            allowRegex="^.*\.xml$"
            manifestFileName="/Volumes/ts/man-find.txt"
            manifestAddRegex="(.*)$"
            >

The idea is you have a log file or other report, perhaps from tar or zip, and you wish to use this to control the indexing of the new content. The new entity fields are as follows.

manifestFileName is the name of the manifest file. If this value is relative, it is assumed to be relative to baseDir. Required.

manifestAddRegex is a required regex to identify lines which, when matched, should cause docs to be added to the index.

manifestDelRegex is an optional regex to identify lines which, when matched, should cause documents to be deleted from the index. **PLANNED**

allowRegex is a required regex to identify the portion of the ADD/DELete line identified above which contains the file or pathname to be ADDed or DELeted. If the resulting value is relative, it is assumed to be relative to baseDir.

What do I do next? Raise a JIRA issue and add the code? Is DIH the right place to add this?
Suggestions for a different name?
Suggestions on how to do the delete bitty from within an entity?

Regards Fergus.

--Noble Paul

-- 
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021
Unix/Mac/Intranets             Analyst Programmer
===============================================================
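
For anyone following the thread, here is a minimal standalone sketch of the three steps discussed above: read the manifest line by line, select lines with one regex, and pull the pathname out with another. It is not DIH code and does not use any Solr API. The function name `manifest_rows` is hypothetical, the parameter names simply mirror the entity attributes in the original message, and the delete handling is an assumption, since manifestDelRegex was only planned. Treating the whole `allow_regex` match as the pathname is one reading of the description, not a confirmed behaviour.

```python
import re
from pathlib import PurePosixPath


def manifest_rows(lines, base_dir, add_regex, allow_regex, del_regex=None):
    """Yield (action, path) pairs for manifest lines.

    Hypothetical sketch: names mirror the attributes from the thread,
    but this is not the ManifestEntityProcessor implementation.
    """
    add_re = re.compile(add_regex)
    allow_re = re.compile(allow_regex)
    # Delete support was only planned in the thread; assumed here.
    del_re = re.compile(del_regex) if del_regex else None

    for raw in lines:
        line = raw.rstrip("\r\n")

        # Step 2: decide whether the line is an add, a delete, or ignored.
        if add_re.search(line):
            action = "add"
        elif del_re is not None and del_re.search(line):
            action = "delete"
        else:
            continue

        # Step 3: the portion matched by allow_regex is taken as the path.
        match = allow_re.search(line)
        if match is None:
            continue

        # Relative paths are resolved against base_dir.
        path = PurePosixPath(match.group(0))
        if not path.is_absolute():
            path = PurePosixPath(base_dir) / path
        yield action, str(path)


if __name__ == "__main__":
    sample = ["a/b.xml\n", "notes.txt\n", "/abs/c.xml\n"]
    for row in manifest_rows(sample, "/data", r"(.*)$", r"^.*\.xml$"):
        print(row)
```

With the regexes from the example entity, every line is a candidate for adding, and only those ending in `.xml` survive the second regex. A manifest with explicit add and delete markers would need a narrower `allow_regex` so that the marker itself is not part of the extracted path.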