Hello,

I have a question about the Document filter transformation connector and its 
logging.
I would like to see all the documents excluded by the rules configured in the 
Document filter transformation connector, either in the Simple history or in 
the MCF log, but so far this has not been easy.

Let’s say that I want to crawl a website and index HTML pages only. 
I configure a web repository connector with a Document filter transformation 
connector, and I create a rule that allows only one MIME type and one file 
extension. So far so good: the job works well. But if I want to see, in the 
MCF log or in the Simple history, all the files that were excluded by the 
transformation connector, it quickly becomes complicated: I have to search 
manually for all the files that were fetched but were neither processed by the 
Tika transformation connector nor ingested by the output connector.

From my understanding of the code, the Document filter transformation connector 
can communicate its exclusion rules directly to the repository connector, so 
the documents that need to be excluded are not processed in the Document filter 
transformation connector but are excluded directly by the web repository 
connector.
As a result, in the Simple history, a document that will be excluded shows only 
a "fetch" activity and nothing else; there is no additional information about it.
Would it be possible to add a log entry with an explicit result code, such as 
"excluded by document filter connector" or something similar, when a document 
is excluded by the repository connector?
 
Thank you,
Best regards,
Olivier 
