Re: Multiple output documents from one input document in transformation connector

Karl Wright Fri, 20 May 2016 06:09:42 -0700

Hi Julien,

It seems I misunderstood something about your description of what your
transformation connector needs to do.

A transformation connector cannot invent additional documents.  It can only
transform the document it is handed.  Since you are apparently trying to
create multiple documents out of one, you can't do that via a
transformation connector.  That functionality *must* be moved upstream and
you will need to write a custom repository connector instead.

The repository connector API has a concept that you may find helpful for
your case, which is compound documents.  The compound document construct
basically allows you to have multiple child documents that are all derived
from a single original root document.  The API for this is well described:
see IProcessActivity for details about how to send multiple child documents
down the processing chain to the output connectors.
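
As a rough illustration of the compound-document idea only, the sketch below
splits one root document into several child components that share the
root's identifier. ComponentSink, ingestComponent, processRoot and the
"message-N" component ids are all hypothetical names invented for this
example; they are not the IProcessActivity methods, so check that
interface's Javadoc for the real names and signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whatever accepts documents from a repository
// connector. This is NOT the ManifoldCF IProcessActivity interface.
interface ComponentSink {
  void ingestComponent(String documentId, String componentId, String version, String body);
}

final class CompoundDocumentSketch {
  private CompoundDocumentSketch() {}

  // Sends every message of one root document (for example a mailbox file)
  // to the sink as a separate child component. All children carry the same
  // root document id and differ only in their component id.
  // Returns the number of components sent.
  static int processRoot(String documentId, String version,
                         List<String> messages, ComponentSink sink) {
    int sent = 0;
    for (int i = 0; i < messages.size(); i++) {
      // Assumption: the position in the list is used as the component id.
      // A real connector would want an id that stays stable between crawls.
      String componentId = "message-" + i;
      sink.ingestComponent(documentId, componentId, version, messages.get(i));
      sent++;
    }
    return sent;
  }

  // Small helper that records what a sink received, useful for trying
  // the sketch out without any indexing backend.
  static List<String> collect(String documentId, String version, List<String> messages) {
    List<String> seen = new ArrayList<>();
    processRoot(documentId, version, messages,
        (doc, comp, ver, body) -> seen.add(doc + "#" + comp));
    return seen;
  }
}
```

The point of the sketch is that the splitting happens where the root
document is fetched, so each child is a known part of that root rather
than an extra document appearing in the middle of the pipeline.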

I agree it would be wonderful if transformation connectors could break up
documents in the way you assumed, but unfortunately this would not coexist
amicably with incremental document processing.

Thanks,
Karl

On Fri, May 20, 2016 at 8:20 AM, Julien Massiera <
[email protected]> wrote:

> Thanks for the answer Karl.
>
> So I tried it: I created a transformation connector that extends
> BaseTransformationConnector and overrides the
> addOrReplaceDocumentWithException method. In this method I make two calls
> to activities.sendDocument(): one for the incoming document and another
> for a brand-new one (not a duplicate of the first). The problem is that
> the second call throws the following exception:
> "java.lang.IllegalStateException: Document cannot have multiple
> dispositions". It seems a transformation connector can output only one
> document, no more.
>
> Am I missing something?
>
> Julien
>
>
> On 19/05/2016 21:14, Karl Wright wrote:
>
>> This sounds like it would work.
>> Karl
>>
>> Sent from my Windows Phone
>> From: Julien Massiera
>> Sent: 5/19/2016 12:44 PM
>> To: [email protected]
>> Subject: Multiple output documents from one input document in
>> transformation connector
>>
>> Hi ManifoldCF community,
>>
>> Here is my problem: I would like to crawl '.pst' files with ManifoldCF
>> and index each email within them into a Solr instance. I'm thinking of
>> crawling the '.pst' files with a FileSystem repository connection and
>> then using my custom transformation connection to extract the emails and
>> send them for Solr ingestion through the activities object.
>>
>> Is my approach correct, or do I need to consider another solution?
>>
>> Thanks for your help.
>>
>> Julien Massiera
>>
>
>
