Hi,

is there any example implementation of the new document component feature
introduced with CONNECTORS-989?

I read the section "Document components" in [0], but I still do not know how
to actually write a repository connector that ingests multiple documents
originating from a single document of a repository.

My first guess was to call the method "activities.ingestDocumentWithException" 
multiple times with the same identifier but distinct component identifiers 
during document processing.

I wrote a simple TestConnector. Its processDocuments method looks like this:

    public void processDocuments(String[] documentIdentifiers,
            String[] versions, IProcessActivity activities,
            DocumentSpecification spec, boolean[] scanOnly)
            throws ManifoldCFException, ServiceInterruption {
        
        int i = 0;
        for (String identifier : documentIdentifiers) {

            byte[] content1 = "test content 1".getBytes();
            byte[] content2 = "test content 2".getBytes();
            byte[] content3 = "test content 3".getBytes();

                        
            RepositoryDocument rd1 = new RepositoryDocument();
            rd1.setBinary(new ByteArrayInputStream(content1), content1.length);

            RepositoryDocument rd2 = new RepositoryDocument();
            rd2.setBinary(new ByteArrayInputStream(content2), content2.length);

            
            RepositoryDocument rd3 = new RepositoryDocument();
            rd3.setBinary(new ByteArrayInputStream(content3), content3.length);

            
            System.out.println("process " + identifier);
            
            try {
                activities.ingestDocumentWithException(identifier, "comp1",
                        versions[i], identifier + "/comp1", rd1);
                activities.ingestDocumentWithException(identifier, "comp2",
                        versions[i], identifier + "/comp2", rd2);
                activities.ingestDocumentWithException(identifier, "comp3",
                        versions[i], identifier + "/comp3", rd3);
            } catch (IOException e) {
                e.printStackTrace();
            }
            
            i++;
        }
        
    }

For seeding, the method getDocumentIdentifiers() returns a stream with a
single document identifier, "testidentifier1".
The full code is available at [1].

However, subsequent calls to ingestDocumentWithException result in the
deletion of a previously added component, as the following log shows
(newest entry first):

job end 1416573910333(copmtest1) 0      1       
document ingest         testidentifier1/comp3 OK 12     8       
document deletion       testidentifier1/comp1 OK 0      2       
document ingest         testidentifier1/comp2 OK 12     10      
document deletion       testidentifier1/comp1 OK 0      3       
document ingest         testidentifier1/comp1 OK 12     13      
job start       1416573910333(copmtest1) 0      1

Only testidentifier1/comp2 and testidentifier1/comp3 exist in the output 
connection after the job is finished.
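
To make the expectation concrete, here is a minimal toy model in plain Java.
It does not use ManifoldCF at all; the class ComponentIndexSketch and its
methods are hypothetical names made up for this illustration. It assumes that
every ingest is keyed by the document identifier plus the component
identifier, so that three ingests for one document leave three entries in the
index, which is the result I expected instead of the deletions above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT ManifoldCF code. All names here are hypothetical.
public class ComponentIndexSketch {

    // Index entries keyed by "<documentIdentifier>/<componentIdentifier>".
    private final Map<String, String> index = new LinkedHashMap<>();

    // Adds or overwrites exactly one component; other components of the
    // same document are left untouched (the assumed, expected behaviour).
    public void ingest(String documentIdentifier, String componentIdentifier,
            String content) {
        index.put(documentIdentifier + "/" + componentIdentifier, content);
    }

    // Returns the keys currently in the index, in insertion order.
    public Set<String> keys() {
        return index.keySet();
    }

    public static void main(String[] args) {
        ComponentIndexSketch sketch = new ComponentIndexSketch();
        sketch.ingest("testidentifier1", "comp1", "test content 1");
        sketch.ingest("testidentifier1", "comp2", "test content 2");
        sketch.ingest("testidentifier1", "comp3", "test content 3");
        // Prints [testidentifier1/comp1, testidentifier1/comp2, testidentifier1/comp3]
        System.out.println(sketch.keys());
    }
}
```

In this model all three components coexist after the run, whereas the real
job leaves only comp2 and comp3.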

I fear I might have misunderstood the concept...

Any help is appreciated.

Thanks in advance
Markus

--
Using ManifoldCF 1.7.1 
[0] http://manifoldcf.apache.org/release/release-1.7.2/en_US/writing-repository-connectors.html
[1] https://gist.github.com/schuch/43809594ad8f81ddc625
