Hi,
is there an example implementation of the new document components feature
introduced with CONNECTORS-989?
I read the "Document components" section in [0], but I still do not know how
to actually write a repository connector that ingests multiple documents
originating from a single repository document.
My first guess was to call activities.ingestDocumentWithException multiple
times during document processing, with the same document identifier but
distinct component identifiers.
I wrote a simple TestConnector whose processDocuments method looks like this:
public void processDocuments(String[] documentIdentifiers,
    String[] versions, IProcessActivity activities,
    DocumentSpecification spec, boolean[] scanOnly)
    throws ManifoldCFException, ServiceInterruption {
  int i = 0;
  for (String identifier : documentIdentifiers) {
    // Three dummy components, all derived from the same repository document
    byte[] content1 = "test content 1".getBytes();
    byte[] content2 = "test content 2".getBytes();
    byte[] content3 = "test content 3".getBytes();
    RepositoryDocument rd1 = new RepositoryDocument();
    rd1.setBinary(new ByteArrayInputStream(content1), content1.length);
    RepositoryDocument rd2 = new RepositoryDocument();
    rd2.setBinary(new ByteArrayInputStream(content2), content2.length);
    RepositoryDocument rd3 = new RepositoryDocument();
    rd3.setBinary(new ByteArrayInputStream(content3), content3.length);
    System.out.println("process " + identifier);
    try {
      // Same document identifier and version for every call;
      // only the component identifier and the document URI differ
      activities.ingestDocumentWithException(identifier, "comp1",
          versions[i], identifier + "/comp1", rd1);
      activities.ingestDocumentWithException(identifier, "comp2",
          versions[i], identifier + "/comp2", rd2);
      activities.ingestDocumentWithException(identifier, "comp3",
          versions[i], identifier + "/comp3", rd3);
    } catch (IOException e) {
      e.printStackTrace();
    }
    i++;
  }
}
For seeding, the getDocumentIdentifiers() method returns a stream with a
single document identifier, "testidentifier1".
The full code is available at [1].
However, each subsequent call to ingestDocumentWithException results in the
deletion of a previously added component. The job history (most recent event
first) looks like this:
job end 1416573910333(copmtest1) 0 1
document ingest testidentifier1/comp3 OK 12 8
document deletion testidentifier1/comp1 OK 0 2
document ingest testidentifier1/comp2 OK 12 10
document deletion testidentifier1/comp1 OK 0 3
document ingest testidentifier1/comp1 OK 12 13
job start 1416573910333(copmtest1) 0 1
Only testidentifier1/comp2 and testidentifier1/comp3 exist in the output
connection after the job is finished.
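To check my reading of that history, I replayed it in chronological order
(bottom to top). The Java sketch below is only illustrative: the class
HistoryReplay, its replay method and the HISTORY constant are my own
hypothetical names, not ManifoldCF API, and it assumes the document URI is the
third space-separated token of each "document ingest" or "document deletion"
line. Replaying the recorded events leaves exactly comp2 and comp3, which
matches what I see in the output connection, so the two comp1 deletions fully
account for the missing component.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: replays a job history that is listed newest event first.
public class HistoryReplay {
  // The history lines exactly as reported above (most recent first)
  static final List<String> HISTORY = List.of(
      "job end 1416573910333(copmtest1) 0 1",
      "document ingest testidentifier1/comp3 OK 12 8",
      "document deletion testidentifier1/comp1 OK 0 2",
      "document ingest testidentifier1/comp2 OK 12 10",
      "document deletion testidentifier1/comp1 OK 0 3",
      "document ingest testidentifier1/comp1 OK 12 13",
      "job start 1416573910333(copmtest1) 0 1");

  // Returns the document URIs still present after applying all events in order.
  static Set<String> replay(List<String> newestFirst) {
    List<String> chronological = new ArrayList<>(newestFirst);
    Collections.reverse(chronological); // oldest event first
    Set<String> present = new LinkedHashSet<>();
    for (String line : chronological) {
      // Assumption: the URI is the third space-separated token
      if (line.startsWith("document ingest ")) {
        present.add(line.split(" ")[2]);
      } else if (line.startsWith("document deletion ")) {
        present.remove(line.split(" ")[2]);
      }
      // "job start" and "job end" lines do not change the document set
    }
    return present;
  }

  public static void main(String[] args) {
    // prints [testidentifier1/comp2, testidentifier1/comp3]
    System.out.println(replay(HISTORY));
  }
}
```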
I feel I might have misunderstood the concept...
Any help is appreciated.
Thanks in advance
Markus
--
Using ManifoldCF 1.7.1
[0] http://manifoldcf.apache.org/release/release-1.7.2/en_US/writing-repository-connectors.html
[1] https://gist.github.com/schuch/43809594ad8f81ddc625