Hi Karl,

Thanks for the fast response.

We have a simple connector (written before 1.7 and relying on the 1.x 
BaseRepositoryConnector) that produces documents from an XML file. We use the 
empty version string to trigger ingestion on every job run: the empty version 
string is treated as "alwaysRefetch", and the created document is always sent 
down the pipeline along with this empty version string.
 
I noticed the backward compatibility code in the BaseRepositoryConnector in 
1.7+ and I used this code to wire our custom connector code to the new 2.3 
interface.
I debugged the document processing and, as expected, 
ingestDocumentWithException is still called every time, as before, since an 
empty version string is still treated as alwaysRefetch. But the document that 
is sent is only ingested into the output repository the first time the job 
runs. On subsequent runs the output step stays inactive.
 
I think we can boil my issue down to a specific question about one method of 
the IProcessActivity interface:
 
  ingestDocumentWithException(String documentIdentifier, String version,
                              String documentURI, RepositoryDocument data)
 

Let's assume the following example flow (starting from an empty and clean MCF 
2.3 system):
 
(1) In a first run of my job 

      // second param is the empty version string
      ingestDocumentWithException("identiferX", "", "documentUriX", repoDoc)

    is called. This leads to ingestion of the document with the URI 
"documentUriX".

(2) In a second run of my job

      // second param is the empty version string
      ingestDocumentWithException("identiferX", "", "documentUriX", repoDoc)

    is called again (with the same arguments).

What is the expected behavior here?
Should the document be ingested again or not?
And if not, how should I trigger ingestion? By always sending a null version 
down the pipeline?

The actual behavior:
- In 1.7 it is ingested again.
- In 2.3 it is _not_ ingested again.
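
To make the two observed behaviors concrete, here is a toy model in plain 
Java. It is not ManifoldCF code: the class VersionSkipModel, its ingest() 
method and the in-memory map are all hypothetical, and the rule "skip when the 
version string equals the one recorded on the previous run" is only an 
assumption that happens to reproduce what I see in 2.3. It is not a confirmed 
description of the pipeline.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only. NOT ManifoldCF code; every name here is hypothetical. */
public class VersionSkipModel {
    // Last version string seen per document identifier (stand-in for job state).
    private final Map<String, String> recordedVersions = new HashMap<>();
    // true  = assumed 2.3-like behavior (skip when the version is unchanged)
    // false = observed 1.7-like behavior (always hand the document on)
    private final boolean skipWhenVersionUnchanged;

    public VersionSkipModel(boolean skipWhenVersionUnchanged) {
        this.skipWhenVersionUnchanged = skipWhenVersionUnchanged;
    }

    /** Returns true if the document would reach the output connection. */
    public boolean ingest(String documentIdentifier, String version) {
        String previous = recordedVersions.get(documentIdentifier);
        boolean unchanged = previous != null && previous.equals(version);
        recordedVersions.put(documentIdentifier, version);
        return !(skipWhenVersionUnchanged && unchanged);
    }

    public static void main(String[] args) {
        VersionSkipModel like17 = new VersionSkipModel(false);
        VersionSkipModel like23 = new VersionSkipModel(true);

        // Run (1) and run (2) from the example flow, both with version "".
        System.out.println("1.7-like run 1: " + like17.ingest("identiferX", ""));
        System.out.println("1.7-like run 2: " + like17.ingest("identiferX", ""));
        System.out.println("2.3-like run 1: " + like23.ingest("identiferX", ""));
        System.out.println("2.3-like run 2: " + like23.ingest("identiferX", ""));
    }
}
```

In this model, the only way to get a document through the 2.3-like variant on 
a later run is to pass a version string that differs from the recorded one. 
Whether the real pipeline behaves like that for the empty string is exactly 
my question.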

Regards,
Markus





Sent: Friday, 04 March 2016 at 12:11
From: "Karl Wright" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: Should a document with an empty version string always be 
reingested?

Hi Markus,
 
The canonical way that a connector handles incrementality changed from 1.7 to 
1.10.  We maintained backwards compatibility through the inclusion of legacy 
base connector methods.  CONNECTORS-1153 reported a problem in one of those 
base connector methods, which has been fixed by 1.10.  I can't tell whether 
this applies to your situation.
 
On 2.x the base connector class no longer includes the legacy base connector 
methods at all, so if you have a custom connector you will need to rework your 
connector class to adhere to the newer model.  Specifically, there is no longer 
any such method as "getDocumentVersions()".  Instead, your connector must 
signal its disposition of any document using the IProcessActivity methods 
available for that purpose.
 
Can you describe in more detail what you are doing here?
(a) Is this a custom connector?
(b) Was it developed on 1.7 or before?
(c) Are you trying to run it on 1.10 or on 2.x?
 
That will help me give you better responses.
 
Karl
 
 
On Fri, Mar 4, 2016 at 5:28 AM, Markus Schuch <[email protected]> wrote:

Hi,
 
We ran on MCF 1.7 for quite a while, and in this environment a document sent to 
the ingestion pipeline together with an empty version string was always 
reingested.
On MCF 2.3 this is no longer the case.
 
I found 
https://issues.apache.org/jira/browse/CONNECTORS-1153
and maybe the 1.7 behavior we were relying on was always a bug.
 
Question:
Is the new 2.3 behavior the expected way for the ingestion pipeline to handle 
an empty version string?
And how can reingestion on every run be triggered?
 
Thanks in advance,
Markus
