Hi all,

Following up on my recent email (below), I thought I would share my findings 
with the ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer) 
in case someone is able to advise on its usage. 

I turned to this connector because neither of the packaged Manifold Alfresco 
connectors (AtomPub or WebService) detects changes. I need the crawl to run 
each night and pick up every change made to the documents in the previous 
24 hours, which is a common scenario.

Unfortunately, I have yet to achieve this. 

Having built and installed both the AMP and JAR files needed for the new 
connector, changes are still not coming through. In fact, I have two 
observations so far:

1. Changes to document content or properties do not cause the document to be 
picked up by the Alfresco connector on the next run
2. Adding ‘Filter Configuration’ seems to do very little to change what is 
picked up

IN DETAIL
1. Failing to pick up modified content

Looking at the log files (which are set to debug) I can see that, upon the 
first crawl of Alfresco, Manifold sends the following requests:

DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET 
/alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET 
/alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET 
/alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET 
/alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET 
/alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET 
/alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET 
/alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
 HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET 
/alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
 HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET 
/alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET 
/alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
 HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET 
/alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
 HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET 
/alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
 HTTP/1.1[\r][\n]"

This picks up all of the content, e.g. documents. 

Running a second crawl, without any other actions being done, results in the 
following requests:

DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET 
/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
 HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET 
/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
 HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET 
/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
 HTTP/1.1[\r][\n]"

So I can see that, in the first instance, we are targeting content directly 
while, in the second, we are asking for changes. The problem is that no changes 
are returned from the second set of requests. The response from these calls is:

DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  
"totalNodes" : "0", [\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  
"elapsedTime" : "8",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" 
: [[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  
],[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    
"last_txn_id" : "352",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    
"last_acl_changeset_id" : "13",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  
"store_id" : "SpacesStore",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  
"store_protocol" : "workspace"[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"

Regardless of what changes I make to a document that I have been using for 
testing, the document is not updated. The response from the calls for changes 
(totalNodes) is always ‘0’.
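
To make it easier for anyone to replay the changes request outside of 
Manifold (curl, browser, etc.), here is a minimal Python sketch that rebuilds 
the request path seen in the log above. The function name build_changes_url is 
my own invention for illustration and is not part of the connector; the 
parameter names and values are copied from the logged request.

```python
import json
from urllib.parse import quote


def build_changes_url(last_txn_id, last_acl_changeset_id, filters,
                      store_protocol="workspace", store_id="SpacesStore"):
    """Rebuild the 'changes' request path as it appears in the debug log.

    Hypothetical helper for manual testing only.
    """
    # Compact JSON (no spaces) matches the encoding seen in the logged request.
    filters_json = json.dumps(filters, separators=(",", ":"))
    return (
        f"/alfresco/service/node/changes/{store_protocol}/{store_id}"
        f"?lastTxnId={last_txn_id}"
        f"&lastAclChangesetId={last_acl_changeset_id}"
        f"&indexingFilters={quote(filters_json, safe='')}"
    )


# Values taken from the second crawl in the log above.
filters = {
    "siteFilters": ["Finance"],
    "typeFilters": [],
    "mimetypeFilters": [],
    "aspectFilters": [],
    "metadataFilters": {},
}
url = build_changes_url(333, 13, filters)
print(url)
```

Prefixing the result with the Alfresco host and port gives a URL that can be 
requested directly, with different lastTxnId values, to see when totalNodes 
becomes non-zero.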


2. Adding ‘Filter Configuration’ seems to do very little to change what is 
picked up

Within my test Alfresco environment I have one site set up (Finance). Within 
the Finance doc library I have three test docs. No other changes have been made 
to the Alfresco instance. 
Running a crawl with no filter configuration set returns 81 items. I checked 
this by calling the URL in a browser.
If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 
81 items when I re-run the crawl. 
I can see that the term ‘Finance’ is being added to the URL but this does not 
seem to change the behaviour. 
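
For reference, the filter that is actually sent can be read back out of the 
logged URL. The sketch below decodes the indexingFilters parameter from the 
request shown in the log earlier in this email; decode_indexing_filters is a 
hypothetical helper name of mine, not something provided by the connector.

```python
import json
from urllib.parse import urlsplit, parse_qs


def decode_indexing_filters(url):
    """Return the indexingFilters query parameter of a logged URL as a dict.

    Hypothetical helper for reading the debug log; not part of the connector.
    """
    # parse_qs percent-decodes the values and returns a list per parameter.
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["indexingFilters"][0])


# Request path copied from the debug log of the second crawl.
logged = (
    "/alfresco/service/node/changes/workspace/SpacesStore"
    "?lastTxnId=333&lastAclChangesetId=13"
    "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C"
    "%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C"
    "%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D"
)
sent = decode_indexing_filters(logged)
print(sent["siteFilters"])
```

This only confirms what Manifold sends (a single site filter, all other 
filters empty); it says nothing about how the AMP applies it on the Alfresco 
side.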


I am happy to spend time diagnosing this if there is anyone available to 
assist. 

Thanks

Paul



> On 27 Oct 2015, at 18:14, [email protected] wrote:
> 
> Hi all,
> 
> This is a question regarding the relatively new Alfresco Webscript connector. 
> 
> SETUP
> I have a vanilla Alfresco Community 5.0 installation
> One site has been created called 'Finance'
> A handful of documents have been created in 'Finance' Doc Library.
> I have cloned and packaged up the 'alfresco-indexer' 
> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and 
> CLIENT packages to their respective environments. 
> 
> 
> ISSUE
> The issue is that the default API call used by Manifold is returning nothing. 
> The full API call used by Manifold, and based on my config, is :
> 
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
> 
> 
> TESTS
> I have identified two streamlined URLs. The first one returns the documents 
> that exist in the doc library of the 'Finance' site. This URL is:
> 
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
> 
> The second URL simply adds the site restriction. This URL returns nothing:
> 
> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
> 
> 
> 
> Can anyone explain why the documents do not return when only the containing 
> site is named in the API URL?
> 
> Cheers
> 
> Paul
> 
> 
