Hi all,

In follow up to my recent email (below), I thought I would share my findings with the ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer) in case someone is able to advise on its usage.

The reason I went to this is the lack of change detection in either of the packaged Manifold Alfresco connectors (AtomPub or WebService). I needed a method whereby the crawl runs each night and picks up any and all changes to the documents from the previous 24 hours. A common scenario.

Unfortunately, I have yet to achieve this. Having built and installed both the AMP and JAR files needed for the new connector, changes are still not coming through. In fact, I have two observations so far:

1. Changes to document content or properties do not cause the same document to be picked up by the Alfresco connector on the next run
2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

IN DETAIL

1. Failing to pick up modified content

Looking at the log files (which are set to debug) I can see that, upon the first crawl of Alfresco, Manifold sends the following requests:

DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"

This picks up all of the content e.g. documents.

Running a second crawl, without any other actions being done, results in the following requests:

DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"

So I can see that, in the first instance, we are targeting content directly while, in the second, we are asking for changes. The problem is that no changes are returned from the second set of requests.
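
As an aside, the indexingFilters value in the changes request above is URL-encoded JSON, which is hard to read in the raw log. A minimal Python sketch for decoding it follows. The function name decode_changes_request is my own for illustration and is not part of Manifold or the alfresco-indexer project; the request string is copied from the log above.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Request path copied verbatim from the debug log of the second crawl.
REQUEST = (
    "/alfresco/service/node/changes/workspace/SpacesStore"
    "?lastTxnId=333&lastAclChangesetId=13"
    "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C"
    "%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C"
    "%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D"
)


def decode_changes_request(request):
    """Hypothetical helper: split the query string and parse the filter JSON."""
    # parse_qs percent-decodes each value and returns a list per parameter.
    query = parse_qs(urlsplit(request).query)
    return {
        "lastTxnId": int(query["lastTxnId"][0]),
        "lastAclChangesetId": int(query["lastAclChangesetId"][0]),
        "indexingFilters": json.loads(query["indexingFilters"][0]),
    }


if __name__ == "__main__":
    # Pretty-print the decoded parameters for easier reading.
    print(json.dumps(decode_changes_request(REQUEST), indent=2))
```

Decoded this way, the request asks for changes after transaction 333 with siteFilters set to ["Finance"] and every other filter empty.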

The response from these calls is:

DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "totalNodes" : "0", [\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "elapsedTime" : "8",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "docs" : [[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " ],[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_txn_id" : "352",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_acl_changeset_id" : "13",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_id" : "SpacesStore",[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_protocol" : "workspace"[\r][\n]"
DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"

Regardless of what changes I make to the document I have been using for testing, the document is not updated. The totalNodes value in the response to the changes calls is always ‘0’.

2. Adding ‘Filter Configuration’ seems to do very little to change what is picked up

Within my test Alfresco environment I have one site set up (Finance). Within the Finance doc library I have three test docs. No other changes have been made to the Alfresco instance.

Running a crawl with no filter configurations set returns 81 items. This is via the URL in a browser. If I then set the Site Filter configuration to ‘Finance’ and apply, I still get 81 items when I re-run the crawl. I can see that the term ‘Finance’ is being added to the URL but this does not seem to change the behaviour.

I am happy to spend time diagnosing this if there is anyone available to assist.

Thanks

Paul

> On 27 Oct 2015, at 18:14, [email protected] wrote:
>
> Hi all,
>
> This is a question regarding the relatively new Alfresco Webscript connector.
>
> SETUP
> I have a vanilla Alfresco Community 5.0 installation
> One site has been created called 'Finance'
> A handful of documents have been created in 'Finance' Doc Library.
> I have cloned and packaged up the 'alfresco-indexer'
> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and
> CLIENT packages to their respective environments.
>
>
> ISSUE
> The issue is that the default API call used by Manifold is returning nothing.
> The full API call used by Manifold, and based on my config, is :
>
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>
>
> TESTS
> I have identified two streamlined URL's. The first one returns the documents
> that exist in the doc library of the 'Finance' site. This URL is:
>
> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>
> The second URL simply adds the site restriction. This URL returns nothing:
>
> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>
>
>
> Can anyone explain why the documents do not return when only the containing
> site is named in the API URL?
>
> Cheers
>
> Paul
>
>
