I see. That makes sense. No problem. Thanks for the feedback, Rafa. Much appreciated.
Paul Farrell
Senior Search Consultant

109-123 Clifton Street, London EC2A 4LD
T +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>

UNITED KINGDOM | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES

Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> - Twitter <https://twitter.com/funnelback>

Funnelback UK Ltd is a limited liability company registered in England & Wales. Registered address: Zetland House 109-123, Clifton Street, London. EC2A 4LD. Company registration number: 07004264.

> On 28 Oct 2015, at 10:45, Rafa Haro <[email protected]> wrote:
>
> Hi Paul,
>
> Before contributing the Alfresco connector, we performed several tests
> similar to yours using an Alfresco 4.x version. Therefore, my initial
> guess is that the Webscript is not behaving correctly for Alfresco 5
> instances. I’m including Maurizio Pillitu (Alfresco Indexer main developer)
> in the email thread. He may be able to provide some feedback about this or
> just confirm my suspicions.
>
> Cheers,
> Rafa
>
> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <[email protected]> wrote:
>
> Hi all,
>
> In follow-up to my recent email (below), I thought I would share my findings
> with the ‘Alfresco Indexer’ connector
> (https://github.com/maoo/alfresco-indexer) in case someone may be able to
> advise on its usage.
>
> The reason I went to this is due to the lack of change control detection with
> either of the packaged Manifold Alfresco connectors (AtomPub or WebService).
> I needed a method whereby the crawl runs each night and picks up any and all
> changes to the documents from the previous 24 hours. A common scenario.
>
> Unfortunately, I have yet to achieve this.
>
> Having built and installed both the AMP and JAR files needed for the new
> connector, changes are still not coming through. In fact, I have two
> observations so far:
>
> 1. Changes to document content or properties do not cause the same document
>    to be picked up by the Alfresco connector on the next run
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is
>    picked up
>
> IN DETAIL
> 1. Failing to pick up modified content
>
> Looking at the log files (which are set to debug) I can see that, upon the
> first crawl of Alfresco, Manifold sends the following requests:
>
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
>
> This picks up all of the content, e.g. documents.
>
> Running a second crawl, without any other actions being done, results in the
> following requests:
>
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
>
> So I can see that, in the first instance, we are targeting content directly
> while, in the second, we are asking for changes. The problem is that no
> changes are returned from the second set of requests. The response from these
> calls is:
>
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "totalNodes" : "0", [\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "elapsedTime" : "8",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "docs" : [[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " ],[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_txn_id" : "352",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_acl_changeset_id" : "13",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_id" : "SpacesStore",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_protocol" : "workspace"[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>
> Regardless of what changes I make to a document that I have been using for
> testing, the document is not updated. The response from the calls for changes
> (totalNodes) is always ‘0’.
>
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is
>    picked up
>
> Within my test Alfresco environment I have one site set up (Finance). Within
> the Finance doc library I have three test docs. No other changes have been
> made to the Alfresco instance.
> Running a crawl with no filter configurations set returns 81 items. This is
> via the URL in a browser.
> If I then set the Site Filter configuration to ‘Finance’ and apply, I still
> get 81 items when I re-run the crawl.
> I can see that the term ‘Finance’ is being added to the URL but this does not
> seem to change the behaviour.
>
> I am happy to spend time diagnosing this if there is anyone available to
> assist.
>
> Thanks
>
> Paul
>
>> On 27 Oct 2015, at 18:14, [email protected] wrote:
>>
>> Hi all,
>>
>> This is a question regarding the relatively new Alfresco Webscript
>> connector.
>>
>> SETUP
>> I have a vanilla Alfresco Community 5.0 installation
>> One site has been created called 'Finance'
>> A handful of documents have been created in 'Finance' Doc Library.
>> I have cloned and packaged up the 'alfresco-indexer'
>> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and
>> CLIENT packages to their respective environments.
>>
>> ISSUE
>> The issue is that the default API call used by Manifold is returning
>> nothing. The full API call used by Manifold, based on my config, is:
>>
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>
>> TESTS
>> I have identified two streamlined URLs. The first one returns the documents
>> that exist in the doc library of the 'Finance' site. This URL is:
>>
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>
>> The second URL simply adds the site restriction. This URL returns nothing:
>>
>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>
>> Can anyone explain why the documents are not returned when only the
>> containing site is named in the API URL?
>>
>> Cheers
>>
>> Paul
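
The 'changes' URLs quoted in this thread carry the filter configuration as percent-encoded JSON in the indexingFilters parameter, which is hard to read by eye. The sketch below, using only the Python standard library, rebuilds such a URL from a plain dictionary and decodes one back, so the filters being sent can be compared between test runs. The helper names build_changes_url and decode_filters are hypothetical and belong neither to Manifold nor to the alfresco-indexer project. The path and parameter names are copied from the logs above, and nothing here contacts an Alfresco server.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

# Path of the changes Webscript, copied from the log lines in this thread.
CHANGES_PATH = "/alfresco/service/node/changes/workspace/SpacesStore"


def build_changes_url(last_txn_id, last_acl_changeset_id, filters):
    """Hypothetical helper: build the relative changes URL for given filters."""
    # Compact separators reproduce the JSON layout seen in the logged URLs.
    filters_json = json.dumps(filters, separators=(",", ":"))
    return (
        f"{CHANGES_PATH}?lastTxnId={last_txn_id}"
        f"&lastAclChangesetId={last_acl_changeset_id}"
        f"&indexingFilters={quote(filters_json, safe='')}"
    )


def decode_filters(url):
    """Hypothetical helper: return the indexingFilters of a URL as a dict."""
    # parse_qs undoes the percent-encoding before the JSON is parsed.
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["indexingFilters"][0])


if __name__ == "__main__":
    # Filter set matching the one Manifold sent in the second crawl.
    manifold_filters = {
        "siteFilters": ["Finance"],
        "typeFilters": [],
        "mimetypeFilters": [],
        "aspectFilters": [],
        "metadataFilters": {},
    }
    url = build_changes_url(333, 13, manifold_filters)
    print(url)
    print(decode_filters(url))
```

Decoding both the URL Manifold generates and a hand-built test URL makes it easy to confirm that the two requests differ only in the filters intended.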
