I see. That makes sense. 

No problem. Thanks for the feedback, Rafa. Much appreciated.



Paul Farrell
Senior Search Consultant
 

> On 28 Oct 2015, at 10:45, Rafa Haro <[email protected]> wrote:
> 
> Hi Paul, 
> 
> Before contributing the Alfresco connector, we performed several tests 
> similar to yours using an Alfresco 4.x version. Therefore, initially, my 
> guess is the Webscript is not behaving correctly for Alfresco 5 instances. 
> I’m including Maurizio Pillitu (Alfresco Indexer main developer) in the email 
> thread. He may be able to provide some feedback about this or simply confirm 
> my suspicions. 
> 
> Cheers,
> Rafa
> 
> 
> 
> 
> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <[email protected]> wrote:
> 
> Hi all,
> 
> Following up on my recent email (below), I thought I would share my findings 
> with the ‘Alfresco Indexer’ connector 
> (https://github.com/maoo/alfresco-indexer) in case someone is able to 
> advise on its usage. 
> 
> I turned to this connector because neither of the packaged Manifold Alfresco 
> connectors (AtomPub or WebService) detects changes. I need a crawl that runs 
> each night and picks up any and all changes made to the documents in the 
> previous 24 hours. A common scenario.
> 
> Unfortunately, I have yet to achieve this. 
> 
> Having built and installed both the AMP and JAR files needed for the new 
> connector, changes are still not coming through. In fact, I have two 
> observations so far:
> 
> 1. Changes to document content or properties do not cause the same document 
> to be picked up by the Alfresco connector on the next run
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is 
> picked up
> 
> IN DETAIL
> 1. Failing to pick up modified content
> 
> Looking at the log files (which are set to debug) I can see that, upon the 
> first crawl of Alfresco, Manifold sends the following requests:
> 
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
> 
> This picks up all of the content, e.g. documents. 
> 
> Running a second crawl, with no other actions taken in between, results in the 
> following requests:
> 
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
> 
> So I can see that, in the first instance, we are targeting content directly 
> while, in the second, we are asking for changes. The problem is that no 
> changes are returned from the second set of requests. The response from these 
> calls is:
> 
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
> 
> Regardless of what changes I make to a document that I have been using for 
> testing, the document is not updated. The response from the calls for changes 
> (totalNodes) is always ‘0’.
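
For anyone reading along, the response fragments logged above can be reassembled and inspected with a few lines of Python. This is only a sketch: the opening brace of the body is assumed (the excerpt starts at "totalNodes"), the helper name summarize_changes is made up for illustration, and the requested lastTxnId of 333 is taken from the earlier logged request, which may not be the request that produced this particular response.

```python
import json

# Body reassembled from the DEBUG lines in the quoted log. The opening
# brace is an assumption; the excerpt begins at "totalNodes".
body = """{
  "totalNodes" : "0",
  "elapsedTime" : "8",
  "docs" : [
  ],
  "last_txn_id" : "352",
  "last_acl_changeset_id" : "13",
  "store_id" : "SpacesStore",
  "store_protocol" : "workspace"
}"""


def summarize_changes(body, requested_txn_id):
    """Hypothetical helper: report node count and whether the txn id moved."""
    data = json.loads(body)
    # The Webscript returns numbers as JSON strings, so convert explicitly.
    last_txn = int(data["last_txn_id"])
    return {
        "total_nodes": int(data["totalNodes"]),
        "last_txn_id": last_txn,
        "txn_advanced": last_txn > requested_txn_id,
    }


# 333 is assumed from the earlier logged request (lastTxnId=333).
summary = summarize_changes(body, requested_txn_id=333)
print(summary)
```

If that assumption holds, the transaction id moved on while zero nodes came back, which would suggest the edits were recorded by Alfresco but dropped somewhere between the transaction log and the filtered result. That is a guess from the logs, not a confirmed diagnosis.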
> 
> 
> 2. Adding ‘Filter Configuration’ seems to do very little to change what is 
> picked up
> 
> Within my test Alfresco environment I have one site set up (Finance). Within 
> the Finance doc library I have three test docs. No other changes have been 
> made to the Alfresco instance. 
> Running a crawl with no filter configurations set returns 81 items. This is 
> via the URL in a browser.
> If I then set the Site Filter configuration to ‘Finance’ and apply, I still 
> get 81 items when I re-run the crawl. 
> I can see that the term ‘Finance’ is being added to the URL but this does not 
> seem to change the behaviour. 
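
To check exactly what filter is being sent, the indexingFilters query parameter from the logged request can be decoded, since it is URL-encoded JSON. A minimal Python sketch using only the standard library follows; the function name decode_indexing_filters is my own and not part of Manifold or alfresco-indexer.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Request path copied from the DEBUG log of the second crawl.
url = (
    "/alfresco/service/node/changes/workspace/SpacesStore"
    "?lastTxnId=333&lastAclChangesetId=13"
    "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C"
    "%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C"
    "%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D"
)


def decode_indexing_filters(url):
    """Hypothetical helper: return the indexingFilters parameter as a dict."""
    # parse_qs percent-decodes each value and returns a list per key.
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["indexingFilters"][0])


filters = decode_indexing_filters(url)
print(filters["siteFilters"])  # ['Finance']
```

Decoded, the request carries siteFilters set to ["Finance"] with every other filter empty, so the setting does reach the Webscript. Whether the Webscript then applies it is a separate question that this does not answer.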
> 
> 
> I am happy to spend time diagnosing this if there is anyone available to 
> assist. 
> 
> Thanks
> 
> Paul
> 
> 
> 
>> On 27 Oct 2015, at 18:14, [email protected] wrote:
>> 
>> Hi all,
>> 
>> This is a question regarding the relatively new Alfresco Webscript 
>> connector. 
>> 
>> SETUP
>> I have a vanilla Alfresco Community 5.0 installation
>> One site has been created called 'Finance'
>> A handful of documents have been created in 'Finance' Doc Library.
>> I have cloned and packaged up the 'alfresco-indexer' 
>> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and 
>> CLIENT packages to their respective environments. 
>> 
>> 
>> ISSUE
>> The issue is that the default API call used by Manifold is returning 
>> nothing. The full API call used by Manifold, based on my config, is:
>> 
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>> 
>> 
>> TESTS
>> I have identified two streamlined URLs. The first one returns the documents 
>> that exist in the doc library of the 'Finance' site. This URL is:
>> 
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>> 
>> The second URL simply adds the site restriction. This URL returns nothing:
>> 
>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>> 
>> 
>> 
>> Can anyone explain why the documents do not return when only the containing 
>> site is named in the API URL?
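
For repeating this comparison with other filter combinations, the test URLs can be generated rather than hand-encoded. The sketch below is only an illustration: build_changes_url is a made-up helper, and the path and parameter names are copied from the URLs quoted in this message rather than from any connector documentation.

```python
import json
from urllib.parse import urlencode

# Path as it appears in the URLs quoted in this thread.
CHANGES_PATH = "/alfresco/service/node/changes/workspace/SpacesStore"


def build_changes_url(filters, last_txn_id=0, last_acl_changeset_id=0):
    """Hypothetical helper: build a 'changes' request path for given filters."""
    params = {
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        # Compact separators keep the JSON free of spaces, so the encoded
        # form matches the URLs shown in the thread.
        "indexingFilters": json.dumps(filters, separators=(",", ":")),
    }
    return CHANGES_PATH + "?" + urlencode(params)


unfiltered = build_changes_url({})
finance_only = build_changes_url({"siteFilters": ["Finance"]})
print(unfiltered)
print(finance_only)
```

Prefixing each path with the Alfresco host and port gives URLs that can be pasted into a browser to compare result counts side by side.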
>> 
>> Cheers
>> 
>> Paul
>> 
>> 
> 
> 
