The Alfresco log snippet doesn’t really shed any more light. It simply doesn’t 
seem to register that the document content has changed. 

09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByTransactionId] On Store workspace://SpacesStore
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastTransactionID]
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByAclChangesetId] On Store workspace://SpacesStore
09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastAclChangeSetID]
09:56:42,070 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-5] Attaching 0 nodes to the WebScript template
09:56:42,079 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9] Invoking Changes Webscript, using the following params
lastTxnId: 352
lastAclChangesetId: 13
storeId: SpacesStore
storeProtocol: workspace
indexingFilters: {"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}

09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByTransactionId] On Store workspace://SpacesStore
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastTransactionID]
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByAclChangesetId] On Store workspace://SpacesStore
09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastAclChangeSetID]
09:56:42,087 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9] Attaching 0 nodes to the WebScript template

Paul Farrell
Senior Search Consultant
 
109-123 Clifton Street, London EC2A 4LD
T +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>

UNITED KINGDOM | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES

Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> - 
Twitter <https://twitter.com/funnelback>

Funnelback UK Ltd is a limited liability company registered in England & Wales. 
Registered address: Zetland House 109-123, Clifton Street, London. EC2A 4LD. 
Company registration number: 07004264.

> On 28 Oct 2015, at 10:50, Rafa Haro <[email protected]> wrote:
> 
> You’re welcome, Paul. Just in case, could you check the Alfresco logs to see 
> if there is anything informative there?
> 
> Cheers,
> Rafa
> 
> 
> 
> 
> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <[email protected]> wrote:
> 
> I see. That makes sense. 
> 
> No problem. Thanks for the feedback, Rafa. Much appreciated. 
> 
> 
>> On 28 Oct 2015, at 10:45, Rafa Haro <[email protected]> wrote:
>> 
>> Hi Paul, 
>> 
>> Before contributing the Alfresco connector, we performed several tests 
>> similar to yours against an Alfresco 4.x version, so my initial guess is 
>> that the Webscript does not behave correctly on Alfresco 5 instances. 
>> I’m including Maurizio Pillitu (the main Alfresco Indexer developer) in this 
>> email thread. He may be able to provide some feedback on this or simply 
>> confirm my suspicions. 
>> 
>> Cheers,
>> Rafa
>> 
>> 
>> 
>> 
>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <[email protected]> wrote:
>> 
>> Hi all,
>> 
>> Following up on my recent email (below), I thought I would share my findings 
>> with the ‘Alfresco Indexer’ connector (https://github.com/maoo/alfresco-indexer) 
>> in case someone is able to advise on its usage. 
>> 
>> I turned to this connector because neither of the packaged Manifold Alfresco 
>> connectors (AtomPub or WebService) detects changes. I need a method whereby 
>> the crawl runs each night and picks up any and all changes made to documents 
>> in the previous 24 hours, which is a common scenario.
>> 
>> Unfortunately, I have yet to achieve this. 
>> 
>> I have built and installed both the AMP and JAR files needed for the new 
>> connector, but changes are still not coming through. So far I have two 
>> observations:
>> 
>> 1. Changes to document content or properties do not cause the same 
>> document to be picked up by the Alfresco connector on the next run
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is 
>> picked up
>> 
>> IN DETAIL
>> 1. Failing to pick up modified content
>> 
>> Looking at the log files (with logging set to DEBUG), I can see that on the 
>> first crawl of Alfresco, Manifold sends the following requests:
>> 
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
>> 
>> This picks up all of the content, e.g. documents. 
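For reference, each document in the first crawl is fetched through the same three endpoints. The helper below only reproduces the request paths seen in the log above for a given node UUID; the function name is hypothetical, and it says nothing about what each endpoint returns beyond what the log shows.

```python
# Hypothetical helper: rebuilds the three per-node request paths that appear
# in the first-crawl Manifold log, in the order they were logged.
STORE = "workspace/SpacesStore"


def node_request_paths(node_id: str) -> list[str]:
    """Return the actions, details and content paths requested for one node."""
    return [
        f"/alfresco/service/node/actions/{STORE}/{node_id}",
        f"/alfresco/service/node/details/{STORE}/{node_id}",
        f"/alfresco/service/api/node/{STORE}/{node_id}/content",
    ]


if __name__ == "__main__":
    for path in node_request_paths("267839b2-f466-42c5-9a35-cb3e41281bb9"):
        print(path)
```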
>> 
>> Running a second crawl, without any other actions being done, results in the 
>> following requests:
>> 
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
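To check that the filter Manifold sends is the filter Alfresco receives, the query string of the changes request can be decoded. This is a minimal standard-library sketch; the function name is illustrative. Decoding the request above gives lastTxnId 333, lastAclChangesetId 13 and a filter whose only non-empty entry is siteFilters ["Finance"], which matches the indexingFilters line in the Alfresco debug log.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_changes_request(url: str) -> dict:
    """Split a /node/changes request into its watermarks and filters."""
    query = parse_qs(urlsplit(url).query)
    return {
        "lastTxnId": int(query["lastTxnId"][0]),
        "lastAclChangesetId": int(query["lastAclChangesetId"][0]),
        # parse_qs has already percent-decoded the value, so it is plain JSON.
        "indexingFilters": json.loads(query["indexingFilters"][0]),
    }


if __name__ == "__main__":
    logged = (
        "/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333"
        "&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22"
        "Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A"
        "%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D"
    )
    print(decode_changes_request(logged))
```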
>> 
>> So I can see that, in the first instance, we are targeting content directly 
>> while, in the second, we are asking for changes. The problem is that no 
>> changes are returned from the second set of requests. The response from 
>> these calls is:
>> 
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "totalNodes" : "0", [\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "elapsedTime" : "8",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "docs" : [[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  ],[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_txn_id" : "352",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "    "last_acl_changeset_id" : "13",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_id" : "SpacesStore",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "  "store_protocol" : "workspace"[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>> 
>> Regardless of what changes I make to the document I have been using for 
>> testing, it is never picked up again. The totalNodes value returned by the 
>> changes call is always ‘0’.
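The nightly incremental crawl described earlier boils down to a watermark loop: ask for changes since the last transaction ID, index whatever comes back, then store the new IDs for the next run. The sketch below is a hypothetical illustration of that loop, not the connector's actual code. fetch_changes stands in for the HTTP call, and it assumes docs, last_txn_id and last_acl_changeset_id are top-level keys of the response (the log excerpt does not show the exact nesting).

```python
from typing import Callable

# Hypothetical stand-in for a GET of .../node/changes/workspace/SpacesStore
# that returns the parsed JSON body for the given (lastTxnId, lastAclChangesetId).
FetchChanges = Callable[[int, int], dict]


def crawl_once(state: dict, fetch_changes: FetchChanges) -> list:
    """Fetch changes since the stored watermarks, then advance them.

    Assumption: "docs", "last_txn_id" and "last_acl_changeset_id" are
    top-level keys of the response body.
    """
    body = fetch_changes(state["lastTxnId"], state["lastAclChangesetId"])
    docs = body.get("docs", [])
    # The webscript returns the IDs as strings (e.g. "352"), so convert them.
    state["lastTxnId"] = int(body["last_txn_id"])
    state["lastAclChangesetId"] = int(body["last_acl_changeset_id"])
    return docs
```

Fed a response shaped like the one logged above (empty docs, last_txn_id "352"), this returns no documents and sets the stored lastTxnId to 352. If a document is edited between two runs and the watermark moves past that edit while docs stays empty, that would suggest the webscript is consuming the transaction without reporting the node, rather than Manifold dropping it.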
>> 
>> 
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is 
>> picked up
>> 
>> Within my test Alfresco environment I have one site set up (Finance). Within 
>> the Finance doc library I have three test docs. No other changes have been 
>> made to the Alfresco instance. 
>> Running a crawl with no filter configuration set returns 81 items (checked by 
>> calling the URL in a browser).
>> If I then set the Site Filter configuration to ‘Finance’ and apply it, I still 
>> get 81 items when I re-run the crawl. 
>> I can see that the term ‘Finance’ is being added to the URL, but this does 
>> not seem to change the behaviour. 
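One way to narrow this down is to request the same change window with and without the site filter and compare the counts. The sketch assumes that totalNodes (the field seen in the logged response) reflects the number of items returned; the function and its fetch callback are hypothetical. If both numbers come back as 81, that would suggest the siteFilters entry is being ignored on the Alfresco side rather than lost by Manifold.

```python
from typing import Callable


# fetch_changes is a hypothetical stand-in: it takes an indexingFilters dict,
# performs the changes request and returns the parsed JSON body.
def compare_site_filter(
    fetch_changes: Callable[[dict], dict], site: str
) -> tuple[int, int]:
    """Return (unfiltered, site-filtered) node counts for the same query."""
    unfiltered = int(fetch_changes({})["totalNodes"])
    filtered = int(fetch_changes({"siteFilters": [site]})["totalNodes"])
    return unfiltered, filtered
```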
>> 
>> 
>> I am happy to spend time diagnosing this if anyone is available to 
>> assist. 
>> 
>> Thanks
>> 
>> Paul
>> 
>> 
>> 
>>> On 27 Oct 2015, at 18:14, [email protected] wrote:
>>> 
>>> Hi all,
>>> 
>>> This is a question regarding the relatively new Alfresco Webscript 
>>> connector. 
>>> 
>>> SETUP
>>> I have a vanilla Alfresco Community 5.0 installation
>>> One site has been created called 'Finance'
>>> A handful of documents have been created in 'Finance' Doc Library.
>>> I have cloned and packaged up 'alfresco-indexer' 
>>> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and 
>>> CLIENT packages to their respective environments. 
>>> 
>>> 
>>> ISSUE
>>> The issue is that the default API call used by Manifold returns nothing. 
>>> The full API call used by Manifold, based on my config, is:
>>> 
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>> 
>>> 
>>> TESTS
>>> I have identified two streamlined URLs. The first one returns the 
>>> documents that exist in the doc library of the 'Finance' site. This URL is:
>>> 
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>> 
>>> The second URL simply adds the site restriction. This URL returns nothing:
>>> 
>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
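Being able to regenerate these URLs makes it easier to vary one filter at a time while testing. This hypothetical helper serialises the filters with compact JSON and lets urlencode percent-encode the braces, quotes, colons and commas; with an empty filter and with a Finance-only site filter it produces exactly the two test URLs quoted above (minus the host).

```python
import json
from urllib.parse import urlencode


def changes_url(
    last_txn_id: int,
    last_acl_changeset_id: int,
    filters: dict,
    store: str = "workspace/SpacesStore",
) -> str:
    """Build a /node/changes request path for the given watermarks and filters."""
    query = urlencode({
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        # Compact separators match the encoding seen in the logged URLs.
        "indexingFilters": json.dumps(filters, separators=(",", ":")),
    })
    return f"/alfresco/service/node/changes/{store}?{query}"


if __name__ == "__main__":
    print(changes_url(0, 0, {}))
    print(changes_url(0, 0, {"siteFilters": ["Finance"]}))
```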
>>> 
>>> 
>>> 
>>> Can anyone explain why the documents do not return when only the containing 
>>> site is named in the API URL?
>>> 
>>> Cheers
>>> 
>>> Paul
>>> 
>>> 
>> 
>> 
> 
> 
