The Alfresco log snippet doesn’t really shed any more light. It simply doesn’t
think that the document content has changed.
09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-5] [getNodesByTransactionId] On Store
workspace://SpacesStore
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-5] [getLastTransactionID]
09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-5] [getNodesByAclChangesetId] On Store
workspace://SpacesStore
09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-5] [getLastAclChangeSetID]
09:56:42,070 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
[http-apr-8080-exec-5] Attaching 0 nodes to the WebScript template
09:56:42,079 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
[http-apr-8080-exec-9] Invoking Changes Webscript, using the following params
lastTxnId: 352
lastAclChangesetId: 13
storeId: SpacesStore
storeProtocol: workspace
indexingFilters:
{"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}
09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-9] [getNodesByTransactionId] On Store
workspace://SpacesStore
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-9] [getLastTransactionID]
09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-9] [getNodesByAclChangesetId] On Store
workspace://SpacesStore
09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl]
[http-apr-8080-exec-9] [getLastAclChangeSetID]
09:56:42,087 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript]
[http-apr-8080-exec-9] Attaching 0 nodes to the WebScript template
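
For anyone wanting to replay the same call outside Manifold, here is a minimal Python sketch that rebuilds the request from the parameters the webscript logged above. The path and parameter names come from the logs in this thread; the helper name `build_changes_url` and the `http://localhost:8080` base are placeholders of mine, not part of the indexer.

```python
import json
from urllib.parse import urlencode


def build_changes_url(base, last_txn_id, last_acl_changeset_id, filters,
                      protocol="workspace", store="SpacesStore"):
    """Rebuild the node-changes request seen in the logs (hypothetical helper)."""
    query = urlencode({
        "lastTxnId": last_txn_id,
        "lastAclChangesetId": last_acl_changeset_id,
        # Compact JSON so the encoded value matches the URLs in the logs.
        "indexingFilters": json.dumps(filters, separators=(",", ":")),
    })
    return f"{base}/alfresco/service/node/changes/{protocol}/{store}?{query}"


# The base URL is an assumption; substitute the real Alfresco host.
url = build_changes_url("http://localhost:8080", 352, 13,
                        {"siteFilters": ["Finance"]})
print(url)
```

Fetching that URL in a browser or with curl should show whether the webscript itself returns 0 nodes, independently of the connector.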
Paul Farrell
Senior Search Consultant
109-123 Clifton Street, London EC2A 4LD
T +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>
UNITED KINGDOM | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES
Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
Twitter <https://twitter.com/funnelback>
Funnelback UK Ltd is a limited liability company registered in England & Wales.
Registered address: Zetland House 109-123, Clifton Street, London. EC2A 4LD.
Company registration number: 07004264.
> On 28 Oct 2015, at 10:50, Rafa Haro <[email protected]> wrote:
>
> You’re welcome Paul. Just in case, could you check the Alfresco logs to see
> if there is something informative there?
>
> Cheers,
> Rafa
>
>
>
>
> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <[email protected]
> <mailto:[email protected]>> wrote:
>
> I see. That makes sense.
>
> No problem. Thanks for the feedback Rafa. Much appreciated.
>
>
>> On 28 Oct 2015, at 10:45, Rafa Haro <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi Paul,
>>
>> Before contributing the Alfresco connector, we performed several tests
>> similar to yours using an Alfresco 4.x version. Therefore, initially, my
>> guess is the Webscript is not behaving correctly for Alfresco 5 instances.
>> I’m including Maurizio Pillitu (Alfresco Indexer main developer) in the
>> email thread. He may be able to provide some feedback about this or simply
>> confirm my suspicions.
>>
>> Cheers,
>> Rafa
>>
>>
>>
>>
>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi all,
>>
>> Following up on my recent email (below), I thought I would share my findings
>> with the ‘Alfresco Indexer’ connector
>> (https://github.com/maoo/alfresco-indexer) in case someone is able to advise
>> on its usage.
>>
>> I turned to this connector because neither of the packaged Manifold Alfresco
>> connectors (AtomPub or WebService) detects changes. I needed a method whereby
>> the crawl runs each night and picks up any and all changes made to the
>> documents in the previous 24 hours. A common scenario.
>>
>> Unfortunately, I have yet to achieve this.
>>
>> Having built and installed both the AMP and JAR files needed for the new
>> connector, changes are still not coming through. In fact, I have two
>> observations so far:
>>
>> 1. Changes to document content or properties do not cause the same
>> document to be picked up by the Alfresco connector on the next run
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is
>> picked up
>>
>> IN DETAIL
>> 1. Failing to pick up modified content
>>
>> Looking at the log files (which are set to debug) I can see that, upon the
>> first crawl of Alfresco, Manifold sends the following requests:
>>
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET
>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET
>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >>
>> "GET
>> /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>> HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET
>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET
>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >>
>> "GET
>> /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9
>> HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET
>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET
>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >>
>> "GET
>> /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content
>> HTTP/1.1[\r][\n]"
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET
>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET
>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>> HTTP/1.1
>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >>
>> "GET
>> /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a
>> HTTP/1.1[\r][\n]"
>>
>> This picks up all of the content e.g. documents.
>>
>> Running a second crawl, without any other actions being done, results in the
>> following requests:
>>
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>> HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>> HTTP/1.1
>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET
>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>> HTTP/1.1[\r][\n]"
>>
>> So I can see that, in the first instance, we are targeting content directly
>> while, in the second, we are asking for changes. The problem is that no
>> changes are returned from the second set of requests. The response from
>> these calls is:
>>
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> "totalNodes" : "0", [\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> "elapsedTime" : "8",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> "docs" : [[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> ],[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> "last_txn_id" : "352",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> "last_acl_changeset_id" : "13",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> "store_id" : "SpacesStore",[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "
>> "store_protocol" : "workspace"[\r][\n]"
>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>>
>> Regardless of what changes I make to a document that I have been using for
>> testing, the document is not updated. The response from the calls for
>> changes (totalNodes) is always ‘0’.
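
Read as plain JSON, the response above carries every value as a string. The Python sketch below pulls out the node count and the checkpoint values. The body is reassembled by hand from the wire log (the opening brace is not in the excerpt and is assumed), and `next_checkpoint` is a hypothetical helper name. That the connector sends `last_txn_id` and `last_acl_changeset_id` back as `lastTxnId` and `lastAclChangesetId` on the next run is an inference from the logs, not something confirmed from the connector source.

```python
import json

# Body reassembled by hand from the http-outgoing-257 wire log lines.
# The opening brace is assumed; it does not appear in the excerpt.
body = """{
  "totalNodes" : "0",
  "elapsedTime" : "8",
  "docs" : [],
  "last_txn_id" : "352",
  "last_acl_changeset_id" : "13",
  "store_id" : "SpacesStore",
  "store_protocol" : "workspace"
}"""


def next_checkpoint(raw):
    """Return the node count and the ids a follow-up request would presumably use."""
    data = json.loads(raw)
    return {
        "changed_nodes": int(data["totalNodes"]),
        "lastTxnId": int(data["last_txn_id"]),
        "lastAclChangesetId": int(data["last_acl_changeset_id"]),
    }


checkpoint = next_checkpoint(body)
print(checkpoint)
```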
>>
>>
>> 2. Adding ‘Filter Configuration’ seems to do very little to change what is
>> picked up
>>
>> Within my test Alfresco environment I have one site set up (Finance). Within
>> the Finance doc library I have three test docs. No other changes have been
>> made to the Alfresco instance.
>> Running a crawl with no filter configurations set returns 81 items. This is
>> via the URL in a browser.
>> If I then set the Site Filter configuration to ‘Finance’ and apply, I still
>> get 81 items when I re-run the crawl.
>> I can see that the term ‘Finance’ is being added to the URL but this does
>> not seem to change the behaviour.
>>
>>
>> I am happy to spend time diagnosing this if there is anyone available to
>> assist.
>>
>> Thanks
>>
>> Paul
>>
>>
>>
>>> On 27 Oct 2015, at 18:14, [email protected]
>>> <mailto:[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> This is a question regarding the relatively new Alfresco Webscript
>>> connector.
>>>
>>> SETUP
>>> I have a vanilla Alfresco Community 5.0 installation
>>> One site has been created called 'Finance'
>>> A handful of documents have been created in 'Finance' Doc Library.
>>> I have cloned and packaged up the 'alfresco-indexer'
>>> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and
>>> CLIENT packages to their respective environments.
>>>
>>>
>>> ISSUE
>>> The issue is that the default API call used by Manifold is returning
>>> nothing. The full API call used by Manifold, and based on my config, is :
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>
>>>
>>> TESTS
>>> I have identified two streamlined URLs. The first one returns the
>>> documents that exist in the doc library of the 'Finance' site. This URL is:
>>>
>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>>
>>> The second URL simply adds the site restriction. This URL returns nothing:
>>>
>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>>
>>>
>>>
>>> Can anyone explain why the documents do not return when only the containing
>>> site is named in the API URL?
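
To make the difference between the two test URLs easier to see, this short Python sketch decodes the `indexingFilters` parameter of each. The paths are copied from the message (host omitted); `decode_filters` is a hypothetical helper name used only for illustration.

```python
import json
from urllib.parse import urlsplit, parse_qs


def decode_filters(url):
    """Return the indexingFilters query parameter as a Python dict."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["indexingFilters"][0])


unfiltered = ("/alfresco/service/node/changes/workspace/SpacesStore"
              "?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D")
site_only = ("/alfresco/service/node/changes/workspace/SpacesStore"
             "?lastTxnId=0&lastAclChangesetId=0"
             "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D")

print(decode_filters(unfiltered))
print(decode_filters(site_only))
```

The only difference between the two requests is the single `siteFilters` entry, so the empty result appears to come from how the webscript applies that one filter.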
>>>
>>> Cheers
>>>
>>> Paul
>>>
>>>
>>
>>
>
>