Hi Deanna,

For the CMIS connector, I created CONNECTORS-1248 to cover the version info issue you describe.

Karl

On Wed, Oct 28, 2015 at 8:08 AM, Delapasse, Deanna <[email protected]> wrote:

> Hi Paul,
>
> I haven't read the entire thread, so I apologize if this is way off base...
>
> When I worked with the CMIS connector I had to modify the logic to append
> document.getLastModificationDate().getTimeInMillis() to the versionString
> for it to pick up changes. The Alfresco document version won't update when
> you modify metadata. My memory is terrible, but I believe that even
> modifying content may not do it unless you have the proper 'versioning'
> aspect applied.
>
> Check inside Alfresco and see if your "version" is actually incrementing
> as you expect. I was using an older Alfresco version and was not able to
> run with the Alfresco connector, but the CMIS connector worked great for us!
>
> Good luck!
> Deanna
>
> On Wed, Oct 28, 2015 at 6:07 AM, Paul Farrell <[email protected]>
> wrote:
>
>> The Alfresco log snippet doesn’t really shed any more light. It simply
>> doesn’t think that the document content has changed.
>>
>> 09:56:42,059 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByTransactionId] On Store workspace://SpacesStore
>> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastTransactionID]
>> 09:56:42,065 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getNodesByAclChangesetId] On Store workspace://SpacesStore
>> 09:56:42,070 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-5] [getLastAclChangeSetID]
>> 09:56:42,070 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-5] Attaching 0 nodes to the WebScript template
>> 09:56:42,079 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9] Invoking Changes Webscript, using the following params
>> lastTxnId: 352
>> lastAclChangesetId: 13
>> storeId: SpacesStore
>> storeProtocol: workspace
>> indexingFilters: {"aspectFilters":[],"metadataFilters":{},"mimetypeFilters":[],"siteFilters":["Finance"],"typeFilters":[]}
>>
>> 09:56:42,079 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByTransactionId] On Store workspace://SpacesStore
>> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastTransactionID]
>> 09:56:42,082 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getNodesByAclChangesetId] On Store workspace://SpacesStore
>> 09:56:42,087 DEBUG [com.github.maoo.indexer.dao.IndexingDaoImpl] [http-apr-8080-exec-9] [getLastAclChangeSetID]
>> 09:56:42,087 DEBUG [com.github.maoo.indexer.webscripts.NodeChangesWebScript] [http-apr-8080-exec-9] Attaching 0 nodes to the WebScript template
>>
>> *Paul Farrell*
>> Senior Search Consultant
>>
>> 109-123 Clifton Street, London EC2A 4LD
>> *T* +44 (0) 207 183 6865 | funnelback.com <http://www.funnelback.com/>
>>
>> *UNITED KINGDOM* | AUSTRALIA | NEW ZEALAND | POLAND | UNITED STATES
>>
>> Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback> -
>> Twitter <https://twitter.com/funnelback>
>>
>> Funnelback UK Ltd is a limited liability company registered in England &
>> Wales. Registered address: Zetland House 109-123, Clifton Street, London.
>> EC2A 4LD. Company registration number: 07004264.
>>
>> On 28 Oct 2015, at 10:50, Rafa Haro <[email protected]> wrote:
>>
>> You’re welcome Paul. Just in case, could you check the Alfresco logs to
>> see if there is something informative there?
>>
>> Cheers,
>> Rafa
>>
>> On Wed, Oct 28, 2015 at 11:47 AM, Paul Farrell <[email protected]>
>> wrote:
>>
>>> I see. That makes sense.
>>>
>>> No problem. Thanks for the feedback Rafa. Much appreciated.
>>>
>>> *Paul Farrell*
>>>
>>> On 28 Oct 2015, at 10:45, Rafa Haro <[email protected]> wrote:
>>>
>>> Hi Paul,
>>>
>>> Before contributing the Alfresco connector, we performed several tests
>>> similar to yours using an Alfresco 4.x version. Therefore, initially, my
>>> guess is the Webscript is not behaving correctly for Alfresco 5 instances.
>>> I’m including Maurizio Pillitu (Alfresco Indexer main developer) in the
>>> email thread. He might be able to provide some feedback about this or just
>>> confirm my suspicions.
>>>
>>> Cheers,
>>> Rafa
>>>
>>> On Wed, Oct 28, 2015 at 11:33 AM, Paul Farrell <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> In follow up to my recent email (below) I thought I would share my
>>>> findings with the ‘Alfresco Indexer’ connector
>>>> (https://github.com/maoo/alfresco-indexer) in case someone may be able
>>>> to advise on its usage.
>>>>
>>>> I turned to this connector because neither of the packaged Manifold
>>>> Alfresco connectors (AtomPub or WebService) detects changes. I needed a
>>>> method whereby the crawl runs each night and picks up any and all changes
>>>> to the documents from the previous 24 hours. A common scenario.
>>>>
>>>> Unfortunately, I have yet to achieve this.
>>>>
>>>> Having built and installed both the AMP and JAR files needed for the
>>>> new connector, changes are still not coming through. In fact, I have two
>>>> observations so far:
>>>>
>>>> 1. Changes to document content or properties do not cause the same
>>>> document to be picked up by the Alfresco connector on the next run
>>>> 2. Adding ‘Filter Configuration’ seems to do very little to change what
>>>> is picked up
>>>>
>>>> *IN DETAIL*
>>>> *1. Failing to pick up modified content*
>>>>
>>>> Looking at the log files (which are set to debug) I can see that, upon
>>>> the first crawl of Alfresco, Manifold sends the following requests:
>>>>
>>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,056 (Worker thread '1') - http-outgoing-239 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - Executing request GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,070 (Worker thread '1') - http-outgoing-240 >> "GET /alfresco/service/node/details/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9 HTTP/1.1[\r][\n]"
>>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - Executing request GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:35,082 (Worker thread '1') - http-outgoing-241 >> "GET /alfresco/service/api/node/workspace/SpacesStore/267839b2-f466-42c5-9a35-cb3e41281bb9/content HTTP/1.1[\r][\n]"
>>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - Executing request GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1
>>>> DEBUG 2015-10-28 05:24:40,263 (Worker thread '1') - http-outgoing-242 >> "GET /alfresco/service/node/actions/workspace/SpacesStore/72948f84-4bf1-4ec5-8378-1bed0951600a HTTP/1.1[\r][\n]"
>>>>
>>>> This picks up all of the content, e.g. documents.
>>>>
>>>> Running a second crawl, without any other actions being done, results
>>>> in the following requests:
>>>>
>>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - Executing request GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1
>>>> DEBUG 2015-10-28 05:26:31,854 (Startup thread) - http-outgoing-248 >> "GET /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=333&lastAclChangesetId=13&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D HTTP/1.1[\r][\n]"
>>>>
>>>> So I can see that, in the first instance, we are targeting content
>>>> directly while, in the second, we are asking for changes. The problem is
>>>> that no changes are returned from the second set of requests. The response
>>>> from these calls is:
>>>>
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "totalNodes" : "0", [\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "elapsedTime" : "8",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "docs" : [[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " ],[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_txn_id" : "352",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "last_acl_changeset_id" : "13",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_id" : "SpacesStore",[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << " "store_protocol" : "workspace"[\r][\n]"
>>>> DEBUG 2015-10-28 05:56:42,218 (Startup thread) - http-outgoing-257 << "}"
>>>>
>>>> Regardless of what changes I make to a document that I have been using
>>>> for testing, the document is not updated. The response from the calls for
>>>> changes (totalNodes) is always ‘0’.
>>>>
>>>> *2. Adding ‘Filter Configuration’ seems to do very little to change
>>>> what is picked up*
>>>>
>>>> Within my test Alfresco environment I have one site set up (Finance).
>>>> Within the Finance doc library I have three test docs. No other changes
>>>> have been made to the Alfresco instance.
>>>> Running a crawl with no filter configurations set returns 81 items.
>>>> This is via the URL in a browser.
>>>> If I then set the Site Filter configuration to ‘Finance’ and apply, I
>>>> still get 81 items when I re-run the crawl.
>>>> I can see that the term ‘Finance’ is being added to the URL but this
>>>> does not seem to change the behaviour.
>>>>
>>>> I am happy to spend time diagnosing this if there is anyone available
>>>> to assist.
>>>>
>>>> Thanks
>>>>
>>>> Paul
>>>>
>>>> On 27 Oct 2015, at 18:14, [email protected] wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This is a question regarding the relatively new Alfresco Webscript
>>>> connector.
>>>>
>>>> SETUP
>>>> I have a vanilla Alfresco Community 5.0 installation.
>>>> One site has been created called 'Finance'.
>>>> A handful of documents have been created in the 'Finance' Doc Library.
>>>> I have cloned and packaged up the 'alfresco-indexer'
>>>> (https://github.com/maoo/alfresco-indexer) and have applied the AMP and
>>>> CLIENT packages to their respective environments.
>>>>
>>>> ISSUE
>>>> The issue is that the default API call used by Manifold is returning
>>>> nothing. The full API call used by Manifold, based on my config, is:
>>>>
>>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%2C%22typeFilters%22%3A%5B%5D%2C%22mimetypeFilters%22%3A%5B%5D%2C%22aspectFilters%22%3A%5B%5D%2C%22metadataFilters%22%3A%7B%7D%7D
>>>>
>>>> TESTS
>>>> I have identified two streamlined URLs. The first one returns the
>>>> documents that exist in the doc library of the 'Finance' site. This URL is:
>>>>
>>>> /alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%7D
>>>>
>>>> The second URL simply adds the site restriction. This URL returns
>>>> nothing:
>>>>
>>>> http://52.23.225.233:8080/alfresco/service/node/changes/workspace/SpacesStore?lastTxnId=0&lastAclChangesetId=0&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D
>>>>
>>>> Can anyone explain why the documents do not return when only the
>>>> containing site is named in the API URL?
>>>>
>>>> Cheers
>>>>
>>>> Paul
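
The indexingFilters parameter in the node/changes URLs quoted in this thread is percent-encoded JSON, which makes the requests hard to compare by eye. The following is a minimal Python sketch for decoding and re-encoding it when checking what a given request actually asks for. The function names decode_changes_url and encode_filters are hypothetical helpers for illustration only; they are not part of ManifoldCF or alfresco-indexer, and the sketch only inspects URLs, it does not call Alfresco.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit


def decode_changes_url(url):
    """Split a node/changes request URL into its query parameters,
    parsing the indexingFilters value as JSON. (Hypothetical helper.)"""
    query = parse_qs(urlsplit(url).query)
    # parse_qs percent-decodes values and returns a list per key.
    params = {name: values[0] for name, values in query.items()}
    params["indexingFilters"] = json.loads(params.get("indexingFilters", "{}"))
    return params


def encode_filters(filters):
    """Serialize a filter dict as compact, percent-encoded JSON, matching
    the form seen in the logged URLs. (Hypothetical helper.)"""
    compact = json.dumps(filters, separators=(",", ":"))
    return quote(compact, safe="")


# The site-restricted test URL from the thread, with the host omitted.
site_url = (
    "/alfresco/service/node/changes/workspace/SpacesStore"
    "?lastTxnId=0&lastAclChangesetId=0"
    "&indexingFilters=%7B%22siteFilters%22%3A%5B%22Finance%22%5D%7D"
)

decoded = decode_changes_url(site_url)
print(decoded["lastTxnId"], decoded["lastAclChangesetId"], decoded["indexingFilters"])
```

Decoding the two test URLs side by side shows that they differ only in the siteFilters entry, so the site filter is the one variable between the request that returns documents and the one that returns nothing.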

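
Deanna's workaround for the CMIS connector, appending the last modification time to the version string so that metadata-only edits are noticed, can be illustrated with a small sketch. This is Python for illustration only, not the connector's Java code and not the change made under CONNECTORS-1248. The function name build_version_string and the ':' separator are assumptions.

```python
from datetime import datetime, timezone


def build_version_string(version_label, last_modified):
    """Combine a repository version label with the last-modified time in
    epoch milliseconds. (Hypothetical helper; the ':' separator is an
    assumption, not the connector's actual format.)"""
    millis = int(last_modified.timestamp() * 1000)
    return f"{version_label}:{millis}"


# A metadata edit leaves the version label at "1.0" but moves the timestamp.
before = build_version_string("1.0", datetime(2015, 10, 28, 8, 8, tzinfo=timezone.utc))
after = build_version_string("1.0", datetime(2015, 10, 28, 9, 30, tzinfo=timezone.utc))

# A crawler comparing version strings would now see the document as changed.
print(before != after)
```

The design point is that the crawler only re-indexes a document when its version string differs from the stored one, so anything that should trigger re-indexing has to be reflected in that string.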