Re: Wrong Index selection for query?

2022-11-15 Thread Jorge Flórez
Hi,
as additional info, I executed the query using "explain measure":

explain measure SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]  WHERE
 [Num_Radicado] = 'R-2022-005778'

and the result was:

[RADICADO_MIGRADO] as [RADICADO_MIGRADO] /*
lucene:Index1(/oak:index/Index1) Num_Radicado:R-2022-005778 where
[RADICADO_MIGRADO].[Num_Radicado] = 'R-2022-005778' */ cost: {
"RADICADO_MIGRADO": { perEntry: 1.0, perExecution: 1.0, count: 52210
} }

It seems the correct index would be used, but as you can read in my previous
mail, *that did not happen*. Any help is appreciated.

Regards.

Jorge

On Mon, 14 Nov 2022 at 13:02, Jorge Flórez ()
wrote:

> Hello,
>
> in a repository we have (a very large one, it seems), there are two index
> definitions. Please see the image:
>
> https://drive.google.com/file/d/1KS2MZHfj1aRoWm7v6o3kbNFCPormxEft/view?usp=sharing
>
> One index makes rendering a content tree faster (Index2, which
> indexes nodes of type nt:folder) and the other makes queries over a specific
> node type and property faster (Index1, which indexes nodes of type
> RADICADO_MIGRADO and the property Num_Radicado).
>
> When I use queries like
>
> SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]  WHERE  [Num_Radicado] =
> 'R-2022-005778' and isdescendantnode('/')
>
> SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]  WHERE  [Num_Radicado] =
> 'R-2022-005778'
>
> Index2 is picked, which results in:
> The query read or traversed more than 10 nodes.:
> java.lang.UnsupportedOperationException: The query read or traversed more
> than 10 nodes. To avoid affecting other tasks, processing was stopped.
>
> Why is Index2 picked, given that Index1 is specific to that node type
> and indexes that property? In this case both indexes return the same
> cost...
>
> Thanks in advance.
>
> P.S.
> The cost calculation and chosen plan for each query is here:
>
> Parsing JCR-SQL2 statement: SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]
>  WHERE  [Num_Radicado] = 'R-2022-005778' and isdescendantnode('/')
> cost using filter Filter(query=SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]
>  WHERE  [Num_Radicado] = 'R-2022-005778' and isdescendantnode('/'),
> path=//*, property=[Num_Radicado=[R-2022-005778]])
> cost for reference is Infinity
> cost for property is Infinity
> cost for nodeType is 409504.0
>
> *cost for [/oak:index/Index2] of type (lucene-property) with plan
> [lucene:Index2(/oak:index/Index2) jcr:primaryType:RADICADO_MIGRADO] is 3.00
> cost for [/oak:index/Index1] of type (lucene-property) with plan
> [lucene:Index1(/oak:index/Index1) Num_Radicado:R-2022-005778] is 3.00*
> cost for lucene-property is Infinity
> cost for aggregate lucene is Infinity
> selected index
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex@283b96a
> with plan /oak:index/Index2 and
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex@283b96a
> with plan /oak:index/Index1 have similar costs 3.0 and 3.0 for query
> Filter(query=SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]  WHERE
>  [Num_Radicado] = 'R-2022-005778' and isdescendantnode('/'), path=//*,
> property=[Num_Radicado=[R-2022-005778]]) - check query explanation / index
> definitions
> cost for traverse is 3823716.0
> count: 1 query: SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]  WHERE
>  [Num_Radicado] = 'x' and isdescendantnode('x')
> query execute SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]  WHERE
>  [Num_Radicado] = 'R-2022-005778' and isdescendantnode('/')
> query plan [RADICADO_MIGRADO] as [RADICADO_MIGRADO] /*
> lucene:Index2(/oak:index/Index2) jcr:primaryType:RADICADO_MIGRADO where
> ([RADICADO_MIGRADO].[Num_Radicado] = 'R-2022-005778') and
> (isdescendantnode([RADICADO_MIGRADO], [/])) */
> The query read or traversed more than 10 nodes.
>
> Parsing JCR-SQL2 statement: SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]
>  WHERE  [Num_Radicado] = 'R-2022-005778'
> cost using filter Filter(query=SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]
>  WHERE  [Num_Radicado] = 'R-2022-005778', path=*,
> property=[Num_Radicado=[R-2022-005778]])
> cost for reference is Infinity
> cost for property is Infinity
> cost for nodeType is 409504.0
>
> *cost for [/oak:index/Index2] of type (lucene-property) with plan
> [lucene:Index2(/oak:index/Index2) jcr:primaryType:RADICADO_MIGRADO] is 3.00
> cost for [/oak:index/Index1] of type (lucene-property) with plan
> [lucene:Index1(/oak:index/Index1) Num_Radicado:R-2022-005778] is 3.00*
> cost for lucene-property is Infinity
> cost for aggregate lucene is Infinity
> selected index
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex@1cdf077b
> with plan /oak:index/Index2 and
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex@1cdf077b
> with plan /oak:index/Index1 have similar costs 3.0 and 3.0 for query
> Filter(query=SELECT [jcr:uuid]  FROM [RADICADO_MIGRADO]  WHERE
>  [Num_Radicado] = 'R-2022-005778', path=*,
> property=[Num_Radicado=[R-2022-005778]]) - check query explanation / index
> definitions
> cost for 
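
The planner log above shows Index1 and Index2 both costed at 3.0, and the planner itself warns that they "have similar costs". A minimal, stdlib-only model of why such a tie can go the "wrong" way: it assumes the planner replaces its current best plan only when a candidate is strictly cheaper, so equal costs are decided by the order in which plans are considered, not by which index is more specific. This is an illustration under that assumption, not Oak's actual planner code; PlanChooser, Plan and choose are hypothetical names.

```java
import java.util.List;

/** Simplified model of cost-based index selection; not Oak's implementation. */
public final class PlanChooser {

    /** A candidate plan: the index that would serve it and its estimated cost. */
    static final class Plan {
        final String index;
        final double cost;

        Plan(String index, double cost) {
            this.index = index;
            this.cost = cost;
        }
    }

    /**
     * Returns the index of the cheapest plan, or null if there are no candidates.
     * Assumption: a later plan replaces the current best only if it is strictly
     * cheaper, so on a tie the plan that was considered first wins.
     */
    static String choose(List<Plan> candidates) {
        Plan best = null;
        for (Plan p : candidates) {
            if (best == null || p.cost < best.cost) {
                best = p;
            }
        }
        return best == null ? null : best.index;
    }

    public static void main(String[] args) {
        // Same situation as the log: both plans cost 3.0, Index2 considered first.
        List<Plan> tie = List.of(new Plan("Index2", 3.0), new Plan("Index1", 3.0));
        System.out.println(choose(tie)); // Index2

        // If Index1's estimate were lower, it would win regardless of order.
        List<Plan> cheaper = List.of(new Plan("Index2", 3.0), new Plan("Index1", 2.0));
        System.out.println(choose(cheaper)); // Index1
    }
}
```

Under this assumption, the robust way out is to make the two estimates differ (which is what the log's "check query explanation / index definitions" hint points at) rather than to depend on evaluation order.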

Re: Unresolved Conflict Question

2022-11-15 Thread Angela Schreiber
hi jorge

imho this would be a great addition to the oak documentation, e.g. linked from
the section https://jackrabbit.apache.org/oak/docs/dos_and_donts.html

would it be possible for you to create a ticket and a PR for oak-doc?

kind regards
angela

From: Jorge Flórez 
Sent: Monday, November 14, 2022 21:32
To: oak-dev@jackrabbit.apache.org 
Subject: Re: Unresolved Conflict Question

Hi,
after some reading, testing and debugging, I think I understand how it
works. So I thought I would write something up in case anyone is in the same situation:

- There are no nodes (I think) with a "conflict" state in a content
repository. I searched the paths reported with conflicts in my log and
found no indicator (special mixin, child nodes, properties, etc.). The
rep:MergeConflict mixin is "added" by the AnnotatingConflictHandler at runtime,
and the ConflictValidator then checks whether the node "has" that mixin and, if so,
throws a CommitFailedException, discarding the changes that were about to be saved.
The mixin is not actually added to the node in the repository.

- To avoid or minimize conflicts:
1. Try to keep JCR sessions as short as possible, i.e. create the
session, make changes, call session.save(), call session.logout(). If you
need to do something else in the repository a few lines later (maybe
after some processing that could take some time), create a new session
and repeat.

2. Try calling session.refresh(true) before saving if you think that
significant time can pass between the login() and session.save() calls.

3. You could write your own conflict handler and add it when configuring
your Oak or Whiteboard instances, but only if you know what you are doing (i.e.
you know how to resolve the conflict in each of the possible situations).
By default, the AnnotatingConflictHandler instance will discard your
changes and your commit will fail. The worst that can happen is that some
changes are not persisted (if you are OK with that).
Please check
org.apache.jackrabbit.oak.plugins.commit.JcrLastModifiedConflictHandler. It
seems like a good example to follow.

4. Enable the DEBUG level on the
org.apache.jackrabbit.oak.plugins.commit.MergingNodeStateDiff and
org.apache.jackrabbit.oak.plugins.commit.ConflictValidator loggers if you
want more information on the circumstances of a conflict that
happened at a given point in time.
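
Items 1 and 2 above can be sketched with the standard JCR API. This is a minimal sketch, assuming the JCR 2.0 API (javax.jcr) is on the classpath, as in any Oak application, and that the caller supplies the Repository, credentials, node path and property; the class and method names are hypothetical. It opens one session per unit of work, calls session.refresh(true) right before saving as item 2 suggests, and logs out in a finally block so the session never outlives the change.

```java
import javax.jcr.Credentials;
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

/** Hypothetical helper illustrating a short-lived JCR session per change. */
public final class ShortSessionExample {

    /** Opens a session, applies one property change, saves, and always logs out. */
    static void updateProperty(Repository repository, Credentials credentials,
                               String nodePath, String name, String value)
            throws RepositoryException {
        Session session = repository.login(credentials);
        try {
            Node node = session.getNode(nodePath);
            node.setProperty(name, value);
            // Item 2: refresh while keeping our pending changes (keepChanges = true),
            // so they are applied on top of what others committed since login.
            session.refresh(true);
            session.save();
        } finally {
            // Item 1: the session lives only as long as this unit of work.
            session.logout();
        }
    }
}
```

For a long-running job, the idea is to call a method like this once per batch of changes instead of reusing one session for the whole run.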

References
https://cqdump.joerghoh.de/2015/11/02/aem-anti-pattern-long-running-sessions/
https://cqdump.joerghoh.de/2015/12/22/how-can-i-avoid-oak-writemerge-conflicts/
https://jackrabbit.apache.org/oak/docs/FAQ.html
https://adapt.to/content/dam/adaptto/production/presentations/2015/adaptTo2015-Conflict-handling-with-Oak-Michael-Duerig-with-comments.pdf/_jcr_content/renditions/original./adaptTo2015-Conflict-handling-with-Oak-Michael-Duerig-with-comments.pdf

Thanks.

Jorge


On Wed, 5 Oct 2022 at 10:58, Jorge Flórez ()
wrote:

> Hi,
>
> in a production log I have some messages like this one:
>
> javax.jcr.InvalidItemStateException: OakState0001: Unresolved conflicts in
> /F/EDTG/2010286E/00_Hoja_Control_2017_T5.pdf
> at
> deployment.mpEcmEA.ear//org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:238)
> at
> deployment.mpEcmEA.ear//org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:213)
> at
> deployment.mpEcmEA.ear//org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.newRepositoryException(SessionDelegate.java:669)
> at
> deployment.mpEcmEA.ear//org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:495)
> at
> deployment.mpEcmEA.ear//org.apache.jackrabbit.oak.jcr.session.SessionImpl$8.performVoid(SessionImpl.java:420)
> at
> deployment.mpEcmEA.ear//org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:273)
> at
> deployment.mpEcmEA.ear//org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:417)
>
> I have been reading and I think I got some understanding, but I still have
> questions:
>
> 1. Does the message above mean that the node cannot be modified further in
> any way?
> 2. As far as I know, some conflict handlers are added to my JCR instance by
> default:
> RepMembersConflictHandler, JcrLastModifiedConflictHandler,
> AnnotatingConflictHandler and one wrapper. If my guess is correct, the
> "conflict" was handled by the AnnotatingConflictHandler, which always
> returns *Resolution.THEIRS* and adds the rep:MergeConflict mixin to the node.
> Does this mean that all I have to do is remove the mixin to get rid of
> "OakState0001: Unresolved conflicts"?
> 3. Or should I write my own handler that, for example, always returns
> *Resolution.THEIRS* and that's it? Or do I have to manually make the
> respective changes for all of the events in the handler?
>
> I hope I have been clear. Thanks in advance.
>
> Regards.
>
> Jorge
>
>
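
On question 3: the summary earlier in this thread points to org.apache.jackrabbit.oak.plugins.commit.JcrLastModifiedConflictHandler as an example of a handler that resolves one specific kind of conflict instead of blindly returning THEIRS. As far as we can tell from reading it, that handler only deals with jcr:created and jcr:lastModified, keeping the earlier jcr:created and the later jcr:lastModified of the two conflicting values, and ignores every other property so that the remaining handlers (ultimately the AnnotatingConflictHandler) still see the conflict. The stdlib-only sketch below models just that decision rule; it does not use Oak's conflict handler API, and DateConflictRule, Outcome and resolve are hypothetical names.

```java
import java.time.Instant;

/**
 * Stdlib-only model of a date-merging conflict rule in the spirit of
 * JcrLastModifiedConflictHandler; not Oak's API.
 */
public final class DateConflictRule {

    /** Hypothetical outcome: MERGED with a chosen value, or IGNORED (defer to other handlers). */
    static final class Outcome {
        final boolean merged;
        final Instant value; // only meaningful when merged is true

        private Outcome(boolean merged, Instant value) {
            this.merged = merged;
            this.value = value;
        }

        static Outcome merged(Instant value) { return new Outcome(true, value); }

        static Outcome ignored() { return new Outcome(false, null); }
    }

    /** Resolves a conflict where both sides set the same property to different dates. */
    static Outcome resolve(String propertyName, Instant ours, Instant theirs) {
        switch (propertyName) {
            case "jcr:created":
                // Keep the earlier creation date.
                return Outcome.merged(ours.isBefore(theirs) ? ours : theirs);
            case "jcr:lastModified":
                // Keep the later modification date.
                return Outcome.merged(ours.isAfter(theirs) ? ours : theirs);
            default:
                // Not a property this rule knows how to merge: leave it to other handlers.
                return Outcome.ignored();
        }
    }

    public static void main(String[] args) {
        Instant earlier = Instant.parse("2022-10-05T10:00:00Z");
        Instant later = Instant.parse("2022-10-05T10:58:00Z");
        System.out.println(resolve("jcr:lastModified", earlier, later).value); // 2022-10-05T10:58:00Z
        System.out.println(resolve("jcr:created", earlier, later).value);      // 2022-10-05T10:00:00Z
        System.out.println(resolve("jcr:title", earlier, later).merged);       // false
    }
}
```

The point of the pattern is that a handler only claims the conflicts it can resolve safely and returns "ignored" for everything else, rather than discarding one side wholesale.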


Re: Oak Jenkins build takes up too many resources

2022-11-15 Thread Marcel Reutegger
Hi,

On 11.11.22, 15:11, "Robert Munteanu"  wrote:
> I think it would be worthwhile to check if the build can be optimised

FYI, I also changed the Job configuration to not build PRs in draft state.

Regards
Marcel


Re: Oak Jenkins build takes up too many resources

2022-11-15 Thread Konrad Windszus
Probably it would also be fair not to parallelise all modules (i.e. reserve
only a handful of slots instead of almost all available ones).
Currently we trigger 35 builds at the same time:
https://github.com/apache/jackrabbit-oak/blob/6c04951be723c4574d3a7c1bcc4ec04e9f9e7dd0/Jenkinsfile#L23
while ASF Jenkins has only about the same number of slots.
I think we should block at most a handful of nodes in parallel (although this
may lead to slightly slower build times).

Konrad

> On 14. Nov 2022, at 16:29, Konrad Windszus  wrote:
> 
> Seems this is pretty easy: https://stackoverflow.com/a/70375236 
> 
> Can you come up with a PR?
> Thanks,
> Konrad
> 
>> On 14. Nov 2022, at 16:16, Marcel Reutegger  
>> wrote:
>> 
>> Hi,
>> 
>> On 11.11.22, 15:11, "Robert Munteanu"  wrote:
>>> I think it would be worthwhile to check if the build can be optimised,
>>> otherwise Oak builds are blocking many execution slots of the Jenkins
>>> ASF instance.
>> 
>> I did notice one thing. Our PRs may schedule a build each time commits
>> are pushed to the branch. Is there a way to cancel an already running
>> build when there are new changes coming in from a PR?
>> 
>> Regards
>> Marcel
> 



Re: Oak Jenkins build takes up too many resources

2022-11-15 Thread Marcel Reutegger
On 14.11.22, 16:29, "Konrad Windszus"  wrote:
> Can you come up with a PR?
See https://github.com/apache/jackrabbit-oak/pull/755
Though, I don’t think the build it triggers actually uses the updated 
Jenkinsfile.
Regards
Marcel