[ https://issues.apache.org/jira/browse/OAK-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816003#comment-13816003 ]

Thomas Mueller commented on OAK-1150:
-------------------------------------

I think what we need is some kind of exclude list, so that nt:folder and similar 
types are not indexed, but other node types and mixins are. To build the exclude 
list, I collected some statistics from one particular application (default 
setup). The number of nodes below each index node 
(/oak:index/nodetype/:index/xxx) is as follows (only types with more than 1000 
entries are listed):

{code}
100047  nt:folder
95614   sling:Message
30619   nt:unstructured
29326   nt:resource
17968   nt:file
6922    cq:Widget
6227    sling:Folder
4528    nt:frozenNode
4387    cq:LiveRelationship
4360    cq:WidgetCollection
2772    cq:CatalogSyncAction
2652    nt:version
2597    nt:versionLabels
2237    cq:PageContent
2003    nt:versionHistory
1425    cq:Page
1409    rep:versionStorage
1378    cq:ClientLibraryFolder
1216    cq:Dialog
1183    nt:propertyDefinition
1164    cq:Component
1137    mix:versionable
1042    sling:OrderedFolder
1004    cq:Taggable
{code}
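A minimal sketch of the exclude-list idea, in Python for illustration only (Oak itself is Java, and none of these names are Oak APIs). It assumes a node is indexed under its primary type and each of its mixins, and it uses a hypothetical exclude list: only nt:folder is named above, and the other three types are simply taken from the top of the table as an example. It also totals how many of the nodes counted in the table fall under the excluded types.

{code}
from typing import Dict, Iterable, List, Set, Tuple

# Node counts below /oak:index/nodetype/:index/<type>, copied from the table
# above (only types with more than 1000 entries were listed there).
COUNTS: Dict[str, int] = {
    "nt:folder": 100047, "sling:Message": 95614, "nt:unstructured": 30619,
    "nt:resource": 29326, "nt:file": 17968, "cq:Widget": 6922,
    "sling:Folder": 6227, "nt:frozenNode": 4528, "cq:LiveRelationship": 4387,
    "cq:WidgetCollection": 4360, "cq:CatalogSyncAction": 2772,
    "nt:version": 2652, "nt:versionLabels": 2597, "cq:PageContent": 2237,
    "nt:versionHistory": 2003, "cq:Page": 1425, "rep:versionStorage": 1409,
    "cq:ClientLibraryFolder": 1378, "cq:Dialog": 1216,
    "nt:propertyDefinition": 1183, "cq:Component": 1164,
    "mix:versionable": 1137, "sling:OrderedFolder": 1042, "cq:Taggable": 1004,
}

# Hypothetical exclude list, for illustration only (not a proposal from OAK-1150).
EXCLUDE: Set[str] = {"nt:folder", "nt:unstructured", "nt:resource", "nt:file"}


def types_to_index(primary_type: str, mixins: Iterable[str],
                   exclude: Set[str] = EXCLUDE) -> List[str]:
    """Types a node would be indexed under: its primary type and its mixins,
    minus anything on the exclude list."""
    return [t for t in [primary_type, *mixins] if t not in exclude]


def excluded_share(counts: Dict[str, int],
                   exclude: Set[str] = EXCLUDE) -> Tuple[int, int]:
    """Return (nodes under excluded types, nodes under all listed types)."""
    removed = sum(n for t, n in counts.items() if t in exclude)
    return removed, sum(counts.values())


if __name__ == "__main__":
    # an nt:folder node is then only indexed under its mixin
    print(types_to_index("nt:folder", ["mix:versionable"]))  # ['mix:versionable']
    removed, total = excluded_share(COUNTS)
    print(removed, total, f"{removed / total:.0%}")
{code}

With this example exclude list, 177960 of the 323217 nodes counted in the table (about 55%) fall under excluded types. This is only a rough estimate: the counts are all nodes below each type's index node, and types with 1000 entries or fewer are not in the table.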

How I got those results (this is mainly for my own use, in case I need to get 
similar statistics):

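{code}
./mongo 127.0.0.1/Oak --eval "var c = db.nodes.find(); while(c.hasNext()) {printjson(c.next()._id)}" >> test.txt
{code}

In this file, remove all double quotes ("), replace "," with "-", and replace the 
regexp "^([0-9]*):" with "\1,", so that each line has the form "depth,path". 
Save the edited file as ~/temp/test2.txt (the file read by csvread below). Then, 
with the H2 database:

{code}
create table test(depth int, path varchar) as
select * from csvread('~/temp/test2.txt', 'depth,path');
create index on test(path);
alter table test add column parent varchar as 
left(path, instr(path, '/', -1) - 1);
-- count all descendants of each /oak:index/nodetype/:index/<type> node;
-- the range '<path>/' .. '<path>0' covers every path starting with '<path>/'
-- because '0' is the character directly after '/'
select (select count(*) from test t2 where t2.path 
between t1.path || '/' and t1.path || '0') count, t1.path
from test t1 where t1.depth = 4 
and t1.path like '/oak:index/nodetype/:index/%'
order by 1 desc;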
{code}
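The text edits and the H2 query above can also be sketched in plain Python, without H2. This is an illustration under assumptions, not part of the original procedure: the function names are hypothetical, the input is assumed to be the raw test.txt output (one quoted "depth:path" _id per line, plus any shell banner lines, which are skipped), and a sorted list with binary search stands in for the H2 index on path, evaluating the same '<path>/' .. '<path>0' range.

{code}
import re
import sys
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Tuple

# Parent of the per-type index nodes, which sit at depth 4.
INDEX_ROOT = "/oak:index/nodetype/:index/"
ID_PATTERN = re.compile(r"^([0-9]+):(.*)$")


def parse_ids(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Apply the same text edits as above and return (depth, path) rows."""
    rows = []
    for line in lines:
        # remove quotes, then replace commas with "-" (as in the manual edit)
        line = line.strip().replace('"', "").replace(",", "-")
        m = ID_PATTERN.match(line)
        if m:  # skip anything that is not a "depth:path" id
            rows.append((int(m.group(1)), m.group(2)))
    return rows


def count_descendants(rows: List[Tuple[int, str]], depth: int = 4,
                      prefix: str = INDEX_ROOT) -> List[Tuple[int, str]]:
    """For each node at `depth` below `prefix`, count its descendants;
    sorted by count descending (like "order by 1 desc")."""
    paths = sorted(path for _, path in rows)
    result = []
    for d, path in rows:
        if d != depth or not path.startswith(prefix):
            continue
        # same range as "between path || '/' and path || '0'" in the SQL
        lo = bisect_left(paths, path + "/")
        hi = bisect_right(paths, path + "0")
        result.append((hi - lo, path))
    result.sort(reverse=True)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python count_index.py test.txt
    with open(sys.argv[1]) as f:
        for count, path in count_descendants(parse_ids(f)):
            print(count, path)
{code}

Loading everything into memory is fine for a few hundred thousand ids; for much larger repositories the H2 approach above scales better.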


> NodeType index: don't index all primary and mixin types
> -------------------------------------------------------
>
>                 Key: OAK-1150
>                 URL: https://issues.apache.org/jira/browse/OAK-1150
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Thomas Mueller
>
> Currently, the nodetype index indexes all primary types and mixin types 
> (including nt:base I think).
> This results in many nodes in this index, which unnecessarily increases the 
> repository size, but doesn't really help when executing queries (running a 
> query to get all nt:base nodes doesn't benefit much from using the nodetype 
> index).
> It should also help reduce writes when updating the index, for example for 
> OAK-1099.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
