[
https://issues.apache.org/jira/browse/OAK-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816003#comment-13816003
]
Thomas Mueller commented on OAK-1150:
-------------------------------------
I think what we need is some kind of exclude list, so that nt:folder and so on
are not indexed, but other node types and mixins are. To build the exclude
list, I collected some statistics from one particular application (default
setup). These are the numbers of nodes below each index node
(/oak:index/nodetype/:index/xxx), listing only those with more than 1000
entries:
{code}
100047 nt:folder
95614 sling:Message
30619 nt:unstructured
29326 nt:resource
17968 nt:file
6922 cq:Widget
6227 sling:Folder
4528 nt:frozenNode
4387 cq:LiveRelationship
4360 cq:WidgetCollection
2772 cq:CatalogSyncAction
2652 nt:version
2597 nt:versionLabels
2237 cq:PageContent
2003 nt:versionHistory
1425 cq:Page
1409 rep:versionStorage
1378 cq:ClientLibraryFolder
1216 cq:Dialog
1183 nt:propertyDefinition
1164 cq:Component
1137 mix:versionable
1042 sling:OrderedFolder
1004 cq:Taggable
{code}
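To illustrate the exclude-list idea, here is a minimal JavaScript sketch. It assumes a rule that is not part of the proposal above: every type with more than some threshold of entries (10000 here, picked arbitrarily) goes on the exclude list, and a node's primary type and mixins are filtered against that list before being written to the index. The function names are hypothetical and do not correspond to Oak's API or index configuration.

```javascript
// Some of the counts listed above: entries below /oak:index/nodetype/:index/<type>.
const counts = {
  "nt:folder": 100047,
  "sling:Message": 95614,
  "nt:unstructured": 30619,
  "nt:resource": 29326,
  "nt:file": 17968,
  "cq:Page": 1425,
  "mix:versionable": 1137,
};

// Hypothetical rule: exclude every type with more than `threshold` entries.
function buildExcludeList(counts, threshold) {
  return new Set(
    Object.entries(counts)
      .filter(([, n]) => n > threshold)
      .map(([type]) => type)
  );
}

// Returns the types of one node (primary type plus mixins) that would still
// be written to the nodetype index, i.e. those not on the exclude list.
function typesToIndex(primaryType, mixins, excluded) {
  return [primaryType, ...mixins].filter((t) => !excluded.has(t));
}

const excluded = buildExcludeList(counts, 10000); // threshold chosen arbitrarily
console.log([...excluded].join(", "));
// nt:folder, sling:Message, nt:unstructured, nt:resource, nt:file
console.log(typesToIndex("nt:folder", ["mix:versionable"], excluded).join(", "));
// mix:versionable
console.log(typesToIndex("cq:Page", [], excluded).join(", "));
// cq:Page
```

With such a list, an nt:folder node that also has mix:versionable would only be indexed under the mixin, while less common types such as cq:Page stay in the index.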
How I got those results (this is mainly for my own use, in case I need to get
similar statistics):
{code}
./mongo 127.0.0.1/Oak --eval "var c = db.nodes.find(); while(c.hasNext()) {printjson(c.next()._id)}" >> test.txt
In this file, remove all double quotes ("), replace "," with "-", and replace
the regexp "^([0-9]*):" with "\1,", so that each line becomes "depth,path".
Then with the H2 database:
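-- load the "depth,path" rows prepared above
create table test(depth int, path varchar) as
select * from csvread('~/temp/test2.txt', 'depth,path');
create index on test(path);
-- computed column: the parent path (everything before the last '/')
alter table test add column parent varchar as left(path, instr(path, '/', -1) - 1);
-- for each entry directly below /oak:index/nodetype/:index (depth 4),
-- count its descendants: all paths between path || '/' and path || '0'
-- ('0' is the character that directly follows '/')
select (select count(*) from test t2 where t2.path
between t1.path || '/' and t1.path || '0') count, t1.path
from test t1 where t1.depth = 4
and t1.path like '/oak:index/nodetype/:index/%'
order by 1 desc;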
{code}
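The two manual steps above (the text edits and the descendant count) can also be sketched in JavaScript, for example to sanity-check the approach on a few ids. This is only an illustration under assumptions: the function names and the sample ids are made up, and a sorted array with binary search stands in for the H2 index on path. The key trick is the same as in the query: the descendants of a path p are exactly the paths from p + "/" (inclusive) up to p + "0" (exclusive), because "0" directly follows "/" in ASCII.

```javascript
// Step 1: the manual text edits, applied to one line printed by
// printjson(_id), e.g. "4:/oak:index/nodetype/:index/nt:folder" (with quotes).
function toCsvLine(line) {
  return line
    .replace(/"/g, "")             // remove the double quotes
    .replace(/,/g, "-")            // a comma inside a path would break the CSV
    .replace(/^([0-9]*):/, "$1,"); // "4:/a/b" -> "4,/a/b" (depth,path)
}

// Splits a "depth,path" row at the first comma.
function parseRow(csvLine) {
  const i = csvLine.indexOf(",");
  return { depth: Number(csvLine.slice(0, i)), path: csvLine.slice(i + 1) };
}

// Index of the first element of the sorted array `a` that is >= key.
function lowerBound(a, key) {
  let lo = 0;
  let hi = a.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (a[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Step 2: for each depth-4 entry below /oak:index/nodetype/:index, count the
// paths p with path + "/" <= p < path + "0", i.e. exactly its descendants.
// (The SQL BETWEEN is inclusive; it only differs if a path equal to
// path + "0" exists.)
function countDescendants(rows) {
  const sorted = rows.map((r) => r.path).sort(); // plays the role of the path index
  return rows
    .filter((r) => r.depth === 4 && r.path.startsWith("/oak:index/nodetype/:index/"))
    .map((r) => ({
      count: lowerBound(sorted, r.path + "0") - lowerBound(sorted, r.path + "/"),
      path: r.path,
    }))
    .sort((a, b) => b.count - a.count); // order by 1 desc
}

// Made-up sample ids, in the form printed by the mongo command.
const printed = [
  '"3:/oak:index/nodetype/:index"',
  '"4:/oak:index/nodetype/:index/nt:folder"',
  '"5:/oak:index/nodetype/:index/nt:folder/content"',
  '"6:/oak:index/nodetype/:index/nt:folder/content/a"',
  '"4:/oak:index/nodetype/:index/nt:file"',
  '"5:/oak:index/nodetype/:index/nt:file/x"',
];
const rows = printed.map(toCsvLine).map(parseRow);
for (const { count, path } of countDescendants(rows)) {
  console.log(count + " " + path);
}
// 2 /oak:index/nodetype/:index/nt:folder
// 1 /oak:index/nodetype/:index/nt:file
```

Sorting the paths once lets each count be computed with two binary searches instead of a full scan, which is roughly the role the index on path plays for the BETWEEN condition in the query.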
> NodeType index: don't index all primary and mixin types
> -------------------------------------------------------
>
> Key: OAK-1150
> URL: https://issues.apache.org/jira/browse/OAK-1150
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Reporter: Thomas Mueller
>
> Currently, the nodetype index indexes all primary types and mixin types
> (including nt:base I think).
> This results in many nodes in this index, which unnecessarily increases the
> repository size, but doesn't really help executing queries (running a query
> to get all nt:base nodes doesn't benefit much from using the nodetype index).
> It should also help reduce writes when updating the index, for example for
> OAK-1099.