Hi,

I'm not sure if you are aware of the following resources; they might help:

https://oakutils.appspot.com/generate/index
https://www.aemstuff.com/blogs/feb/aemindexcheatsheat.html
https://experienceleague.adobe.com/docs/experience-manager-65/assets/JCR_query_cheatsheet-v1.1.pdf

These were written for the Adobe AEM product, but I find them useful even outside of AEM. Here is an example index definition:

{
  "/oak:index/acmeAsset-1": {
    "compatVersion": 2,
    "type": "lucene",
    "tags": ["asset"],
    "async": ["async", "nrt"],
    "includedPaths": ["/content/dam"],
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "evaluatePathRestrictions": true,
    "maxFieldLength": 100000,
    "aggregates": {
      "jcr:primaryType": "nt:unstructured",
      "dam:Asset": {
        "jcr:primaryType": "nt:unstructured",
        "include0": { "path": "jcr:content", "jcr:primaryType": "nt:unstructured" },
        "include1": { "path": "jcr:content/metadata", "jcr:primaryType": "nt:unstructured" },
        "include2": { "path": "jcr:content/metadata/*", "jcr:primaryType": "nt:unstructured" },
        "include3": { "path": "jcr:content/renditions", "jcr:primaryType": "nt:unstructured" },
        "include4": { "path": "jcr:content/renditions/original", "jcr:primaryType": "nt:unstructured" },
        "include5": { "path": "jcr:content/renditions/original/jcr:content", "jcr:primaryType": "nt:unstructured" },
        "include6": { "path": "jcr:content/comments", "jcr:primaryType": "nt:unstructured" },
        "include7": { "path": "jcr:content/comments/*", "jcr:primaryType": "nt:unstructured" },
        "include8": { "path": "jcr:content/data/master", "jcr:primaryType": "nt:unstructured" },
        "include9": { "path": "jcr:content/usages", "jcr:primaryType": "nt:unstructured" },
        "include10": { "path": "jcr:content/renditions/text.txt/jcr:content", "jcr:primaryType": "nt:unstructured" }
      }
    },
    "facets": {
      "jcr:primaryType": "nt:unstructured",
      "topChildren": "100",
      "secure": "insecure"
    },
    "indexRules": {
      "jcr:primaryType": "nt:unstructured",
      "dam:Asset": {
        "jcr:primaryType": "nt:unstructured",
        "properties": {
          "jcr:primaryType": "nt:unstructured",
          "jcrLastModified": {
            "ordered": true,
            "name": "jcr:content/jcr:lastModified",
            "propertyIndex": true,
            "jcr:primaryType": "nt:unstructured",
            "type": "Date"
          },
          "jcrTitle": {
            "useInSpellcheck": true,
            "useInSuggest": true,
            "nodeScopeIndex": true,
            "name": "jcr:content/jcr:title",
            "propertyIndex": true,
            "boost": 2.0,
            "jcr:primaryType": "nt:unstructured"
          },
          "jcrDescription": {
            "nodeScopeIndex": true,
            "useInSpellcheck": true,
            "name": "jcr:content/jcr:description",
            "propertyIndex": true,
            "jcr:primaryType": "nt:unstructured",
            "useInSuggest": true
          },
          "jcrCreated": {
            "ordered": true,
            "name": "jcr:created",
            "propertyIndex": true,
            "jcr:primaryType": "nt:unstructured",
            "type": "Date"
          },
          "nodeName": {
            "nodeScopeIndex": true,
            "name": ":nodeName",
            "jcr:primaryType": "nt:unstructured",
            "useInSuggest": true
          }
        }
      }
    }
  }
}

I wonder whether, nowadays, you would get more answers on stackoverflow.com? I'm not sure...

Regards,
Thomas

From: Raffaele Gambelli <raffaele.gambe...@cegeka.com.INVALID>
Date: Wednesday, 11 September 2024 at 18:58
To: users@jackrabbit.apache.org <users@jackrabbit.apache.org>
Subject: Re: Indexing a binary and searching with contains, help request

Forgive me, I really am asking for your help, I beg you... This issue is driving me crazy. I searched for similar code in the Oak projects without finding anything, and on the web there is, incredibly, nothing similar. Is it possible that among you developers there is not a soul willing to help? What good is this mailing list if none of those who carry on this beautiful project ever take action?

Cordiali saluti / Best regards,

Raffaele Gambelli
Senior Java Developer
E raffaele.gambe...@cegeka.com
Cegeka
Via Ettore Cristoni, 84
IT-40033 Bologna (IT), Italy
T +39 02 2544271
http://www.cegeka.com/

Confidentiality notice: The information contained in this email is confidential. If you realize that you are not the intended recipient of this email, please report the error to the sender and delete the message immediately. Improper use of confidential information may result in penalties.

Personal data protection: Please be informed that your data will be processed by Cegeka in compliance with the applicable legal provisions (Legislative Decree 196/2003 and EU Regulation 679/2016). For further details, you can consult our privacy notices at https://www.cegeka.com/it/informazioni-sulla-privacy

________________________________
From: Raffaele Gambelli <raffaele.gambe...@cegeka.com.INVALID>
Sent: Wednesday, September 11, 2024 12:51 PM
To: users@jackrabbit.apache.org <users@jackrabbit.apache.org>
Subject: Indexing a binary and searching with contains, help request

Good morning,

I would like to ask for your help in understanding where I am going wrong in building a working example in which I populate a repository with binary data, index it, and run a contains query. I have logging set to TRACE and I can see the indexing working; when I execute the query, however, I always get 0 results.

The repository is built on a NodeStore, and I create it this way:

    LuceneIndexProvider provider = new LuceneIndexProvider();
    Oak oak = new Oak(ns)
            .with((QueryIndexProvider) provider)
            .with((Observer) provider)
            .with(new LuceneIndexEditorProvider());
    repository = new Jcr(oak).createRepository();

Then I populate it this way:

    Node node = rootNode.addNode("node" + i, "nt:unstructured");
    byte[] data = ("testo" + i).getBytes();
    ByteArrayInputStream bais = new ByteArrayInputStream(data);
    Binary binary = session.getValueFactory().createBinary(bais);
    try {
        node.setProperty("binaryData", binary);
    } finally {
        binary.dispose();
    }
    node.setProperty("jcr:mimeType", "text/plain");

Then the index is defined this way:

    Node root = session.getRootNode();
    Node oakIndex = root.getNode("oak:index");
    Node index = oakIndex.addNode("contentTextIndex", "oak:QueryIndexDefinition");
    index.setProperty("type", "lucene");
    index.setProperty("async", (String[]) null);
    Node indexRules = index.addNode("indexRules", "nt:unstructured");
    Node ntBase = indexRules.addNode("nt:base", "nt:unstructured");
    Node properties = ntBase.addNode("properties", "nt:unstructured");
    Node binaryDataProperty = properties.addNode("binaryData", "nt:unstructured");
    binaryDataProperty.setProperty("name", propertyName);
    binaryDataProperty.setProperty("propertyIndex", true);
    binaryDataProperty.setProperty("analyzed", true);
    Node jcrMimeTypeProperty = properties.addNode("jcr:mimeType");
    jcrMimeTypeProperty.setProperty("name", "jcr:mimeType");
    jcrMimeTypeProperty.setProperty("propertyIndex", true);
    jcrMimeTypeProperty.setProperty("analyzed", true);

Then I search this way:

    String sql2QueryString = "SELECT * FROM [nt:base] WHERE CONTAINS([binaryData], 'testo')";
    Query sql2Query = queryManager.createQuery(sql2QueryString, Query.JCR_SQL2);
    QueryResult result = sql2Query.execute();

and I read the results this way:

    NodeIterator nodes = result.getNodes();
    while (nodes.hasNext()) {
        Node node = nodes.nextNode();
        log.info("Path: " + node.getPath());
        counter++;
    }
    log.info("Found {} results", counter);

I'm using Oak 1.68.0 with tika-core and tika-parsers-standard-package 2.9.2. In the logs I see the indexing and the text extraction working correctly; if you want, I can attach a full log.

Thank you very much for your help.

Cordiali saluti / Best regards,
Raffaele Gambelli
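One thing that may be worth checking in the question above, offered as an assumption rather than a confirmed diagnosis: each node stores the text "testo" + i, so a word-based analyzer would likely index whole tokens such as testo0 and testo1, and a full-text search for the bare term testo normally matches whole tokens only. The sketch below is a toy model in plain Java that illustrates whole-token versus prefix matching. It uses no Oak or Lucene classes, and the class TokenMatchSketch and its methods tokenize and matches are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of whole-token full-text matching; this is not Oak or Lucene code. */
class TokenMatchSketch {

    /** Splits text into lower-cased runs of letters and digits (a hypothetical analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Returns true if the term equals a token, or is a prefix of one when it ends with '*'. */
    static boolean matches(String text, String term) {
        String t = term.toLowerCase(Locale.ROOT);
        boolean prefix = t.endsWith("*");
        String needle = prefix ? t.substring(0, t.length() - 1) : t;
        for (String token : tokenize(text)) {
            if (prefix ? token.startsWith(needle) : token.equals(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("testo0", "testo"));   // false: "testo0" is a single token
        System.out.println(matches("testo0", "testo*"));  // true: prefix match
        System.out.println(matches("testo 0", "testo"));  // true: "testo" is its own token
    }
}
```

If this assumption holds for the real analyzer, trying a prefix term such as 'testo*' in the CONTAINS clause, or storing "testo " + i with a space, would show whether tokenization is the cause. Whether the property-level form CONTAINS([binaryData], ...) is served by text extracted from a binary is a separate point to verify against the Oak Lucene index documentation.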