I am adding BinaryValue properties to my nodes. It appears that Jackrabbit does
not index the contents of a BinaryValue, even if those contents represent a
string. If I add the same text as a StringValue, the value is indexed and
picked up in a contains search.
I have 2 issues with this:
1) String property values have a limit of around 16000 characters because the
SimpleDBPersistence adapter stores the value in a BLOB field. I get MySQL
data truncation errors unless I chop the data down to 16000 characters. In
addition, I am doubling my space requirements. Not only do I have to store my
binary content, but also its string representation in the node.
2) I use a byte[] array throughout my application as a means to store PDF
files, image files, text files, etc. It is a "common denominator for all
content": PDF files, image files, wiki entries, etc. can all be stored,
passed around, and retrieved as a byte[] array. I would like to figure out how to
get Jackrabbit to index the byte[] array properly.
3) Not an issue, but a question: how does Jackrabbit know that a node is a PDF
document? It must figure it out somehow, because I see that there is support in
the SearchIndex to configure PDF extraction. Do I add a "jcr:mimeType" property
of application/pdf to my PDF node, and will that do it? Will this solve the
first 2 issues?
I appreciate your thoughts on this!
My Code:
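String contentText = "this is a unique piece of text";
// Binary copy of the text; this property is not picked up by the search
byte[] bytes = contentText.getBytes();
node.setProperty("content", new BinaryValue(bytes));
// Truncate to avoid the MySQL data truncation errors on long strings
if (contentText.length() > 16000) {
    contentText = contentText.substring(0, 16000);
}
// String copy of the text; this property is indexed and found
node.setProperty("worksproperty", new StringValue(contentText));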
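For reference, the byte[] conversion and the truncation workaround could be factored into a small helper along these lines. This is only a sketch: the class name ContentText and its methods are hypothetical names of mine, not part of Jackrabbit; the 16000 limit is just the value I observed, not a documented constant; and it assumes Java 7+ and UTF-8 content, whereas the code above uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jackrabbit or JCR.
final class ContentText {
    // Assumed limit, taken from the truncation errors I observed.
    static final int MAX_STRING_CHARS = 16000;

    private ContentText() {}

    // Encode with an explicit charset so the bytes do not depend on the platform default.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes that were produced by toBytes.
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Cut the text to at most maxChars chars without splitting a surrogate pair.
    static String truncate(String text, int maxChars) {
        if (text.length() <= maxChars) {
            return text;
        }
        int end = maxChars;
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }
}
```

With this, the string property would be set from ContentText.truncate(contentText, ContentText.MAX_STRING_CHARS) and the binary property from ContentText.toBytes(contentText). It does not address the indexing question itself; it only tidies up the workaround.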
This is my XPath query:
//*[jcr:contains(.,'unique')]