I guess it simply uses the Lucene StandardAnalyzer, so yes, the tags will be indexed. AFAIK there isn't an HTML analyzer in Lucene, which means you have to preprocess the literals first, via Apache Tika or something like JSoup, before you add them to the triple store.
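
To illustrate the kind of preprocessing I mean, here is a minimal sketch that strips markup from a literal before it is stored. It uses Python's standard-library html.parser purely as a stand-in for Tika or JSoup, and it is not anything Jena does for you. The names strip_html and _TextExtractor are hypothetical helpers made up for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops the tags around them."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so words from adjacent elements do not
        # run together, then collapse all whitespace runs to single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_html(literal: str) -> str:
    """Return the plain text of an HTML literal (assumed helper, not a Jena API).

    Limitation: the content of <script> and <style> elements is kept as text,
    and a tag in the middle of a word will split that word in two.
    """
    parser = _TextExtractor()
    parser.feed(literal)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    print(strip_html("<p>Hello <b>world</b></p>"))
```

The cleaned string is what you would then write as the literal value (or into a separate property that the text index is configured on), so that Lucene never sees the markup.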

Lorenz

On 29.01.2018 10:14, Jean-Marc Vanel wrote:
> Hi
>
> With semantic_forms one can create content with an HTML editor in
> JavaScript.
>
> Example:
> http://semantic-forms.cc:9112/download?url=http%3A%2F%2Fsemantic-forms.cc%3A9112%2Fldp%2F1515780312176-31461258964949990&syntax=Turtle
> and how it looks in the UI :
> http://semantic-forms.cc:9112/ldp/1515780312176-31461258964949990
>
> My question is:
> Does Jena text indexing process the tags in HTML (or XML) content ?
> If yes , <bold> would be indexed in Lucene, which is not desirable.
>
> Nothing is said in these 2 pages:
> https://jena.apache.org/documentation/notes/typed-literals.html
> https://jena.apache.org/documentation/query/text-query.html
>
