Many thanks, Lorenz! This is annoying: I can't preprocess the literals before putting them in TDB, because TDB *is* the database for my CMS + social network, and duplicating the data would be a mess. But maybe there is a way to preprocess the literals just before they are handed to the underlying Lucene index.
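To make the preprocessing idea concrete, here is a minimal sketch of the tag stripping that would have to happen before a literal reaches the text index. It is written in Python with only the standard library, purely as an illustration; in a real Jena setup the equivalent step would run on the JVM (for instance with JSoup, as Lorenz suggests). The names `strip_html` and `BLOCK_TAGS` are hypothetical and are not part of Jena or Lucene, and the sketch makes no attempt to skip the contents of <script> or <style> elements.

```python
from html.parser import HTMLParser

# Hypothetical, non-exhaustive set of block-level tags: text on either side
# of these should be treated as separate words.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "tr", "td"}


class _TextExtractor(HTMLParser):
    """Collects the character data of an HTML fragment and drops the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join the collected pieces and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def strip_html(literal: str) -> str:
    """Return the plain text of an HTML literal, suitable for full-text indexing."""
    parser = _TextExtractor()
    parser.feed(literal)
    parser.close()  # flush any buffered trailing text
    return parser.text()


print(strip_html("<p>Hello <b>bold</b> world</p><div>Next &amp; last</div>"))
# prints: Hello bold world Next & last
```

The stored literal stays untouched; only the text given to the indexer would be cleaned, which avoids duplicating data in the store.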
This being said, the most frequent tags, <p> and <div>, are not likely to be search strings from the user. So this is not a big problem, but I found it an interesting one.

2018-01-29 11:12 GMT+01:00 Lorenz Buehmann <[email protected]>:

> I guess it simply uses the Lucene Standard Analyzer, thus, yes the tags
> will be indexed. There isn't a HTML analyzer in Lucene AFAIK, which
> means you have to preprocess the literals first via Apache Tika or
> something like JSoup before you add them to the triple store.
>
> Lorenz
>
> On 29.01.2018 10:14, Jean-Marc Vanel wrote:
> > Hi
> >
> > With semantic_forms one can create content with an HTML editor in
> > JavaScript.
> >
> > Example:
> > http://semantic-forms.cc:9112/download?url=http%3A%2F%2Fsemantic-forms.cc%3A9112%2Fldp%2F1515780312176-31461258964949990&syntax=Turtle
> > and how it looks in the UI:
> > http://semantic-forms.cc:9112/ldp/1515780312176-31461258964949990
> >
> > My question is:
> > Does Jena text indexing process the tags in HTML (or XML) content?
> > If yes, <bold> would be indexed in Lucene, which is not desirable.
> >
> > Nothing is said in these 2 pages:
> > https://jena.apache.org/documentation/notes/typed-literals.html
> > https://jena.apache.org/documentation/query/text-query.html

--
Jean-Marc Vanel
http://www.semantic-forms.cc:9111/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me#subject
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
