Hi Sorin,

1) I suggest trying to simplify the entity map. I assume there’s data for each of the properties other than skos:altLabel in the entity map:

>         [ text:field "gndtype";
>           text:predicate skos:altLabel
>         ]
>         [ text:field "oldgndid";
>           text:predicate gndo:oldAuthorityNumber
>         ]
>         [ text:field "prefName";
>           text:predicate gndo:preferredNameForTheSubjectHeading
>         ]
>         [ text:field "varName";
>           text:predicate gndo:variantNameForTheSubjectHeading
>         ]
>         [ text:field "prefName";
>           text:predicate gndo:preferredNameForThePlaceOrGeographicName
>         ]
>         [ text:field "varName";
>           text:predicate gndo:variantNameForThePlaceOrGeographicName
>         ]
>         [ text:field "prefName";
>           text:predicate gndo:preferredNameForTheWork
>         ]
>         [ text:field "varName";
>           text:predicate gndo:variantNameForTheWork
>         ]
>         [ text:field "prefName";
>           text:predicate gndo:preferredNameForTheConferenceOrEvent
>         ]
>         [ text:field "varName";
>           text:predicate gndo:variantNameForTheConferenceOrEvent
>         ]
>         [ text:field "prefName";
>           text:predicate gndo:preferredNameForTheCorporateBody
>         ]
>         [ text:field "varName";
>           text:predicate gndo:variantNameForTheCorporateBody
>         ]
>         [ text:field "prefName";
>           text:predicate gndo:preferredNameForThePerson
>         ]
>         [ text:field "varName";
>           text:predicate gndo:variantNameForThePerson
>         ]
>         [ text:field "prefName";
>           text:predicate gndo:preferredNameForTheFamily
>         ]
>         [ text:field "varName";
>           text:predicate gndo:variantNameForTheFamily
>         ]

2) You might try a TextIndexLucene.

3) Adding the line log4j.logger.org.apache.jena.query.text.es=DEBUG should work. I see no problem with it.

Sorry to be of little help,
Chris


> On Feb 28, 2019, at 8:53 AM, Sorin Gheorghiu <[email protected]> wrote:
>
> Hi Chris,
> Thank you for answering. I am replying to you directly because users@jena
> doesn't accept messages larger than 1 MB.
>
> Our previous successful text index build was with 3.8.0, not 3.9.0;
> sorry for the misinformation.
> Attached is the assembler file for 3.10.0 as requested, as well as the packet
> capture file, which shows that only the 'gndtype' field has data.
> I tried to enable the debug logs in log4j.properties with
> log4j.logger.org.apache.jena.query.text.es=DEBUG but got no output in the log
> file.
>
> Regards,
> Sorin
>
> On 27.02.2019 at 20:01, Chris Tomlinson wrote:
>> Hi Sorin,
>>
>> Please provide the assembler file for Elasticsearch that has the problematic
>> entity map definitions.
>>
>> There haven’t been any changes in over a year to textindexer since well
>> before 3.9. I don’t see any relevant changes to the handling of entity maps
>> either, so I can’t begin to pursue the issue further without seeing your
>> current assembler file.
>>
>> I don't have any experience with Elasticsearch or with using jena-text-es
>> beyond a simple change to TextIndexES.java to change
>> org.elasticsearch.common.transport.InetSocketTransportAddress to
>> org.elasticsearch.common.transport.TransportAddress as part of the upgrade
>> to Lucene 7.4.0 and Elasticsearch 6.4.2.
>>
>> Regards,
>> Chris
>>
>>
>>> On Feb 25, 2019, at 2:37 AM, Sorin Gheorghiu <[email protected]> wrote:
>>>
>>> Correction: only the *last field* from the /text:map/ list contains a
>>> value.
>>>
>>> To reformulate:
>>>
>>>  * if there are 3 fields in /text:map/, then during indexing the first
>>>    two are empty (let's name them 'text1' and 'text2') and the last
>>>    field contains data (let's name it 'text3')
>>>  * if on the next attempt the field 'text3' is commented out, then
>>>    'text1' is empty and 'text2' contains data
>>>
>>>
>>> On 22.02.2019 at 15:01, Sorin Gheorghiu wrote:
>>>> In addition:
>>>>
>>>>  * if there are 3 fields in /text:map/, then during indexing one
>>>>    contains data (let's name it 'text1'), the others are empty (let's
>>>>    name them 'text2' and 'text3'),
>>>>  * if on the next attempt the field 'text1' is commented out, then
>>>>    'text2' contains data and 'text3' is empty
>>>>
>>>>
>>>>
>>>> -------- Forwarded Message --------
>>>> Subject: Text Index build with empty fields
>>>> Date: Fri, 22 Feb 2019 14:01:18 +0100
>>>> From: Sorin Gheorghiu <[email protected]>
>>>> Reply-To: [email protected]
>>>> To: [email protected]
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> When building the text index with the /jena.textindexer/ tool in Jena 3.10
>>>> for an external full-text search engine (Elasticsearch of course) and
>>>> having multiple fields with different names in /text:map/, just *one field
>>>> is indexed* (more precisely, one field contains data and the others are
>>>> empty). It doesn't look to be an issue with Elasticsearch: in the logs
>>>> generated during indexing, all fields but one are already missing their
>>>> values. The same setup worked in Jena 3.9. Changing the Java version from
>>>> 8 to 9 or 11 didn't change anything.
>>>>
>>>> Could it be that changes in the new release have affected this tool and
>>>> that we are dealing with a bug?
>>>>
> --
> Sorin Gheorghiu             Tel: +49 7531 88-3198
> Universität Konstanz        Room: B705
> 78464 Konstanz              [email protected]
>
> - KIM: Abteilung Contentdienste -
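
As an illustration of suggestion 1 at the top of this thread, a reduced entity map might look like the sketch below. It keeps a single preferred/variant pair from the quoted map, so that a test run of jena.textindexer can show whether both fields receive data. The `:entMap` name, the "uri" entity field, the choice of "prefName" as default field, and the `gndo:` namespace IRI are assumptions that do not come from the thread; take the real values from the actual assembler file.

```turtle
# Minimal sketch only, not the original assembler file.
@prefix :     <#> .
@prefix text: <http://jena.apache.org/text#> .
# Assumed namespace for the GND ontology; check it against your data.
@prefix gndo: <https://d-nb.info/standards/elementset/gnd#> .

:entMap a text:EntityMap ;
    text:entityField  "uri" ;        # assumed name of the field holding the subject URI
    text:defaultField "prefName" ;   # must be one of the fields defined in text:map
    text:map (
        [ text:field "prefName" ;
          text:predicate gndo:preferredNameForThePerson ]
        [ text:field "varName" ;
          text:predicate gndo:variantNameForThePerson ]
    ) .
```

If both fields are populated with this reduced map, entries from the full map can be added back one pair at a time until the empty fields reappear.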
