Re: Text Index build with empty fields

Chris Tomlinson Thu, 28 Feb 2019 12:48:56 -0800

Hi Sorin,

1) I suggest trying to simplify the entity map. I assume there’s data for each 
of the properties other than skos:altLabel in the entity map:


>          [ text:field "gndtype";
>            text:predicate skos:altLabel
>          ]
>          [ text:field "oldgndid";
>            text:predicate gndo:oldAuthorityNumber
>          ]
>          [ text:field "prefName";
>            text:predicate gndo:preferredNameForTheSubjectHeading
>          ]
>          [ text:field "varName";
>            text:predicate gndo:variantNameForTheSubjectHeading
>          ]
>          [ text:field "prefName";
>            text:predicate gndo:preferredNameForThePlaceOrGeographicName
>          ]
>          [ text:field "varName";
>            text:predicate gndo:variantNameForThePlaceOrGeographicName
>          ]
>          [ text:field "prefName";
>            text:predicate gndo:preferredNameForTheWork
>          ]
>          [ text:field "varName";
>            text:predicate gndo:variantNameForTheWork
>          ]
>          [ text:field "prefName";
>            text:predicate gndo:preferredNameForTheConferenceOrEvent
>          ]
>          [ text:field "varName";
>            text:predicate gndo:variantNameForTheConferenceOrEvent
>          ]
>          [ text:field "prefName";
>            text:predicate gndo:preferredNameForTheCorporateBody
>          ]
>          [ text:field "varName";
>            text:predicate gndo:variantNameForTheCorporateBody
>          ]
>          [ text:field "prefName";
>            text:predicate gndo:preferredNameForThePerson
>          ]
>          [ text:field "varName";
>            text:predicate gndo:variantNameForThePerson
>          ]
>          [ text:field "prefName";
>            text:predicate gndo:preferredNameForTheFamily
>          ]
>          [ text:field "varName";
>            text:predicate gndo:variantNameForTheFamily
>          ]
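
To make that concrete, here is a rough sketch of what a cut-down map could look like: two entries with distinct field names, with the remaining entries added back one at a time until a field comes up empty. The prefix IRIs (in particular the gndo: namespace), the <#entMap> node name, and the entityField/defaultField values are my assumptions, not taken from your assembler file, so adjust them to match it.

```turtle
@prefix text: <http://jena.apache.org/text#> .
# Assumed namespace for the GND ontology; use the one from your own file.
@prefix gndo: <https://d-nb.info/standards/elementset/gnd#> .

# Hypothetical node name; keep whatever your assembler already uses.
<#entMap> a text:EntityMap ;
    text:entityField  "uri" ;        # assumed field holding the subject URI
    text:defaultField "prefName" ;   # must be one of the fields mapped below
    text:map (
        [ text:field "prefName" ;
          text:predicate gndo:preferredNameForThePerson
        ]
        [ text:field "varName" ;
          text:predicate gndo:variantNameForThePerson
        ]
    ) .
```

If both fields are populated with this reduced map, the entry that breaks the build should show up as the others are restored.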


2) You might try a TextIndexLucene to see whether the problem also occurs there.
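
A minimal sketch of such a Lucene-backed index, assuming the entity map is reachable as <#entMap> and the underlying dataset as <#dataset>; those node names and the index directory are placeholders, not values from your setup:

```turtle
@prefix text: <http://jena.apache.org/text#> .

# Hypothetical node names and path; only the text: vocabulary is from jena-text.
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:lucene-test-index> ;   # assumed scratch directory
    text:entityMap <#entMap> .                  # reuse the existing entity map

<#text_dataset> a text:TextDataset ;
    text:dataset <#dataset> ;                   # the existing base dataset
    text:index   <#indexLucene> .               # point at Lucene instead of ES
```

If all fields are filled when indexing into Lucene with the same entity map, that would point at the Elasticsearch side rather than at textindexer.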

3) Adding the line log4j.logger.org.apache.jena.query.text.es=DEBUG should 
work. I see no problem with it.

Sorry to be of little help,
Chris


> On Feb 28, 2019, at 8:53 AM, Sorin Gheorghiu 
> <[email protected]> wrote:
> 
> Hi Chris,
> Thank you for answering. I am replying to you directly because users@jena 
> doesn't accept messages larger than 1 MB.
> 
> Our previous successful text index build was with 3.8.0, not 3.9.0; 
> sorry for the misinformation.
> Attached is the assembler file for 3.10.0 as requested, as well as the packet 
> capture file to see that only the 'gndtype' field has data.
> I tried to enable the debug logs in log4j.properties with 
> log4j.logger.org.apache.jena.query.text.es=DEBUG but there is no output in 
> the log file.
> 
> Regards,
> Sorin
> 
> On 27.02.2019 at 20:01, Chris Tomlinson wrote:
>> Hi Sorin,
>> 
>> Please provide the assembler file for Elasticsearch that has the problematic 
>> entity map definitions.
>> 
>> There haven’t been any changes in over a year to textindexer since well 
>> before 3.9. I don’t see any relevant changes to the handling of entity maps 
>> either so I can’t begin to pursue the issue further w/o perhaps seeing your 
>> current assembler file. 
>> 
>> I don't have any experience with Elasticsearch or with using jena-text-es 
>> beyond a simple change to TextIndexES.java to change 
>> org.elasticsearch.common.transport.InetSocketTransportAddress to 
>> org.elasticsearch.common.transport.TransportAddress as part of the upgrade 
>> to Lucene 7.4.0 and Elasticsearch 6.4.2.
>> 
>> Regards,
>> Chris
>> 
>> 
>>> On Feb 25, 2019, at 2:37 AM, Sorin Gheorghiu 
>>> <[email protected]> wrote:
>>> 
>>> Correction: only the *last field* from the /text:map/ list contains a 
>>> value.
>>> 
>>> To reformulate:
>>> 
>>> * if there are 3 fields in /text:map/, then during indexing the first
>>>   two are empty (let's name them 'text1' and 'text2') and the last
>>>   field contains data (let's name it 'text3')
>>> * if on the next attempt the field 'text3' is commented out, then
>>>   'text1' is empty and 'text2' contains data
>>> 
>>> 
>>> On 22.02.2019 at 15:01, Sorin Gheorghiu wrote:
>>>> In addition:
>>>> 
>>>>  * if there are 3 fields in /text:map/, then during indexing one
>>>>    contains data (let's name it 'text1'), the others are empty (let's
>>>>    name them 'text2' and 'text3'),
>>>>  * if on the next attempt the field 'text1' is commented out, then
>>>>    'text2' contains data and 'text3' is empty
>>>> 
>>>> 
>>>> 
>>>> -------- Forwarded Message --------
>>>> Subject:   Text Index build with empty fields
>>>> Date:      Fri, 22 Feb 2019 14:01:18 +0100
>>>> From:      Sorin Gheorghiu <[email protected]>
>>>> Reply-To:  [email protected]
>>>> To:        [email protected]
>>>> 
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> When building the text index with the /jena.textindexer/ tool in Jena 3.10 
>>>> for an external full-text search engine (Elasticsearch of course) with 
>>>> multiple differently named fields in /text:map/, just *one field 
>>>> is indexed* (more precisely, one field contains data and the others are 
>>>> empty). It doesn't look like an issue with Elasticsearch: in the logs 
>>>> generated during indexing, all fields but one are already missing their 
>>>> values. The same setup worked in Jena 3.9. Changing the Java version from 
>>>> 8 to 9 or 11 didn't change anything.
>>>> 
>>>> Could changes in the new release have affected this tool, so that we are 
>>>> dealing with a bug?
>>>> 
> -- 
> Sorin Gheorghiu             Tel: +49 7531 88-3198
> Universität Konstanz        Raum: B705
> 78464 Konstanz              [email protected]
> 
> - KIM: Abteilung Contentdienste -
