Re: Offset-Based Analysis

2023-02-22 Thread Mikhail Khludnev
One more idea. It's possible to ask Solr for the essential tokenization via the /analysis/field API (here's a clue: https://stackoverflow.com/a/37785401), get the token stream in the structured response, and pass it into an NLP pipeline for enrichment. On Wed, Feb 22, 2023 at 5:26 PM Luke Kot-Zaniewski (BLOOMBERG/ 91
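
For illustration, here is a rough Python sketch of the idea above: build a request for the field analysis handler, then pull (text, start, end) triples out of the structured response so they can be handed to an external NLP step. The base URL, collection name, and field name are hypothetical. The response layout (an "index" list alternating analyzer stage class names with their token lists, each token carrying "text", "start", and "end") is an assumption about the default JSON output and should be checked against your Solr version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_analysis_url(base_url, collection, field_name, text):
    """Build a GET URL for the field analysis handler (names are placeholders)."""
    params = urlencode({
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": text,
        "wt": "json",
    })
    return f"{base_url}/{collection}/analysis/field?{params}"


def fetch_analysis(url):
    """Fetch and decode the JSON response. Requires a running Solr instance."""
    with urlopen(url) as resp:
        return json.load(resp)


def extract_tokens(response, field_name, stage_suffix=None):
    """Return (text, start, end) triples from one index-time analysis stage.

    Assumes the "index" entry is a flat list of [stage_name, output, ...] pairs.
    By default the first stage that emits a token list (normally the tokenizer)
    is used; pass stage_suffix to pick a stage by the end of its class name.
    """
    stages = response["analysis"]["field_names"][field_name]["index"]
    pairs = zip(stages[0::2], stages[1::2])
    # Char filters report a plain string, so keep only stages that emit tokens.
    token_stages = [(name, out) for name, out in pairs if isinstance(out, list)]
    if stage_suffix is not None:
        token_stages = [(n, o) for n, o in token_stages if n.endswith(stage_suffix)]
    if not token_stages:
        return []
    _, tokens = token_stages[0]
    return [(t["text"], t["start"], t["end"]) for t in tokens]
```

With a live instance, usage would look something like `extract_tokens(fetch_analysis(build_analysis_url("http://localhost:8983/solr", "mycollection", "text", "Hello World")), "text")`. The resulting offsets refer to the original field value, which is what an offset-based enrichment step would need to line up with.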

Re: Offset-Based Analysis

2023-02-22 Thread Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
Hi Mikhail, Thanks for the quick reply and the suggestion. This is definitely good to know about. In my case, however, there are several such NLP/data extraction systems, and I am not sure whether they all use the same tokenization, but I will give this another look. I can see how this is a more well-d