Yes, I could imagine big gains from this strategy if OpenNLP is in the analysis chain ;-)

On Fri, Apr 13, 2018 at 5:01 PM Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello David,
>
> If JSON serialization is too bulky, we could also opt for
> SimplePreAnalyzed, right? At least as a FieldType it is possible; if
> not with the URP, it just needs some work.
>
> Regarding results: we haven't done it yet, and won't for some time, but
> we will when we reintroduce OpenNLP in the analysis chain. We tried to
> introduce POS-tagging on our own two years ago, but it wasn't suited
> for production because it was too heavy on the CPU. Indexing data
> suddenly took eight to ten times longer in a SolrCloud environment with
> three replicas.
>
> If we offload our current chains without OpenNLP, it will only benefit
> when large fields pass through a regex, and for decompounding the
> Germanic languages we ingest. Offloading just this cost is a micro
> optimization; offloading the various OpenNLP char and token filters is
> really beneficial.
>
> Regarding a dependency on Lucene core and analysis-common, it would be
> helpful, but we'll manage.
>
> Thanks again,
> Markus
>
> -----Original message-----
> > From: David Smiley <david.w.smi...@gmail.com>
> > Sent: Thursday 12th April 2018 19:16
> > To: solr-user@lucene.apache.org
> > Subject: Re: PreAnalyzed URP and SchemaRequest API
> >
> > Ah ok.
> > I've wondered how much value there is in pre-analysis. The
> > serialization of the analyzed form in JSON is bulky. If you can share
> > any results, I'd be interested to hear how it went. It's an
> > optimization so you should be able to know how much better it is. Of
> > course it isn't for everybody -- only when the analysis chain is
> > sufficiently complex.
> >
> > On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma
> > <markus.jel...@openindex.io> wrote:
> >
> > > Hello David,
> > >
> > > The remote client has everything on the class path, but just
> > > calling setTokenStream is not going to work. Remotely, all I get
> > > from the SchemaRequest API is an AnalyzerDefinition. I haven't
> > > found any Solr code that allows me to transform that directly into
> > > an analyzer. If I had that, it would make things easy.
> > >
> > > As far as I see it, I need to reconstruct a real Analyzer using
> > > AnalyzerDefinition's information. It won't be a problem, but it is
> > > cumbersome.
> > >
> > > Thanks anyway,
> > > Markus
> > >
> > > -----Original message-----
> > > > From: David Smiley <david.w.smi...@gmail.com>
> > > > Sent: Thursday 5th April 2018 19:38
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: PreAnalyzed URP and SchemaRequest API
> > > >
> > > > Is this really a problem when you could easily enough create a
> > > > TextField and call setTokenStream?
> > > >
> > > > Does your remote client have solr-core and all its dependencies
> > > > on the classpath? That's one way to do it... and presumably the
> > > > direction you are going because you're asking how to work with
> > > > PreAnalyzedParser, which is in solr-core. *Alternatively*, only
> > > > bring in Lucene core and construct things yourself in the right
> > > > format. You could copy PreAnalyzedParser into your codebase so
> > > > that you don't have to reinvent any wheels, even though that's
> > > > awkward. Perhaps that ought to be in SolrJ? But no, we don't want
> > > > SolrJ depending on Lucene-core, though it'd make a fine
> > > > "optional" dependency.
> > > >
> > > > On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma
> > > > <markus.jel...@openindex.io> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We intend to move to PreAnalyzed URP for analysis offloading.
> > > > > Browsing the Javadocs, I came across the SchemaRequest API
> > > > > while looking for a way to get a Field object remotely, which I
> > > > > seem to need for
> > > > > JsonPreAnalyzedParser.toFormattedString(Field f). But all I can
> > > > > get from the SchemaRequest API is FieldTypeRepresentation,
> > > > > which offers me getIndexAnalyzer() but won't allow me to
> > > > > construct a Field object.
> > > > >
> > > > > So, to analyze remotely I do need an index-time analyzer. I can
> > > > > get it, but not turn it into a Field object, which the
> > > > > PreAnalyzedParser for some reason wants.
> > > > >
> > > > > Any hints here? I must be looking the wrong way.
> > > > >
> > > > > Many thanks!
> > > > > Markus
> > > > >
> > > > --
> > > > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > > > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > > > http://www.solrenterprisesearchserver.com
> > > >
> >
> > --
> > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > http://www.solrenterprisesearchserver.com
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
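
For the "construct things yourself in the right format" route mentioned in the thread, a minimal sketch in plain Java follows, with no Solr or Lucene dependency at all. Everything in it is an assumption for illustration: the class PreAnalyzedJsonSketch and its methods are hypothetical and exist in neither Solr nor SolrJ, the whitespace splitter is only a stand-in for a real client-side analysis chain, and the JSON keys ("v", "str", "tokens", "t", "s", "e", "i") follow the layout described for JsonPreAnalyzedParser as best recalled, so check them against the Solr Reference Guide for your version before relying on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Solr or SolrJ): builds a pre-analyzed JSON value by hand. */
final class PreAnalyzedJsonSketch {

    /** One token as a client-side analysis chain might emit it. */
    static final class Token {
        final String term;
        final int start;   // start offset in the original text
        final int end;     // end offset (exclusive)
        final int posInc;  // position increment relative to the previous token

        Token(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    /** Stand-in for a real analyzer: splits on whitespace and lowercases each term. */
    static List<Token> whitespaceTokens(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i, 1));
            }
        }
        return tokens;
    }

    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Serializes the stored value plus its tokens; the key names are an assumption (see above). */
    static String toJson(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"v\":\"1\",\"str\":\"").append(escape(stored)).append("\",\"tokens\":[");
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"t\":\"").append(escape(t.term))
              .append("\",\"s\":").append(t.start)
              .append(",\"e\":").append(t.end)
              .append(",\"i\":").append(t.posInc)
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        String text = "Hello Solr";
        System.out.println(toJson(text, whitespaceTokens(text)));
    }
}
```

The resulting string would then be sent as the field value for a field handled by the PreAnalyzed URP or a PreAnalyzedField. In a real setup the token list would come from the rebuilt index-time analyzer rather than from whitespaceTokens, and payloads, types, and flags would need extra keys that this sketch leaves out.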