Yes, I could imagine big gains from this strategy if OpenNLP is in the
analysis chain ;-)

On Fri, Apr 13, 2018 at 5:01 PM Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello David,
>
> If JSON serialization is too bulky, we could also opt for
> SimplePreAnalyzed, right? At least as a FieldType it is possible; if not
> with the URP, it just needs some work.
>
> Regarding results: we haven't done it yet, and won't for some time, but we
> will when we reintroduce OpenNLP in the analysis chain. We tried to
> introduce POS-tagging on our own two years ago, but it wasn't suited for
> production because it was too heavy on the CPU. Indexing data suddenly took
> eight to ten times longer in a SolrCloud environment with three replicas.
>
> If we offload our current chains without OpenNLP, it will only benefit
> when large fields pass through a regex, and for decompounding the Germanic
> languages we ingest. Offloading just this cost is a micro-optimization;
> offloading the various OpenNLP char and token filters is really beneficial.
>
> Regarding a dependency on Lucene core and analysis-common, it would be
> helpful, but we'll manage.
>
> Thanks again,
> Markus
>
> -----Original message-----
> > From:David Smiley <david.w.smi...@gmail.com>
> > Sent: Thursday 12th April 2018 19:16
> > To: solr-user@lucene.apache.org
> > Subject: Re: PreAnalyzed URP and SchemaRequest API
> >
> > Ah ok.
> > I've wondered how much value there is in pre-analysis.  The serialization
> > of the analyzed form in JSON is bulky.  If you can share any results, I'd
> > be interested to hear how it went.  It's an optimization so you should be
> > able to know how much better it is.  Of course it isn't for everybody --
> > only when the analysis chain is sufficiently complex.
> >
> > On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello David,
> > >
> > > The remote client has everything on the class path, but just calling
> > > setTokenStream is not going to work. Remotely, all I get from the
> > > SchemaRequest API is an AnalyzerDefinition. I haven't found any Solr code
> > > that allows me to transform that directly into an analyzer. If I had
> > > that, it would make things easy.
> > >
> > > As far as I see it, I need to reconstruct a real Analyzer using
> > > AnalyzerDefinition's information. It won't be a problem, but it is
> > > cumbersome.
> > >
> > > Thanks anyway,
> > > Markus
> > >
> > > -----Original message-----
> > > > From:David Smiley <david.w.smi...@gmail.com>
> > > > Sent: Thursday 5th April 2018 19:38
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: PreAnalyzed URP and SchemaRequest API
> > > >
> > > > Is this really a problem when you could easily enough create a TextField
> > > > and call setTokenStream?
> > > >
> > > > Does your remote client have Solr-core and all its dependencies on the
> > > > classpath?  That's one way to do it... and presumably the direction you
> > > > are going because you're asking how to work with PreAnalyzedParser, which
> > > > is in solr-core.  *Alternatively*, only bring in Lucene core and construct
> > > > things yourself in the right format.  You could copy PreAnalyzedParser
> > > > into your codebase so that you don't have to reinvent any wheels, even
> > > > though that's awkward.  Perhaps that ought to be in SolrJ?  But no, we
> > > > don't want SolrJ depending on Lucene-core, though it'd make a fine
> > > > "optional" dependency.
> > > >
> > > > On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma <markus.jel...@openindex.io>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > We intend to move to PreAnalyzed URP for analysis offloading. Browsing
> > > > > > the Javadocs I came across the SchemaRequest API while looking for a
> > > > > > way to get a Field object remotely, which I seem to need for
> > > > > > JsonPreAnalyzedParser.toFormattedString(Field f). But all I can get
> > > > > > from the SchemaRequest API is FieldTypeRepresentation, which offers me
> > > > > > getIndexAnalyzer() but won't allow me to construct a Field object.
> > > > > >
> > > > > > So, to analyze remotely I do need an index-time analyzer. I can get it,
> > > > > > but not turn it into a Field object, which the PreAnalyzedParser for
> > > > > > some reason wants.
> > > > >
> > > > > Any hints here? I must be looking the wrong way.
> > > > >
> > > > > Many thanks!
> > > > > Markus
> > > > >
> > > > --
> > > > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > > > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > > > http://www.solrenterprisesearchserver.com
> > > >
> > >
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
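
For anyone weighing the "bulky JSON" point raised in the thread, here is a minimal, dependency-free sketch that writes the same two tokens in both pre-analyzed layouts and prints their lengths. It is not Solr's own serializer: the class PreAnalyzedFormats, the Token holder, and the helper methods are hypothetical names made up for illustration. It assumes the JSON layout uses the keys v, str, tokens, t, s, e and i, that the simple layout is "1 =stored= term,s=..,e=..,i=..", and that the input contains no characters that would need escaping in the simple layout.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hand-rolled writers for the two pre-analyzed layouts. */
public class PreAnalyzedFormats {

    /** Hypothetical token holder: term text, start/end offsets, position increment. */
    static final class Token {
        final String term;
        final int start;
        final int end;
        final int posInc;

        Token(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    /** Escapes only backslash and double quote, which is enough for this example. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Writes the assumed JSON layout: version, stored string, then the token list. */
    static String toJson(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"v\":\"1\",\"str\":\"").append(jsonEscape(stored)).append("\",\"tokens\":[");
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"t\":\"").append(jsonEscape(t.term))
              .append("\",\"s\":").append(t.start)
              .append(",\"e\":").append(t.end)
              .append(",\"i\":").append(t.posInc)
              .append('}');
        }
        return sb.append("]}").toString();
    }

    /** Writes the assumed simple layout; the input must not need escaping. */
    static String toSimple(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder("1 =").append(stored).append('=');
        for (Token t : tokens) {
            sb.append(' ').append(t.term)
              .append(",s=").append(t.start)
              .append(",e=").append(t.end)
              .append(",i=").append(t.posInc);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("hello", 0, 5, 1),
                new Token("world", 6, 11, 1));
        String json = toJson("hello world", tokens);
        String simple = toSimple("hello world", tokens);
        // Print both serializations with their character counts for comparison.
        System.out.println(json.length() + " chars: " + json);
        System.out.println(simple.length() + " chars: " + simple);
    }
}
```

The per-token overhead in the JSON layout comes from the repeated quoted keys and braces, so the gap grows with the token count. Whether that matters next to the analysis cost being offloaded would need measuring on real documents.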
