Re: Solr OpenNLP named entity extraction

2018-07-11 Thread Jerome Yang
Thanks a lot Steve!

Re: Solr OpenNLP named entity extraction

2018-07-10 Thread Steve Rowe
Hi Jerome,

I was able to set up a configset to perform OpenNLP NER, loading the model files 
from local storage.

There is a trick though[1]: the model files must be located *in a jar* or *in a 
subdirectory* under ${solr.solr.home}/lib/ or under a directory specified via a 
solrconfig.xml <lib/> directive.

I tested with the bin/solr cloud example, and put model files under the two 
solr home directories, at example/cloud/node1/solr/lib/opennlp/ and 
example/cloud/node2/solr/lib/opennlp/.  The “opennlp/” subdirectory is 
required, though it can have any name you choose.

[1] As you noted, ZkSolrResourceLoader delegates to its parent classloader when 
it can’t find resources in a configset, and the parent classloader is set up to 
load from subdirectories and jar files under ${solr.solr.home}/lib/ or under a 
directory specified via a solrconfig.xml <lib/> directive.  These directories 
themselves are not included in the set of directories from which resources are 
loaded; only their children are.
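
As a rough illustration of the <lib/> alternative, a directive along these lines could be added to solrconfig.xml. The directory /opt/solr-extras is a hypothetical path, and the layout in the comment is an assumption that simply follows the rule in [1]: the model sits one level below the directory named in the directive, not directly in it.

```xml
<!-- solrconfig.xml (sketch only).
     /opt/solr-extras is a hypothetical directory, not one from this thread.
     Assumed layout on disk:
       /opt/solr-extras/opennlp/en-ner-person.bin
     The model is placed in a subdirectory because only the children of the
     directory named here are searched, not the directory itself. -->
<lib dir="/opt/solr-extras" />
```

In SolrCloud the same directory and files would presumably have to exist on every node, just as with ${solr.solr.home}/lib/.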

--
Steve
www.lucidworks.com

Re: Solr OpenNLP named entity extraction

2018-07-09 Thread Jerome Yang
Hi Steve,

Putting the models under "${solr.solr.home}/lib/" does not work.
I checked ZkSolrResourceLoader: it seems to first look for the models in the
configset.
If it does not find them there, it uses the class loader to load them from resources.

Regards,
Jerome

-- 
 Pivotal Greenplum | Pivotal Software, Inc. 


Re: Solr OpenNLP named entity extraction

2018-07-09 Thread Jerome Yang
Thanks Steve!


-- 
 Pivotal Greenplum | Pivotal Software, Inc. 


Re: Solr OpenNLP named entity extraction

2018-07-09 Thread Steve Rowe
Hi Jerome,

See the ref guide[1] for a writeup of how to enable uploading files larger than 
1MB into ZooKeeper.

Local storage should also work - have you tried placing OpenNLP model files in 
${solr.solr.home}/lib/ ? - make sure you do the same on each node.

[1] 
https://lucene.apache.org/solr/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit

--
Steve
www.lucidworks.com


Re: Solr OpenNLP named entity extraction

2018-07-08 Thread Jerome Yang
Hi guys,

In SolrCloud mode, where should the OpenNLP models be placed?
Should they be uploaded to ZooKeeper?
In my tests on Solr 7.3.1, an absolute path on the local host does not seem to work.
And a model cannot be uploaded to ZooKeeper if it is larger than 1 MB.

Regards,
Jerome

-- 
 Pivotal Greenplum | Pivotal Software, Inc. 


Re: Solr OpenNLP named entity extraction

2018-04-17 Thread Steve Rowe
Hi Alexey,

First, thanks for moving the conversation to the mailing list.  Discussion of 
usage problems should take place here rather than in JIRA.

I locally set up Solr 7.3 similarly to you and was able to get things to work.

Problems with your setup:

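1. Your update chain is missing the Log and Run update processors at the end (I 
see these are missing from the example in the javadocs for the OpenNLP NER 
update processor; I’ll fix that):

 <processor class="solr.LogUpdateProcessorFactory" />
 <processor class="solr.RunUpdateProcessorFactory" />

   The Log update processor isn’t strictly necessary, but, from 
<https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain>: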

   Do not forget to add RunUpdateProcessorFactory at the end of any
   chains you define in solrconfig.xml. Otherwise update requests
   processed by that chain will not actually affect the indexed data.

2. Your example document is missing an “id” field.

3. For whatever reason, the pre-trained model "en-ner-person.bin" doesn’t 
extract anything from text “This is Steve Jobs 2”.  It does, however, extract 
“Steve Jobs” from e.g. the text “This is Steve Jobs in white”.

4. (Not a problem necessarily) You may want to use a multi-valued “string” 
field for the “dest” field in your update chain, e.g. “people_str” (“*_str” in 
the default configset is so configured).
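
Taken together, points 1 and 4 might give an update chain roughly like the sketch below. The chain name multiple-extract and the parameter names (modelFile, analyzerFieldType, source, dest) are assumptions taken from the update request and from the processor's javadocs, not quoted from this message, so check them against your own solrconfig.xml.

```xml
<!-- Sketch only: chain name and parameter names are assumptions (see above). -->
<updateRequestProcessorChain name="multiple-extract">
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">opennlp/en-ner-person.bin</str>
    <str name="analyzerFieldType">text_opennlp</str>
    <str name="source">description_en</str>
    <!-- Point 4: a multi-valued string field instead of "content" -->
    <str name="dest">people_str</str>
  </processor>
  <!-- Point 1: these two were missing; the Run processor goes last -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With the Run processor in place the document should actually be indexed, provided it also carries an “id” field (point 2).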

--
Steve
www.lucidworks.com



Re: Solr OpenNLP named entity extraction

2018-04-17 Thread David Hastings
Did you send a commit after you sent the document?
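
For reference, a commit can be sent as its own XML update message after the add. A minimal sketch of the two message bodies, assuming each is POSTed to the collection's /update handler; the id value is made up, and description_en is assumed to be the source field of the chain:

```xml
<!-- Message 1: add a document.
     The id value "doc-2" is hypothetical; description_en is assumed to be
     the field the update chain reads from. -->
<add>
  <doc>
    <field name="id">doc-2</field>
    <field name="description_en">This is Steve Jobs 2</field>
  </doc>
</add>

<!-- Message 2, sent as a separate POST: makes the added document visible
     to searches. -->
<commit/>
```

Adding commit=true to the update URL is another common way to get the same effect in a single request.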


Solr OpenNLP named entity extraction

2018-04-17 Thread Alexey Ponomarenko
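Hi, once more I am trying to implement named entity extraction using this
manual:
https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html

I modified solrconfig.xml like this:

 <updateRequestProcessorChain name="multiple-extract">
   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
     <str name="modelFile">opennlp/en-ner-person.bin</str>
     <str name="analyzerFieldType">text_opennlp</str>
     <str name="source">description_en</str>
     <str name="dest">content</str>
   </processor>
 </updateRequestProcessorChain>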

But when I was trying to add data using:

*request:*

POST
http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract

This is Steve Jobs 2
This is text 2This is text for content 2

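*response*

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
</lst>
</response>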

But I don't see any data inserted into the *content* field or into any other field.

*If you need some additional data I can provide it.*

Can you help me? What have I done wrong?