Re: Resetting Authentication/Authorization

2018-03-29 Thread Shawn Heisey

On 3/29/2018 8:28 PM, Terry Steichen wrote:

When I set up the initial authentications and authorizations (I'm using
6.6.0 and running in cloud mode.), I call "bin/solr auth enable
-credentials xxx:yyy".


What does this command output?  There should definitely be something 
output when that command is run.  I don't know if it will be a lot of 
output or a little bit, but whatever it is, can you provide it?



I then use a series of additional API calls (to
create additional users and permissions).  This creates my desired
security environment (and, BTW, it seems to function as it should).


Can you elaborate on exactly what you did when you say "a series of 
additional API calls"?



If I restart solr, it appears I must reactivate it with the same
'bin/solr auth enable -credentials xxx:yyy' command.  But it seems that
when solr is restarted this way, only the authorizations are retained
persistently; the authentications have to be created again from
scratch.


Enabling the authentication when running in cloud mode should upload a 
"security.json" file to zookeeper.  It should also write some variables 
to your solr.in.sh file, so that future usage of the bin/solr tool can 
provide the authentication that is required.
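
For reference, a quick way to verify both pieces (a sketch -- the zkhost and
the solr.in.sh path are assumptions; adjust them for your installation):

# Check that security.json made it into zookeeper:
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd get /security.json

# Check what "bin/solr auth enable" appended to your include file:
grep SOLR_AUTH /etc/default/solr.in.sh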


Thanks,
Shawn



Resetting Authentication/Authorization

2018-03-29 Thread Terry Steichen
When I set up the initial authentications and authorizations (I'm using
6.6.0 and running in cloud mode.), I call "bin/solr auth enable
-credentials xxx:yyy".  I then use a series of additional API calls ( to
create additional users and permissions).  This creates my desired
security environment (and, BTW, it seems to function as it should).

If I restart solr, it appears I must reactivate it with the same
'bin/solr auth enable -credentials xxx:yyy' command.  But it seems that
when solr is restarted this way, only the authorizations are retained
persistently; the authentications have to be created again from
scratch.

I would like to (somehow) capture the authentication/authorization
information (probably in a security.json file?) and then (somehow)
reload it when there's a restart. 

Can that be done?
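
One way to capture and reload it, sketched with the zkcli script that ships
with Solr -- the zkhost and the backup path here are assumptions:

# Save the current security settings out of zookeeper:
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd getfile /security.json /backups/security.json

# Push them back, e.g. after a restart has wiped them:
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd putfile /security.json /backups/security.json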


Need help to get started on Solr, searching gets nothing. Thank you very much in advance

2018-03-29 Thread Raymond Xie
 I am new to Solr, following Steve Rowe's example on
https://github.com/apache/lucene-solr/tree/master/solr/example/films:

It would be greatly appreciated if anyone can enlighten me where to start
troubleshooting, thank you very much in advance.

The steps I followed are:

Here ya go << END_OF_SCRIPT

bin/solr stop
rm server/logs/*.log
rm -Rf server/solr/films/
bin/solr start
bin/solr create -c films
curl http://localhost:8983/solr/films/schema -X POST -H
'Content-type:application/json' --data-binary '{
"add-field" : {
"name":"name",
"type":"text_general",
"multiValued":false,
"stored":true
},
"add-field" : {
"name":"initial_release_date",
"type":"pdate",
"stored":true
}
}'
bin/post -c films example/films/films.json
curl http://localhost:8983/solr/films/config/params -H
'Content-type:application/json'  -d '{
"update" : {
  "facets": {
"facet.field":"genre"
}
  }
}'

# END_OF_SCRIPT

Additional fun -

Add highlighting:
curl http://localhost:8983/solr/films/config/params -H
'Content-type:application/json'  -d '{
"set" : {
  "browse": {
"hl":"on",
"hl.fl":"name"
}
  }
}'
try http://localhost:8983/solr/films/browse?q=batman now, and you'll
see "batman" highlighted in the results



I got nothing in my search:

[screenshot omitted]
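
A first sanity check, assuming the collection is named "films" as in the
script above:

# numFound should be non-zero if films.json was indexed:
curl "http://localhost:8983/solr/films/select?q=*:*&rows=0"

# A field-qualified query avoids depending on a default search field:
curl "http://localhost:8983/solr/films/select?q=name:batman"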
Sincerely yours,


Raymond


Re: Three Indexing Questions

2018-03-29 Thread Shawn Heisey
On 3/29/2018 3:59 PM, Terry Steichen wrote:
> First question: When indexing content in a directory, Solr's normal
> behavior is to recursively index all the files found in that directory
> and its subdirectories.  However, it turns out that when the files are of
> the form *.eml (email), solr won't do that.  I can use a wildcard to get
> it to index the current directory, but it won't recurse.

At first I had no idea what program you were using.  I may have figured
it out, see below.

> I note this message that's displayed when I begin indexing: "Entering
> auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log

That looks like the simple post tool included with Solr.  If it is, type
"bin/post -help" and you will see that there is a -filetypes option that
lets you change the list of extensions that are considered valid.
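
For example (the collection name and path here are hypothetical):

bin/post -c emails -filetypes eml /data/mail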

Note that the post tool included with Solr is a SIMPLE post tool.  It's
designed as a way to get your feet wet, not for heavy production usage. 
It does not have extensive capability.  We strongly recommend that you
graduate to a better indexing program.  Usually that means that you're
going to have to write one yourself, to be sure that it does everything
YOU want it to do.  The one included with Solr probably can't do some of
the things that you want it to do.

Also, indexing files using the post tool is going to run Tika extraction
inside Solr.  Tika is a separate Apache project.  Solr happens to
include a subset of Tika's capability that can run inside Solr.  That
program is known to sometimes behave explosively when it processes
documents.  If an explosion happens in Tika and it's running inside
Solr, then Solr itself might crash.  Running Tika outside Solr, usually
in a program that you write yourself, is highly recommended.  Doing this
will also give you access to the full range of Tika's capabilities.
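
As a minimal sketch of that idea, Tika also ships a standalone tika-app jar
that can be scripted; the jar version, the collection and field names, and
the use of jq below are all assumptions:

# Extract plain text with standalone Tika (so a parsing explosion can't
# take Solr down with it):
java -jar tika-app-1.17.jar --text mydoc.pdf > mydoc.txt

# Wrap the text in a JSON document (jq -Rs reads raw input as one JSON
# string) and post it to Solr:
jq -Rs '{id: "mydoc", content_txt: .}' mydoc.txt | \
  curl "http://localhost:8983/solr/mycollection/update/json/docs?commit=true" \
       -H 'Content-type:application/json' --data-binary @-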

Here's an example of a program that uses both JDBC and Tika to index to
Solr:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

If you search google for "tika index solr" (without the quotes), you'll
find some other examples of custom programs that use Tika to index to
Solr.  There may be better searches you can do on Google as well.

Thanks,
Shawn



Re: Indexing multi level Nested JSON using curl

2018-03-29 Thread Zheng Lin Edwin Yeo
Hi,

Does anyone know if we can make any change to the split=/|/orgs in the
curl URL command to achieve the indexing of the multi-level nested JSON?

Regards,
Edwin

On 26 March 2018 at 17:30, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> I'm trying to index the following JSON with 2 child level using the
> following curl command using cygwin:
>
> curl 'http://localhost:8983/solr/collection1/update/json/docs?split=/|/orgs'
> -H 'Content-type:application/json' -d '
> {
>   "id":"1",
>   "name_s": "JoeSmith",
>   "phone_s": 876876687,
>   "orgs": [
> {
>   "name1_s" : "Microsoft",
>   "city_s" : "Seattle",
>   "zip_s" : 98052,
>   "orgs":[{"name2_ss":"alan","phone2_ss":"123"},{"name2_ss":"
> edwin","phone2_ss":"456"}]
> },
> {
>   "name1_s" : "Apple",
>   "city_s" : "Cupertino",
>   "zip_s" : 95014,
>   "orgs":[{"name2_ss":"alan","phone2_ss":"123"},{"name2_ss":"
> edwin","phone2_ss":"456"}]
> }
>   ]
> }'
>
> However, after indexing, this is what is shown in Solr. The 2nd child has
> been placed together under the 1st child as a multi-valued field, which is
> wrong. If I set the field for the 2nd child to be a non-multi-valued
> field, it gives an error saying "multiple values encountered for non
> multiValued field orgs2.name2_s:".
>
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":41,
> "params":{
>   "q":"phone_s:876876687",
>   "fl":"*,[child parentFilter=phone_s:876876687]",
>   "sort":"id asc"}},
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"1",
> "name_s":"JoeSmith",
> "phone_s":"876876687",
> "language_s":"en",
> "_version_":1595632041779527680,
> "_childDocuments_":[
> {
>   "name1_s":"Microsoft",
>   "city_s":"Seattle",
>   "zip_s":"98052",
>   "orgs.name2_ss":["alan",
> "edwin"],
>   "orgs.phone2_ss":["123",
> "456"],
>   "_version_":1595632041779527680},
> {
>   "name1_s":"Apple",
>   "city_s":"Cupertino",
>   "zip_s":"95014",
>   "orgs.name2_ss":["alan",
> "edwin"],
>   "orgs.phone2_ss":["123",
> "456"],
>   "_version_":1595632041779527680}]}]
>   }}
>
>
> How can we structure the curl command so it will accept child-of-child
> relationships? We should not have to do any pre-processing of the JSON to
> achieve that.
>
> I'm using Solr 7.2.1.
>
> Regards,
> Edwin
>
>


Re: UIMA-SOLR integration

2018-03-29 Thread Mark Robinson
Thanks much Steve for the suggestions and pointers.

Best,
Mark

On Thu, Mar 29, 2018 at 3:17 PM, Steve Rowe  wrote:

> Hi Mark,
>
> Not sure about the advisability of pursuing UIMA - I’ve never used it with
> Lucene or Solr - but soon-to-be-released Solr 7.3 will include OpenNLP
> integration:
>
> * Language analysis, in the Solr reference guide:
> <https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-7.3/javadoc/language-analysis.html#opennlp-integration>
>
> * Language detection, in the Solr reference guide:
> <https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-7.3/javadoc/detecting-languages-during-indexing.html>
>
> * NER, in javadocs (sorry, couldn’t think of a place where a pre-release
> HTML view is available):
> <https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java;hb=refs/heads/branch_7_3#l60>
>
> --
> Steve
> www.lucidworks.com
>
> > On Mar 29, 2018, at 6:40 AM, Mark Robinson 
> wrote:
> >
> > Hi All,
> >
> > Is it still advisable to pursue UIMA, or can someone please advise something
> > else to check on related to SOLR and NLP?
> >
> > Thanks!
> > Mark
> >
> >
> > -- Forwarded message --
> > From: Mark Robinson 
> > Date: Wed, Mar 28, 2018 at 2:21 PM
> > Subject: UIMA-SOLR integration
> > To: solr-user@lucene.apache.org
> >
> >
> > Hi,
> >
> > I was trying to integrate UIMA into SOLR following the solr docs and many
> > other hints on the net.
> > While trying to get a VALID_ALCHEMYAPI_KEY I contacted IBM support and
> got
> > the following advice:-
> >
> > "As announced a year a go the Alchemy Service was scheduled and shutdown
> on
> > March 7th, 2018, and is no longer supported.  The AlchemAPI services was
> > broken down into three other services where AlchemyLanguage has been
> > replaced by Natural Language Understanding, AlchemyVision by Visual
> > Recognition, and AlchemyDataNews by Discovery News.  The suggestion is to
> > migrated to the respective merged service in order to be able to take
> > advantage of the features."
> >
> > Could someone please share any other suggestions instead of having to
> > use AlchemyAPI, so that I can still continue with my work.
> >
> > Note:- I already commented out OpenCalais references in
> > OverridingParamsExtServicesAE.xml, as I was getting errors with
> > OpenCalais, so I was relying on AlchemyAPI only.
> >
> > Any immediate help is greatly appreciated!
> >
> > Thanks!
> >
> > Mark
>
>


Re: Three Indexing Questions

2018-03-29 Thread Erik Hatcher
Terry -

You’re speaking of bin/post, looks like.   bin/post is _just_ a simple tool to 
provide some basic utility.   The fact that it can recurse a directory 
structure at all is an extra bonus that really isn’t about “Solr” per se, but 
about posting content into it.   

Frankly, (even as the author of bin/post) I don’t think bin/post for file 
system crawling is the rightest way to go.   Having Solr parse content itself 
(which is what happens when bin/post sends files into Solr’s /update/extract 
handler) is not recommended for production/scale.

All caveats aside and recommendations to upsize your file crawler…. it’s just a 
bin/post shell script and a Java class called SimplePostTool - I’d encourage 
you to adapt what it does to your requirements so that it will send over .eml 
files, which apparently work when added manually (how did you test that?  
curious on the details), and handle multiple directories.   It wasn’t designed 
to handle robust file crawls, but certainly is there for your taking to adjust 
to your needs if it is close enough.   And of course, if you want to generalize 
the handling and submit that back then bin/post can improve!

In short: no, bin/post can’t do the things you’re asking of it, but there’s no 
reason it couldn’t be evolved to handle those things.

Erik


> 
> I note this message that's displayed when I begin indexing: "Entering
> auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> 
> Is there a way to get it to recurse through files with different
> extensions, for example, like .eml?  When I manually add all the
> subdirectory content, solr seems to parse the content very well,
> recognizing all the standard email metadata.  I just can't get it to do
> the indexing recursively.
> 
> Second question: if I want to index files from many different source
> directories, is there a way to specify these different sources in one
> command? (Right now I have to issue a separate indexing command for each
> directory - which means I have to sit around and wait till each is
> finished.)
> 
> Third question: I have a very large directory structure that includes a
> couple of subdirectories I'd like to exclude from indexing.  Is there a
> way to index recursively, but exclude specified directories?
> 



Three Indexing Questions

2018-03-29 Thread Terry Steichen
First question: When indexing content in a directory, Solr's normal
behavior is to recursively index all the files found in that directory
and its subdirectories.  However, it turns out that when the files are of
the form *.eml (email), solr won't do that.  I can use a wildcard to get
it to index the current directory, but it won't recurse.

I note this message that's displayed when I begin indexing: "Entering
auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log

Is there a way to get it to recurse through files with different
extensions, for example, like .eml?  When I manually add all the
subdirectory content, solr seems to parse the content very well,
recognizing all the standard email metadata.  I just can't get it to do
the indexing recursively.

Second question: if I want to index files from many different source
directories, is there a way to specify these different sources in one
command? (Right now I have to issue a separate indexing command for each
directory - which means I have to sit around and wait till each is
finished.)

Third question: I have a very large directory structure that includes a
couple of subdirectories I'd like to exclude from indexing.  Is there a
way to index recursively, but exclude specified directories?



Re: WordDelimiterGraphFilter expected behaviour ?

2018-03-29 Thread Shawn Heisey
On 3/29/2018 1:48 PM, Kelvyn Scrupps wrote:
> I'm using WordDelimiterGraphFilter on a field and came across a curious 
> additional positional "hole" generated by the filter while playing with the 
> analysis tool.  
> For input "wibble , wobble" (space either side of the comma so it's a 
> separate token), the output introduces an additional positional hole after 
> the comma, i.e. 
>
> Term   position
> Wibble 1
> ,  2
> Wobble  4 *
>
> The positionlength for each is 1, so no obvious graph-span going on.
>
> It's not just the comma; any punctuation will do, e.g. "wibble ! wobble"

The wrinkle here is enabling preserveOriginal at the same time that you
have a term which is completely removed by the filter (in this case, the
comma).  If preserveOriginal is disabled, they both behave the same.  I
don't know if this is a bug or not.  My instinct is to say it's a bug,
but it's possible that this is expected.

Having a term that's just a punctuation character in the index is
generally not very useful ... but there are OTHER situations with this
filter where preserveOriginal *is* the behavior you want.  I would
imagine that as long as you don't have terms that completely disappear
when the filter runs, it would behave correctly.  Try replacing the ","
with "x," to see what I mean.

Also, FYI, when using a Graph filter, the index analysis chain must also
have this filter (but not the query analysis):

        <filter class="solr.FlattenGraphFilterFactory"/>

Adding that didn't seem to fix the behavior that concerns you, but the
docs do say it's required on the index analysis whenever using a Graph
filter.
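
Put together, the usual arrangement looks something like this (a sketch,
not your exact field type):

<fieldType name="text_wdgf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
    <!-- required at index time after any graph-producing filter -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
  </analyzer>
</fieldType>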

Thanks,
Shawn



Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Shawn Heisey
On 3/29/2018 12:45 PM, Abhi Basu wrote:
> Also, another question, where it says to copy the zoo.cfg from
> /solr72/server/solr folder to /solr72/server/solr/node1/solr, should I
> actually be grabbing the zoo.cfg from one of my external zk nodes?

If you're using zookeeper processes that are separate from Solr, then
zoo.cfg in the solr directory is unimportant.

Doing anything related to zoo.cfg in a solr directory would imply that
you are running Solr with the embedded ZK.  Which is not recommended in
most cases.  The primary issue with the embedded ZK is that when you
stop Solr, you're also stopping ZK.

Thanks,
Shawn



Re: Routing a subquery directly to the shard a document came from

2018-03-29 Thread Jeff Wartes

This gets really close:

q=
fl=id,subquery:[subquery],[shard]
subquery.q=
subquery.fq={!cache=false} +{!terms f=_root_ v=$row.id}
subquery.shards=$row.[shard]

The issue here is that local params aren't a thing except in a query parser, 
and the "shards=" param isn't a query so it isn't parsed. So I have no way to 
dereference the "$row.[shard]".


On 3/27/18, 4:00 PM, "Jeff Wartes"  wrote:


I have a large 7.2 index with nested documents and many shards.
For each result (parent doc) in a query, I want to gather a 
relevance-ranked subset of the child documents. It seemed like the subquery 
transformer would be ideal: 
https://lucene.apache.org/solr/guide/7_2/transforming-result-documents.html#TransformingResultDocuments-_subquery_
(the [child] transformer allows for a filter, but the results have an 
effectively random sort)

So maybe something like this:
q=
fl=id,subquery:[subquery]
subquery.q=
subquery.fq={!cache=false} +{!terms f=_root_ v=$row.id}

This actually works fine, but there’s a lot more work going on than 
necessary. Say we have X shards and get N documents back:

Query http requests = 1 top-level query + X distributed shard-requests
Subquery http requests = N rows + N * X distributed shard-requests
So with N=10 results and X=50 shards, that is: 1+50+10+500 = 561 http 
requests through the cluster.

Some of that is unavoidable, of course, but it occurs to me that all the 
child docs are indexed in the same shard (segment) that the parent doc is. 
Meaning that if you know the parent doc id, (and I do) you can use the document 
routing to know exactly which shard to send the subquery request to. This would 
save 490 of the http requests in the scenario above.

Is there any form of query that allows for explicitly following the 
document routing rules for a given document ID?

I’m aware of the “distrib=false” and “shards=foo” parameters, but using 
those would require me to recreate the document routing in the client.
There’s also the “fl=[shard]” thing, but that would still require me to 
handle the subqueries in the client.






PreAnalyzed FieldType, and simultaneously importing JSON

2018-03-29 Thread Markus Jelsma
Hello,

We want to move to PreAnalyzed FieldType to offload our very heavy analysis 
chain away from the search cluster, so we have to configure our fields to 
accept pre-analyzed tokens in production.

But we use the same schema in development environments too, and that is where 
we use JSON files, or stream (export/import) data directly from production 
servers into a development environment, again via JSON. And in case of disaster 
recovery, we can import the daily exported JSON bzipped files back into our 
production servers.

But this JSON loading does not work with PreAnalyzed FieldType. So to load JSON 
we must reset all fields back to their respective language-specific FieldTypes 
on the fly; we could automate that, but it is a hassle we would like to avoid.

Have I overlooked any configuration parameters that can help? Must we automate 
the on-the-fly schema reconfiguration and reset to PreAnalyzed after JSON 
loading is finished?
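
For reference, the JSON parser for that field type expects each field value to 
itself be a serialized token stream, along these lines (a minimal sketch of 
the documented format, with made-up tokens and a hypothetical field name):

{"add": {"doc": {
  "id": "doc-1",
  "title_pre": "{\"v\":\"1\",\"str\":\"Hello world\",\"tokens\":[{\"t\":\"hello\",\"s\":0,\"e\":5,\"i\":1},{\"t\":\"world\",\"s\":6,\"e\":11,\"i\":1}]}"
}}}

which is exactly why a plain JSON export can't be posted into such a field
without transformation.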

Many thanks!
Markus


WordDelimiterGraphFilter expected behaviour ?

2018-03-29 Thread Kelvyn Scrupps
Hi

First posting to the list, but here goes.

I'm using WordDelimiterGraphFilter on a field and came across a curious 
additional positional "hole" generated by the filter while playing with the 
analysis tool.  
For input "wibble , wobble" (space either side of the comma so it's a separate 
token), the output introduces an additional positional hole after the comma, 
i.e. 

Term   position
Wibble 1
,  2
Wobble  4 *

The positionlength for each is 1, so no obvious graph-span going on.

It's not just the comma; any punctuation will do, e.g. "wibble ! wobble"

I know it's a bit contrived, and it doesn't break anything in production but it 
just puzzled me.  

The question is: is this by design?  It's not the behaviour of the old 
WordDelimiterFilter.  

Setup:

Solr 6.6.3

Field:

[field type definition stripped by the list archive]

Thanks for any insight.

Kelvyn Scrupps
Developer for Allies Computing




Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Abhi Basu
Just an update. Adding hostnames to solr.xml and using "-z
zk1:2181,zk2:2181,zk3:2181" worked; I can now see 4 live nodes and am able
to create the collection with 2S/2R.

Thanks for your help, greatly appreciate it.

Regards,

Abhi

On Thu, Mar 29, 2018 at 1:45 PM, Abhi Basu <9000r...@gmail.com> wrote:

> Also, another question, where it says to copy the zoo.cfg from
> /solr72/server/solr folder to /solr72/server/solr/node1/solr, should I
> actually be grabbing the zoo.cfg from one of my external zk nodes?
>
> Thanks,
>
> Abhi
>
> On Thu, Mar 29, 2018 at 1:04 PM, Abhi Basu <9000r...@gmail.com> wrote:
>
>> Ok, will give it a try along with the host name.
>>
>>
>> On Thu, Mar 29, 2018 at 12:20 PM, Webster Homer 
>> wrote:
>>
>>> This Zookeeper ensemble doesn't look right.
>>> >
>>> > ./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/
>>> -p
>>> > 8983 -z zk0-esohad,zk1-esohad,zk3-esohad:2181 -m 8g
>>>
>>>
>>> Shouldn't the zookeeper ensemble be specified as:
>>>   zk0-esohad:2181,zk1-esohad:2181,zk3-esohad:2181
>>>
>>> You should put the zookeeper port on each node in the comma separated
>>> list.
>>> I don't know if this is your problem, but I think your solr nodes will
>>> only
>>> be connecting to 1 zookeeper
>>>
>>> On Thu, Mar 29, 2018 at 10:56 AM, Walter Underwood <
>>> wun...@wunderwood.org>
>>> wrote:
>>>
>>> > I had that problem. Very annoying and we probably should require
>>> special
>>> > flag to use localhost.
>>> >
>>> > We need to start solr like this:
>>> >
>>> > ./solr start -c -h `hostname`
>>> >
>>> > If anybody ever forgets, we get a 127.0.0.1 node that shows down in
>>> > cluster status. No idea how to get rid of that.
>>> >
>>> > wunder
>>> > Walter Underwood
>>> > wun...@wunderwood.org
>>> > http://observer.wunderwood.org/  (my blog)
>>> >
>>> > > On Mar 29, 2018, at 7:46 AM, Shawn Heisey 
>>> wrote:
>>> > >
>>> > > On 3/29/2018 8:25 AM, Abhi Basu wrote:
>>> > >> "Operation create caused
>>> > >> exception:":"org.apache.solr.common.SolrException:org.
>>> > apache.solr.common.SolrException:
>>> > >> Cannot create collection ems-collection. Value of maxShardsPerNode
>>> is 1,
>>> > >> and the number of nodes currently live or live and part of your
>>> > >
>>> > > I'm betting that all your nodes are registering themselves with the
>>> same
>>> > name, and that name is probably either 127.0.0.1 or 127.1.1.0 -- an
>>> address
>>> > on the loopback interface.
>>> > >
>>> > > Usually this problem (on an OS other than Windows, at least) is
>>> caused
>>> > by an incorrect /etc/hosts file that maps your hostname to a  loopback
>>> > address instead of a real address.
>>> > >
>>> > > You can override the value that SolrCloud uses to register itself
>>> into
>>> > zookeeper so it doesn't depend on the OS configuration.  In solr.in.sh,
>>> I
>>> > think this is the SOLR_HOST variable, which gets translated into
>>> -Dhost=XXX
>>> > on the java commandline.  It can also be configured in solr.xml.
>>> > >
>>> > > Thanks,
>>> > > Shawn
>>> > >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Abhi Basu
>>
>
>
>
> --
> Abhi Basu
>



-- 
Abhi Basu


Re: UIMA-SOLR integration

2018-03-29 Thread Steve Rowe
Hi Mark,

Not sure about the advisability of pursuing UIMA - I’ve never used it with 
Lucene or Solr - but soon-to-be-released Solr 7.3 will include OpenNLP 
integration:

* Language analysis, in the Solr reference guide: 
<https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-7.3/javadoc/language-analysis.html#opennlp-integration>

* Language detection, in the Solr reference guide: 
<https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-7.3/javadoc/detecting-languages-during-indexing.html>

* NER, in javadocs (sorry, couldn’t think of a place where a pre-release HTML 
view is available): 
<https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java;hb=refs/heads/branch_7_3#l60>

--
Steve
www.lucidworks.com

> On Mar 29, 2018, at 6:40 AM, Mark Robinson  wrote:
> 
> Hi All,
> 
> Is it still advisable to pursue UIMA, or can someone please advise something
> else to check on related to SOLR and NLP?
> 
> Thanks!
> Mark
> 
> 
> -- Forwarded message --
> From: Mark Robinson 
> Date: Wed, Mar 28, 2018 at 2:21 PM
> Subject: UIMA-SOLR integration
> To: solr-user@lucene.apache.org
> 
> 
> Hi,
> 
> I was trying to integrate UIMA into SOLR following the solr docs and many
> other hints on the net.
> While trying to get a VALID_ALCHEMYAPI_KEY I contacted IBM support and got
> the following advice:-
> 
> "As announced a year a go the Alchemy Service was scheduled and shutdown on
> March 7th, 2018, and is no longer supported.  The AlchemAPI services was
> broken down into three other services where AlchemyLanguage has been
> replaced by Natural Language Understanding, AlchemyVision by Visual
> Recognition, and AlchemyDataNews by Discovery News.  The suggestion is to
> migrated to the respective merged service in order to be able to take
> advantage of the features."
> 
> Could someone please share any other suggestions instead of having to
> use AlchemyAPI, so that I can still continue with my work.
> 
> Note:- I already commented out OpenCalais references in
> OverridingParamsExtServicesAE.xml, as I was getting errors with
> OpenCalais, so I was relying on AlchemyAPI only.
> 
> Any immediate help is greatly appreciated!
> 
> Thanks!
> 
> Mark



Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Abhi Basu
Also, another question, where it says to copy the zoo.cfg from
/solr72/server/solr folder to /solr72/server/solr/node1/solr, should I
actually be grabbing the zoo.cfg from one of my external zk nodes?

Thanks,

Abhi

On Thu, Mar 29, 2018 at 1:04 PM, Abhi Basu <9000r...@gmail.com> wrote:

> Ok, will give it a try along with the host name.
>
>
> On Thu, Mar 29, 2018 at 12:20 PM, Webster Homer 
> wrote:
>
>> This Zookeeper ensemble doesn't look right.
>> >
>> > ./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/
>> -p
>> > 8983 -z zk0-esohad,zk1-esohad,zk3-esohad:2181 -m 8g
>>
>>
>> Shouldn't the zookeeper ensemble be specified as:
>>   zk0-esohad:2181,zk1-esohad:2181,zk3-esohad:2181
>>
>> You should put the zookeeper port on each node in the comma separated
>> list.
>> I don't know if this is your problem, but I think your solr nodes will
>> only
>> be connecting to 1 zookeeper
>>
>> On Thu, Mar 29, 2018 at 10:56 AM, Walter Underwood > >
>> wrote:
>>
>> > I had that problem. Very annoying and we probably should require special
>> > flag to use localhost.
>> >
>> > We need to start solr like this:
>> >
>> > ./solr start -c -h `hostname`
>> >
>> > If anybody ever forgets, we get a 127.0.0.1 node that shows down in
>> > cluster status. No idea how to get rid of that.
>> >
>> > wunder
>> > Walter Underwood
>> > wun...@wunderwood.org
>> > http://observer.wunderwood.org/  (my blog)
>> >
>> > > On Mar 29, 2018, at 7:46 AM, Shawn Heisey 
>> wrote:
>> > >
>> > > On 3/29/2018 8:25 AM, Abhi Basu wrote:
>> > >> "Operation create caused
>> > >> exception:":"org.apache.solr.common.SolrException:org.
>> > apache.solr.common.SolrException:
>> > >> Cannot create collection ems-collection. Value of maxShardsPerNode
>> is 1,
>> > >> and the number of nodes currently live or live and part of your
>> > >
>> > > I'm betting that all your nodes are registering themselves with the
>> same
>> > name, and that name is probably either 127.0.0.1 or 127.1.1.0 -- an
>> address
>> > on the loopback interface.
>> > >
>> > > Usually this problem (on an OS other than Windows, at least) is caused
>> > by an incorrect /etc/hosts file that maps your hostname to a  loopback
>> > address instead of a real address.
>> > >
>> > > You can override the value that SolrCloud uses to register itself into
>> > zookeeper so it doesn't depend on the OS configuration.  In solr.in.sh,
>> I
>> > think this is the SOLR_HOST variable, which gets translated into
>> -Dhost=XXX
>> > on the java commandline.  It can also be configured in solr.xml.
>> > >
>> > > Thanks,
>> > > Shawn
>> > >
>> >
>> >
>>
>
>
>
> --
> Abhi Basu
>



-- 
Abhi Basu


Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Abhi Basu
Ok, will give it a try along with the host name.


On Thu, Mar 29, 2018 at 12:20 PM, Webster Homer 
wrote:

> This Zookeeper ensemble doesn't look right.
> >
> > ./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/
> -p
> > 8983 -z zk0-esohad,zk1-esohad,zk3-esohad:2181 -m 8g
>
>
> Shouldn't the zookeeper ensemble be specified as:
>   zk0-esohad:2181,zk1-esohad:2181,zk3-esohad:2181
>
> You should put the zookeeper port on each node in the comma separated list.
> I don't know if this is your problem, but I think your solr nodes will only
> be connecting to 1 zookeeper
>
> On Thu, Mar 29, 2018 at 10:56 AM, Walter Underwood 
> wrote:
>
> > I had that problem. Very annoying and we probably should require special
> > flag to use localhost.
> >
> > We need to start solr like this:
> >
> > ./solr start -c -h `hostname`
> >
> > If anybody ever forgets, we get a 127.0.0.1 node that shows down in
> > cluster status. No idea how to get rid of that.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Mar 29, 2018, at 7:46 AM, Shawn Heisey  wrote:
> > >
> > > On 3/29/2018 8:25 AM, Abhi Basu wrote:
> > >> "Operation create caused
> > >> exception:":"org.apache.solr.common.SolrException:org.
> > apache.solr.common.SolrException:
> > >> Cannot create collection ems-collection. Value of maxShardsPerNode is
> 1,
> > >> and the number of nodes currently live or live and part of your
> > >
> > > I'm betting that all your nodes are registering themselves with the
> same
> > name, and that name is probably either 127.0.0.1 or 127.1.1.0 -- an
> address
> > on the loopback interface.
> > >
> > > Usually this problem (on an OS other than Windows, at least) is caused
> > by an incorrect /etc/hosts file that maps your hostname to a  loopback
> > address instead of a real address.
> > >
> > > You can override the value that SolrCloud uses to register itself into
> > zookeeper so it doesn't depend on the OS configuration.  In solr.in.sh,
> I
> > think this is the SOLR_HOST variable, which gets translated into
> -Dhost=XXX
> > on the java commandline.  It can also be configured in solr.xml.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> >
>



-- 
Abhi Basu


Re: New 7.2.1 install on linux; "permission denied" on exec?

2018-03-29 Thread Shawn Heisey
On 3/28/2018 4:15 PM, hal...@xsmail.com wrote:
>   cd /home/test/
>   wget http://apache.osuosl.org/lucene/solr/7.2.1/solr-7.2.1.tgz
>   tar zxvf ./solr-7.2.1.tgz
>
>   id solr; grep solr /etc/passwd
>   uid=485(solr) gid=482(solr) groups=482(solr),100(users)
>   solr:x:485:482::/var/solr:/bin/sh
>
>   cd /home/test/solr-7.2.1
>
>   ./bin/install_solr_service.sh \
>/home/test/solr-7.2.1.tgz \
>-n \
>-i /opt/solr \
>-d /var/solr \
>-s solr \
>-u solr \
>-p 

Looks fine.  It's a little odd to be changing the install location to
/opt/solr instead of /opt ... but if that's what you really want, it
won't cause any issues.

>   chown -R solr:solr /opt/solr

Why are you doing this step?  Those files are *MEANT* to be owned by
root.  The solr user has no need to write to files in that location. 
(Changing permissions in this way is unlikely to hurt anything, but
isn't at all necessary)

>   Mar 28 14:42:21 test.loc solr[7458]: -sh: 
> /opt/solr/solr/bin/solr: Permission denied

Try the following as a troubleshooting step.  Either log in as "solr" or
use the following command as root to change users:

su - solr

Then run this command:

/usr/bin/env bash -x /opt/solr/solr/bin/solr

What I'm hoping that will do is either output a better error message, or
output some longer data that can hopefully pinpoint the problem.  If
that command actually works, then you MIGHT be able to get something
helpful by editing that script and adding " -x" (without the quotes, but
WITH the space) to the end of the first line.  After that edit, try
running the init script again.

What specific distribution and version of Linux are you running?  The
output of "lsb_release -a" and "uname -a" can be very useful to answer
this question with a lot of detail.

What shell does the solr user have?  This is the last entry on the
user's line in /etc/passwd.  That user will need a REAL shell --
/bin/false and other things that prevent login will NOT work.
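
A few quick checks along those lines (a sketch; the noexec test is an extra
guess beyond the questions above):

ls -ld /opt/solr /opt/solr/solr/bin /opt/solr/solr/bin/solr  # execute bits?
grep '^solr:' /etc/passwd     # last field must be a real shell
mount | grep -w /opt          # a noexec mount option would also deny exec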

Thanks,
Shawn



Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Webster Homer
This Zookeeper ensemble doesn't look right.
>
> ./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/ -p
> 8983 -z zk0-esohad,zk1-esohad,zk3-esohad:2181 -m 8g


Shouldn't the zookeeper ensemble be specified as:
  zk0-esohad:2181,zk1-esohad:2181,zk3-esohad:2181

You should put the zookeeper port on each node in the comma separated list.
I don't know if this is your problem, but I think your solr nodes will only
be connecting to 1 zookeeper
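
In other words, the start command from the original message would become
something like:

./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/ -p 8983 \
  -z zk0-esohad:2181,zk1-esohad:2181,zk3-esohad:2181 -m 8g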

On Thu, Mar 29, 2018 at 10:56 AM, Walter Underwood 
wrote:

> I had that problem. Very annoying and we probably should require special
> flag to use localhost.
>
> We need to start solr like this:
>
> ./solr start -c -h `hostname`
>
> If anybody ever forgets, we get a 127.0.0.1 node that shows down in
> cluster status. No idea how to get rid of that.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Mar 29, 2018, at 7:46 AM, Shawn Heisey  wrote:
> >
> > On 3/29/2018 8:25 AM, Abhi Basu wrote:
> >> "Operation create caused
> >> exception:":"org.apache.solr.common.SolrException:org.
> apache.solr.common.SolrException:
> >> Cannot create collection ems-collection. Value of maxShardsPerNode is 1,
> >> and the number of nodes currently live or live and part of your
> >
> > I'm betting that all your nodes are registering themselves with the same
> name, and that name is probably either 127.0.0.1 or 127.1.1.0 -- an address
> on the loopback interface.
> >
> > Usually this problem (on an OS other than Windows, at least) is caused
> by an incorrect /etc/hosts file that maps your hostname to a  loopback
> address instead of a real address.
> >
> > You can override the value that SolrCloud uses to register itself into
> zookeeper so it doesn't depend on the OS configuration.  In solr.in.sh, I
> think this is the SOLR_HOST variable, which gets translated into -Dhost=XXX
> on the java commandline.  It can also be configured in solr.xml.
> >
> > Thanks,
> > Shawn
> >
>
>



Re: Using Solr to build a product matcher, with learning to rank

2018-03-29 Thread Rahul Singh
Maybe overthinking this. There is a “more like this” feature that basically 
does this. Give that a try before digging deeper into the LTR methods. It may 
be good enough for rock and roll.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 28, 2018, 12:25 PM -0400, Xavier Schepler 
, wrote:
> Hello,
>
> I'm considering using Solr with learning to rank to build a product matcher.
> For example, it should match the titles:
> - Apple iPhone 6 16 Gb,
> - iPhone 6 16 Gb,
> - Smartphone IPhone 6 16 Gb,
> - iPhone 6 black 16 Gb,
> to the same internal reference, an unique identifier.
>
> With Solr, each document would then have a field for the product title and
> one for its class, which is the unique identifier of the product.
> Solr would then be used to perform matching as follows.
>
> 1. A search is performed with a given product title.
> 2. The first three results are considered (this requires an initial
> product title database).
> 3. The most frequent identifier is returned.
>
> This method corresponds roughly to a k-Nearest Neighbor approach with the
> cosine metric, k = 3, and a TF-IDF model.
>
> I've done some preliminary tests with Sci-kit learn and the results are
> good, but not as good as the ones of more sophisticated learning algorithms.
>
> Then, I noticed that there exists learning to rank with Solr.
>
> First, do you think that such an use of Solr makes sense?
> Second, is there a relatively simple way to build a learning model using a
> sparse representation of the query TF-IDF vector?
>
> Kind regards,
>
> Xavier Schepler


Re: Add remote ip address in solr log

2018-03-29 Thread Rick Leir
Vince
Something as simple as an Apache ProxyPass would help; then your Apache access 
log would tell you.
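
A sketch with mod_proxy (hostname and port are assumptions); the Apache
access log then records each client IP:

ProxyPass        /solr http://localhost:8983/solr
ProxyPassReverse /solr http://localhost:8983/solr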
Cheers -- Rick
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Walter Underwood
I had that problem. Very annoying and we probably should require special flag 
to use localhost.

We need to start solr like this:

./solr start -c -h `hostname`

If anybody ever forgets, we get a 127.0.0.1 node that shows down in cluster 
status. No idea how to get rid of that.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 29, 2018, at 7:46 AM, Shawn Heisey  wrote:
> 
> On 3/29/2018 8:25 AM, Abhi Basu wrote:
>> "Operation create caused
>> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> Cannot create collection ems-collection. Value of maxShardsPerNode is 1,
>> and the number of nodes currently live or live and part of your
> 
> I'm betting that all your nodes are registering themselves with the same 
> name, and that name is probably either 127.0.0.1 or 127.1.1.0 -- an address 
> on the loopback interface.
> 
> Usually this problem (on an OS other than Windows, at least) is caused by an 
> incorrect /etc/hosts file that maps your hostname to a  loopback address 
> instead of a real address.
> 
> You can override the value that SolrCloud uses to register itself into 
> zookeeper so it doesn't depend on the OS configuration.  In solr.in.sh, I 
> think this is the SOLR_HOST variable, which gets translated into -Dhost=XXX 
> on the java commandline.  It can also be configured in solr.xml.
> 
> Thanks,
> Shawn
> 



Re: MatchMode in Dismax parser

2018-03-29 Thread Shawn Heisey

On 3/29/2018 1:42 AM, iamluckysharma.0...@gmail.com wrote:

Just a suggestion: shouldn't we use Math.round instead of a direct int cast 
when the match mode (mm) is a percentage? For example, with 3 boolean clauses 
and mm=50%, it currently reduces to 1, when it could be 2.

Another example: with 5 boolean clauses and mm=75%, the calculation gives 
3.75; it currently takes 3, whereas Math.round() would give 4.


Maybe that is what it SHOULD do, but in most languages, converting a 
float value to an integer truncates the decimal portion, it doesn't 
round.  To do that requires a deliberate choice in the code, and that 
probably doesn't exist in dismax/edismax.  If your assertion is that 
this should have been done from day one, I'd say you're right.  But that 
decision is now ancient history.  The person who wrote the code might 
have had a very good reason to NOT do it that way.


At this point, if the functionality were changed, it would result in an 
upgraded Solr version behaving VERY differently than the previous 
version.  While new functionality is often added in any new minor 
release, changing existing behavior that users rely on without a 
configuration option is usually only done in a major version.  So for 
7.x, dismax/edismax would need an option to enable rounding on 
minimum-should-match calculations.  Sounds like a great feature request 
to put into Jira, and patches are always welcome.


Thanks,
Shawn



RE: Query redg : diacritics in keyword search

2018-03-29 Thread Paul, Lulu
Thanks Peter, Charlie, Shawn
Makes perfect sense now. I had missed the tokenizer out of the index analyzer; 
it was present only in the query analyzer. Got rid of the preserveOriginal.

Thanks & Best Regards,
Lulu Paul

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: 29 March 2018 15:21
To: solr-user@lucene.apache.org
Subject: Re: Query redg : diacritics in keyword search

On 3/29/2018 5:02 AM, Paul, Lulu wrote:
> The keyword search Carré returns values Carré and Carre (this works
> well as I added the tokenizer <filter class="solr.ASCIIFoldingFilterFactory"
> preserveOriginal="true"/> in the schema config to enable returning of both
> sets of values)
>
> Now it looks like we want Carre to return both Carré and Carre (and this
> doesn’t work. Solr only returns Carre) – any ideas on how this scenario can
> be achieved?

Charlie Hull has hit the nail on the head regarding searching.  I actually 
would remove the preserveOriginal flag from that filter.  If the filter is run 
at both index and query time, you don't need preserveOriginal.

If you're talking about what's displayed in your search results, that is 
completely unaffected by analysis.  Analysis only affects queries and the data 
that goes into the 'indexed="true"' part of the index.  Search
*results* are almost always exactly what was sent to Solr.

There is UpdateProcessor functionality that can sit between the values sent to 
Solr and what actually goes into stored/indexed/docValues. Things that happen 
during update processing ARE visible in search results.

https://lucene.apache.org/solr/guide/6_6/update-request-processors.html

Thanks,
Shawn





Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Abhi Basu
So, in the solr.xml on each node should I set the host to the actual host
name?

<solr>

  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>

    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>

    <int name="zkClientTimeout">${zkClientTimeout:3}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:60}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:6}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>
  </solrcloud>

</solr>


On Thu, Mar 29, 2018 at 9:46 AM, Shawn Heisey  wrote:

> On 3/29/2018 8:25 AM, Abhi Basu wrote:
>
>> "Operation create caused
>> exception:":"org.apache.solr.common.SolrException:org.apache
>> .solr.common.SolrException:
>> Cannot create collection ems-collection. Value of maxShardsPerNode is 1,
>> and the number of nodes currently live or live and part of your
>>
>
> I'm betting that all your nodes are registering themselves with the same
> name, and that name is probably either 127.0.0.1 or 127.1.1.0 -- an address
> on the loopback interface.
>
> Usually this problem (on an OS other than Windows, at least) is caused by
> an incorrect /etc/hosts file that maps your hostname to a  loopback address
> instead of a real address.
>
> You can override the value that SolrCloud uses to register itself into
> zookeeper so it doesn't depend on the OS configuration.  In solr.in.sh, I
> think this is the SOLR_HOST variable, which gets translated into -Dhost=XXX
> on the java commandline.  It can also be configured in solr.xml.
>
> Thanks,
> Shawn
>
>


-- 
Abhi Basu


Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Shawn Heisey

On 3/29/2018 8:25 AM, Abhi Basu wrote:

"Operation create caused
exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Cannot create collection ems-collection. Value of maxShardsPerNode is 1,
and the number of nodes currently live or live and part of your


I'm betting that all your nodes are registering themselves with the same 
name, and that name is probably either 127.0.0.1 or 127.1.1.0 -- an 
address on the loopback interface.


Usually this problem (on an OS other than Windows, at least) is caused 
by an incorrect /etc/hosts file that maps your hostname to a  loopback 
address instead of a real address.


You can override the value that SolrCloud uses to register itself into 
zookeeper so it doesn't depend on the OS configuration.  In solr.in.sh, 
I think this is the SOLR_HOST variable, which gets translated into 
-Dhost=XXX on the java commandline.  It can also be configured in solr.xml.
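
For example, in solr.in.sh (the hostname here is hypothetical):

SOLR_HOST="solr1.example.com"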


Thanks,
Shawn



Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Abhi Basu
Yes, only showing one live node on admin site.

Checking zk logs.

Thanks,

Abhi

On Thu, Mar 29, 2018 at 9:32 AM, Ganesh Sethuraman 
wrote:

> Maybe you can check in the Admin UI --> Cloud --> Tree --> /live_nodes to
> see the list of live nodes before running. If it is less than what you
> expected, check the ZooKeeper logs, or make sure there is connectivity
> between the shards and ZooKeeper.
>
> On Thu, Mar 29, 2018 at 10:25 AM, Abhi Basu <9000r...@gmail.com> wrote:
>
> > What am I missing? I used the following instructions
> > http://blog.thedigitalgroup.com/susheelk/2015/08/03/
> > solrcloud-2-nodes-solr-1-node-zk-setup/#comment-4321
> > on 4  nodes. The only difference is I have 3 external zk servers. So this
> > is how I am starting each solr node:
> >
> > ./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/
> -p
> > 8983 -z zk0-esohad,zk1-esohad,zk3-esohad:2181 -m 8g
> >
> > They all run without any errors, but when trying to create a collection
> > with 2S/2R, I get an error saying only one node is running.
> >
> > ./server/scripts/cloud-scripts/zkcli.sh -zkhost
> > zk0-esohad,zk1-esohad,zk3-esohad:2181 -cmd upconfig -confname
> > ems-collection -confdir
> > /usr/local/bin/solr-7.2.1/server/solr/configsets/ems-
> > collection-72_configs/conf
> >
> >
> > "Operation create caused
> > exception:":"org.apache.solr.common.SolrException:org.
> apache.solr.common.
> > SolrException:
> > Cannot create collection ems-collection. Value of maxShardsPerNode is 1,
> > and the number of nodes currently live or live and part of your
> > createNodeSet is 1. This allows a maximum of 1 to be created. Value of
> > numShards is 2, value of nrtReplicas is 2, value of tlogReplicas is 0 and
> > value of pullReplicas is 0. This requires 4 shards to be created (higher
> > than the allowed number)",
> >
> >
> > Any ideas?
> >
> > Thanks,
> >
> > Abhi
> >
> > --
> > Abhi Basu
> >
>



-- 
Abhi Basu


MatchMode in Dismax parser

2018-03-29 Thread iamluckysharma . 0910
Just a suggestion: shouldn't we use Math.round instead of a direct int cast 
when the match mode (mm) is a percentage? For example, with 3 boolean clauses 
and mm=50%, it currently reduces to 1, when it could be 2.

Another example: with 5 boolean clauses and mm=75%, the calculation gives 
3.75; it currently takes 3, whereas Math.round() would give 4.


Re: Solr 7.2 cannot see all running nodes

2018-03-29 Thread Ganesh Sethuraman
Maybe you can check in the Admin UI --> Cloud --> Tree --> /live_nodes to
see the list of live nodes before running. If it is less than what you
expected, check the ZooKeeper logs, or make sure there is connectivity
between the shards and ZooKeeper.

On Thu, Mar 29, 2018 at 10:25 AM, Abhi Basu <9000r...@gmail.com> wrote:

> What am I missing? I used the following instructions
> http://blog.thedigitalgroup.com/susheelk/2015/08/03/
> solrcloud-2-nodes-solr-1-node-zk-setup/#comment-4321
> on 4  nodes. The only difference is I have 3 external zk servers. So this
> is how I am starting each solr node:
>
> ./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/ -p
> 8983 -z zk0-esohad,zk1-esohad,zk3-esohad:2181 -m 8g
>
> They all run without any errors, but when trying to create a collection
> with 2S/2R, I get an error saying only one node is running.
>
> ./server/scripts/cloud-scripts/zkcli.sh -zkhost
> zk0-esohad,zk1-esohad,zk3-esohad:2181 -cmd upconfig -confname
> ems-collection -confdir
> /usr/local/bin/solr-7.2.1/server/solr/configsets/ems-
> collection-72_configs/conf
>
>
> "Operation create caused
> exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.
> SolrException:
> Cannot create collection ems-collection. Value of maxShardsPerNode is 1,
> and the number of nodes currently live or live and part of your
> createNodeSet is 1. This allows a maximum of 1 to be created. Value of
> numShards is 2, value of nrtReplicas is 2, value of tlogReplicas is 0 and
> value of pullReplicas is 0. This requires 4 shards to be created (higher
> than the allowed number)",
>
>
> Any ideas?
>
> Thanks,
>
> Abhi
>
> --
> Abhi Basu
>


Solr 7.2 cannot see all running nodes

2018-03-29 Thread Abhi Basu
What am I missing? I used the following instructions
http://blog.thedigitalgroup.com/susheelk/2015/08/03/solrcloud-2-nodes-solr-1-node-zk-setup/#comment-4321
on 4  nodes. The only difference is I have 3 external zk servers. So this
is how I am starting each solr node:

./bin/solr start -cloud -s /usr/local/bin/solr-7.2.1/server/solr/node1/ -p
8983 -z zk0-esohad,zk1-esohad,zk3-esohad:2181 -m 8g

They all run without any errors, but when trying to create a collection
with 2S/2R, I get an error saying only one node is running.

./server/scripts/cloud-scripts/zkcli.sh -zkhost
zk0-esohad,zk1-esohad,zk3-esohad:2181 -cmd upconfig -confname
ems-collection -confdir
/usr/local/bin/solr-7.2.1/server/solr/configsets/ems-collection-72_configs/conf


"Operation create caused
exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Cannot create collection ems-collection. Value of maxShardsPerNode is 1,
and the number of nodes currently live or live and part of your
createNodeSet is 1. This allows a maximum of 1 to be created. Value of
numShards is 2, value of nrtReplicas is 2, value of tlogReplicas is 0 and
value of pullReplicas is 0. This requires 4 shards to be created (higher
than the allowed number)",


Any ideas?

Thanks,

Abhi

-- 
Abhi Basu


Re: Query redg : diacritics in keyword search

2018-03-29 Thread Shawn Heisey

On 3/29/2018 5:02 AM, Paul, Lulu wrote:

The keyword search Carré returns values Carré and Carre (this works well as I 
added the tokenizer <filter class="solr.ASCIIFoldingFilterFactory" 
preserveOriginal="true"/> in the schema config to enable returning of both 
sets of values)

Now it looks like we want Carre to return both Carré and Carre (and this 
doesn’t work. Solr only returns Carre) – any ideas on how this scenario can be 
achieved?


Charlie Hull has hit the nail on the head regarding searching.  I 
actually would remove the preserveOriginal flag from that filter.  If 
the filter is run at both index and query time, you don't need 
preserveOriginal.


If you're talking about what's displayed in your search results, that is 
completely unaffected by analysis.  Analysis only affects queries and 
the data that goes into the 'indexed="true"' part of the index.  Search 
*results* are almost always exactly what was sent to Solr.


There is UpdateProcessor functionality that can sit between the values 
sent to Solr and what actually goes into stored/indexed/docValues.  
Things that happen during update processing ARE visible in search results.


https://lucene.apache.org/solr/guide/6_6/update-request-processors.html

Thanks,
Shawn



Re: Query redg : diacritics in keyword search

2018-03-29 Thread Charlie Hull

On 29/03/2018 14:12, Peter Lancaster wrote:

Hi,

You don't say whether the AsciiFolding filter is at index time or query time. 
In any case you can easily look at what's happening using the admin analysis 
tool which helpfully will even highlight where the analysed query and index 
token match.

That said, I'd expect what you want to work if you simply use <filter class="solr.ASCIIFoldingFilterFactory"/> on both index and query.


Simply put:

You use the filter at indexing time to collapse any variants of a term 
into a single variant, which is then stored in your index.


You use the filter at query time to collapse any variants of a term that 
users type into a single variant, and if this exists in your index you 
get a match.


If you don't use the same filter at both ends you won't get a match.
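
Concretely, with the folding filter at both ends the token flow looks like this 
(a simple illustration, showing only the folding step):

index time:  Carré  ->  Carre   (the term that actually goes into the index)
query time:  Carre  ->  Carre   (match)
query time:  Carré  ->  Carre   (match)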

Cheers

Charlie



Cheers,
Peter.

-Original Message-
From: Paul, Lulu [mailto:lulu.p...@bl.uk]
Sent: 29 March 2018 12:03
To: solr-user@lucene.apache.org
Subject: Query redg : diacritics in keyword search

Hi,

The keyword search Carré returns values Carré and Carre (this works well, as I added the 
<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/> filter in the 
schema config to enable returning of both sets of values)

Now it looks like we want Carre to return both Carré and Carre (and this doesn't 
work; Solr only returns Carre) – any ideas on how this scenario can be achieved?

Thanks & Best Regards,
Lulu Paul







--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


RE: Query redg : diacritics in keyword search

2018-03-29 Thread Peter Lancaster
Hi,

You don't say whether the AsciiFolding filter is at index time or query time. 
In any case you can easily look at what's happening using the admin analysis 
tool which helpfully will even highlight where the analysed query and index 
token match.
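
(On a stock install that screen is in the Admin UI at 
http://localhost:8983/solr/#/<collection>/analysis — host, port, and path assumed 
from a default setup.)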

That said, I'd expect what you want to work if you simply use <filter class="solr.ASCIIFoldingFilterFactory"/> on both index and query.

Cheers,
Peter.

-Original Message-
From: Paul, Lulu [mailto:lulu.p...@bl.uk]
Sent: 29 March 2018 12:03
To: solr-user@lucene.apache.org
Subject: Query redg : diacritics in keyword search

Hi,

The keyword search Carré returns values Carré and Carre (this works well, as I 
added the <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/> 
filter in the schema config to enable returning of both sets of values)

Now it looks like we want Carre to return both Carré and Carre (and this doesn't 
work; Solr only returns Carre) – any ideas on how this scenario can be achieved?

Thanks & Best Regards,
Lulu Paul





Query redg : diacritics in keyword search

2018-03-29 Thread Paul, Lulu
Hi,

The keyword search Carré returns values Carré and Carre (this works well, as I 
added the <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/> 
filter in the schema config to enable returning of both sets of values)

Now it looks like we want Carre to return both Carré and Carre (and this doesn't 
work; Solr only returns Carre) – any ideas on how this scenario can be achieved?

Thanks & Best Regards,
Lulu Paul





Fwd: UIMA-SOLR integration

2018-03-29 Thread Mark Robinson
Hi All,

Is it still advisable to pursue UIMA, or can someone please suggest something
else to look at for Solr and NLP?

Thanks!
Mark


-- Forwarded message --
From: Mark Robinson 
Date: Wed, Mar 28, 2018 at 2:21 PM
Subject: UIMA-SOLR integration
To: solr-user@lucene.apache.org


Hi,

I was trying to integrate UIMA into Solr, following the Solr docs and many
other hints on the net.
While trying to get a VALID_ALCHEMYAPI_KEY, I contacted IBM support and got
the following advice:

"As announced a year a go the Alchemy Service was scheduled and shutdown on
March 7th, 2018, and is no longer supported.  The AlchemAPI services was
broken down into three other services where AlchemyLanguage has been
replaced by Natural Language Understanding, AlchemyVision by Visual
Recognition, and AlchemyDataNews by Discovery News.  The suggestion is to
migrated to the respective merged service in order to be able to take
advantage of the features."

Could someone please share any other suggestions instead of having to
use AlchemyAPI, so that I can still continue with my work.

Note: I have already commented out the OpenCalais references in
OverridingParamsExtServicesAE.xml, as I was getting errors with
OpenCalais and so was relying on AlchemyAPI only.

Any immediate help is greatly appreciated!

Thanks!

Mark