Re: LTR features not found in Solr after uploading

2018-07-11 Thread Zheng Lin Edwin Yeo
This is the only output that I got when I tried to run the URL
http://localhost:8983/solr/techproducts/schema/feature-store
{

  "responseHeader":{
"status":0,
"QTime":0},
  "featureStores":["_DEFAULT_"]}


Regards,
Edwin

On 11 July 2018 at 23:15, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> I am setting up the Learning to Rank (LTR) in Solr 7.4.0, and I am
> following the steps from the guide at
> https://lucene.apache.org/solr/guide/7_4/learning-to-rank.html
>
> However, after uploading the features file (myFeatures.json) using curl
> with the same structure as the myFeatures.json in the example, I am not
> able to find any of the features when I tried to run the URL
> http://localhost:8983/solr/techproducts/schema/feature-store
>
> Any idea what could be the issue here? There is no error message or
> anything.
>
> Regards,
> Edwin
>
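
For reference, the upload and verification commands in the ref guide take roughly
this form (the path to myFeatures.json is illustrative):

# upload the feature definitions to the collection's feature store
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' \
  --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'

# then list what is stored under the default store name
curl 'http://localhost:8983/solr/techproducts/schema/feature-store/_DEFAULT_'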


solr config ganglia reporter encounters an exception

2018-07-11 Thread zhenyuan wei
Hi all,

My Solr version is release 7.3.1, and I followed the Solr 7.3.0 ref guide to
configure the Ganglia reporter in solr.xml as below:


<solr>
  ..
  <metrics>
    <reporter name="ganglia" class="org.apache.solr.metrics.reporters.SolrGangliaReporter">
      <str name="host">emr-header-1</str>
      <int name="port">8649</int>
    </reporter>
  </metrics>
</solr>

Then I started the Solr service and encountered an exception like:

2018-07-11 17:47:31.246 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could
not start Solr. Check solr/home property and the logs
2018-07-11 17:47:31.266 ERROR (main) [   ] o.a.s.c.SolrCore
null:java.lang.NoClassDefFoundError: org/acplt/oncrpc/XdrEncodingStream
at info.ganglia.gmetric4j.gmetric.GMetric.(GMetric.java:82)
at info.ganglia.gmetric4j.gmetric.GMetric.(GMetric.java:58)
at info.ganglia.gmetric4j.gmetric.GMetric.(GMetric.java:40)
at
org.apache.solr.metrics.reporters.SolrGangliaReporter.lambda$start$0(SolrGangliaReporter.java:106)
at
org.apache.solr.metrics.reporters.ReporterClientCache.getOrCreate(ReporterClientCache.java:59)
at
org.apache.solr.metrics.reporters.SolrGangliaReporter.start(SolrGangliaReporter.java:106)
at
org.apache.solr.metrics.reporters.SolrGangliaReporter.doInit(SolrGangliaReporter.java:85)
at
org.apache.solr.metrics.SolrMetricReporter.init(SolrMetricReporter.java:70)
at
org.apache.solr.metrics.SolrMetricManager.loadReporter(SolrMetricManager.java:881)
at
org.apache.solr.metrics.SolrMetricManager.loadReporters(SolrMetricManager.java:817)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:546)
at
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:263)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:183)
at
org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139)
at
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:741)
at
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:348)
at
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1515)
at
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1477)
at
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:785)
at
org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
at
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:545)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:502)
at
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:150)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at
org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:453)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at
org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:564)
at
org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:239)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
   at
org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:133)
at org.eclipse.jetty.server.Server.start(Server.java:418)
at
org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:115)
at
org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.Server.doStart(Server.java:385)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1584)
at
org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1508)
at java.security.AccessController.doPrivileged(Native Method)
at
org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1507)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
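
The missing class org/acplt/oncrpc/XdrEncodingStream belongs to the remotetea
"oncrpc" library that gmetric4j depends on, so the likely cause is that this jar
is not on Solr's classpath. A hedged sketch of one possible fix (the jar version
and install path are illustrative, not verified):

# obtain the oncrpc (remotetea) jar and place it where Solr's webapp jars live,
# then restart Solr
cp oncrpc-1.0.7.jar /opt/solr/server/solr-webapp/webapp/WEB-INF/lib/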

Re: Access to Jira

2018-07-11 Thread Erick Erickson
You need to create a JIRA account. That said, I don't believe you can
do things like assign tickets to yourself or the like unless you are a
committer.

That said, you _can_ attach patches, make comments and participate in
the development. What people usually do is
1> add a comment to the JIRA you choose saying that you want to work on it.
2> it's often useful to ask if any committers are interested, as it'll
take one to actually put your changes into the code line.
3> discuss the approach you want to use (depending on how complex the
work is, this may be virtually no discussion or one that lasts weeks).
4> attach any patches and ask for review. There has been some work
with ReviewBoard, but that's not used consistently.
5> Committers are often busy so you may need to gently prompt to get
someone to commit the changes.

NOTE: Discussing whether there's any widespread interest in the
feature/fix and/or your approach is usually a good idea for several
reasons.
> it'll save you a lot of work if the initial reply is "that's too special 
> purpose to add to the code base".
> it'll save you a lot of work if someone says "that's already been fixed by 
> SOLR--"
> it'll save you a lot of work if someone says "that's mostly already done, 
> take a look at blah.java"

You can see a theme here ;).

Best,
Erick

On Wed, Jul 11, 2018 at 4:51 PM, Ruchir Choudhry
 wrote:
> https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12128?filter=allopenissues
>
> Please help me to get access to JIRA so that I can pick up tickets to work on.
>
>
> Thanks in advance,
> Ruchir


Access to Jira

2018-07-11 Thread Ruchir Choudhry
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-12128?filter=allopenissues

Please help me to get access to JIRA so that I can pick up tickets to work on.


Thanks in advance,
Ruchir


Re: Exception writing document xxxxxx to the index; possible analysis error.

2018-07-11 Thread Tomas Fernandez Lobbe
Hi Daphne, 
the “possible analysis error” is a misleading error message (to be addressed in 
SOLR-12477). The important piece is the 
“java.lang.ArrayIndexOutOfBoundsException”; it looks like your index may be 
corrupted in some way.

Tomás

> On Jul 11, 2018, at 3:01 PM, Liu, Daphne  wrote:
> 
> Hello Solr Expert,
>   We are using Solr 6.3.0 and lately we are unable to write documents into 
> our index. Please see below error messages. Can anyone help us?
>   Thank you.
> 
> 
> ===
> org.apache.solr.common.SolrException: Exception writing document id 
> 3b8514819e204cc7a110aa5752e29b8e to the index; possible analysis error.
>at 
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178)
>at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
>at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:957)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1112)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:738)
>at 
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
>at 
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
>at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:275)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
>at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:240)
>at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:158)
>at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
>at 
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
>at 
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
>at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
>at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
>at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
>at 
> 
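
One way to verify suspected corruption is Lucene's CheckIndex tool; a sketch, run
with Solr stopped (the jar version and index path are illustrative for a 6.3.0
install):

# read-only check of the core's index; reports any broken segments
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-6.3.0.jar \
  org.apache.lucene.index.CheckIndex /var/solr/data/mycore/data/index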

Exception writing document xxxxxx to the index; possible analysis error.

2018-07-11 Thread Liu, Daphne
Hello Solr Expert,
   We are using Solr 6.3.0 and lately we are unable to write documents into our 
index. Please see below error messages. Can anyone help us?
   Thank you.


===
org.apache.solr.common.SolrException: Exception writing document id 
3b8514819e204cc7a110aa5752e29b8e to the index; possible analysis error.
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:178)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:957)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1112)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:738)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:275)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:240)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:158)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  

SolrCloud and Kubernetes

2018-07-11 Thread Sundar Sivashunmugam
Hi,
  We are interested in setting up SolrCloud in Kubernetes. Is there any 
documentation available for a similar setup?

Thanks!
Sundar Sivashunmugam



Re: solr filter query on text field

2018-07-11 Thread Erick Erickson
bq.  is there any difference if the fq field is a string field vs text

Absolutely. String fields are not analyzed in any way. They're not
tokenized. They are case sensitive. Etc. For example, take
My dog.
as input. A string field will have a _single_ token "My dog.". It will
not match a search on "my". It will not match a search on "dog". It
won't even match "my dog." as a phrase since the case is different. It
won't even match "My dog" because there's no period at the end. It
will only match "My dog.".

As a text field, there would be two tokens, "My" and "dog", and they'd
be massaged however your filters arrange things. With the usual
filters in place (lowerCaseFilter in particular) the tokens in the
index would be "my" and "dog" so searches on "my" would match, "My"
would match, "dog" OR "my" would match.
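
To make the contrast concrete, a minimal pair of field definitions (field names
are placeholders; text_general is the stock analyzed type in Solr's default
configset):

<!-- verbatim: a single token, case- and punctuation-sensitive -->
<field name="my_string_field" type="string" indexed="true" stored="true"/>
<!-- analyzed: tokenized and (typically) lowercased by the type's filter chain -->
<field name="my_text_field" type="text_general" indexed="true" stored="true"/>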

Best,
Erick

On Wed, Jul 11, 2018 at 12:01 PM, Wei  wrote:
> btw, is there any difference if the fq field is a string field vs text
> field?
>
> On Wed, Jul 11, 2018 at 11:59 AM, Wei  wrote:
>
>> Thanks Erick and Andrea!  If my default operator is OR,  fq=
>> my_text_field:(Jurassic park the movie)  is equivalent to 
>> my_text_field:(Jurassic
>> OR park OR the OR movie)? That makes sense.
>>
>> On Wed, Jul 11, 2018 at 9:06 AM, Andrea Gazzarini 
>> wrote:
>>
>>> The syntax is valid in all those three examples; the right one depends on
>>> what you need.
>>>
>>> The first query executes a proximity search (you can think of it as a phrase
>>> search, for simplicity) so it returns no result because probably you don't
>>> have any matching docs with that whole literal.
>>>
>>> The second is querying the my_text_field for all terms which compose the
>>> value between parentheses. You can think of a query where each term is an
>>> optional clause, something like mytextfield:jurassic OR
>>> mytextfield:park...
>>> (it's not exactly an OR, but this could give you the idea)
>>>
>>> The third example is not doing what you think. My_text_field is used only
>>> with the first term (Jurassic) while the others are using the default
>>> field. Something like mytextfield:jurassic OR defaultfield:park OR
>>> defaultfield:the. That's the reason you have so many results (I guess
>>> the default field is a catch-all field)
>>>
>>> Sorry for typos I'm using my mobile
>>>
>>> Andrea
>>>
>>> On Wed 11 Jul 2018 at 17:54, Wei  wrote:
>>>
>>> > Hi,
>>> >
>>> > I am running filter query on a field of text_general type and see
>>> > completely different results for the following queries:
>>> >
>>> >fq= my_text_field:"Jurassic park the movie"   returns 0
>>> > result
>>> >
>>> >fq= my_text_field:(Jurassic park the movie)   returns 20
>>> > result
>>> >
>>> >fq= my_text_field:Jurassic park the movie  returns
>>> > thousands of results
>>> >
>>> >
>>> > Which one is the correct syntax? I am confused why the first query
>>> doesn't
>>> > have any match at all.  I also thought 2 and 3 are the same, but turns
>>> out
>>> > quite different.
>>> >
>>> >
>>> > Thanks,
>>> > Wei
>>> >
>>>
>>
>>


Text Similarity

2018-07-11 Thread Aroop Ganguly
Hi Team

This is what I want to do:
1. I have 2 datasets of the schema id-number and company-name
2. I want to ultimately be able to link (join or any other means) the 2 data 
sets based on the similarity between the company-name fields of the 2 data sets.

Example:

Dataset 1

Id | Company Name
—| —
1 | Aroop Inc
2 | Ganguly & Ganguly Corp


Dataset 2

Revenue | Company Name
— | —
1K  | aroop and sons
2K  | Ganguly Corp
3K  | Ganguly and Ganguly
2K  | Aroop Inc.
6K  | Ganguly Corporation



I want to be able to get a join in the end, based on a smart similarity score 
between the company names in the 2 data sets.

Final Dataset
Id | Company Name           | Revenue | Matched Company Name from Dataset2 | Similarity Score
—  | —                      | —       | —                                  | —
1  | Aroop Inc              | 2K      | Aroop Inc.                         | 99%
2  | Ganguly & Ganguly Corp | 3K      | Ganguly and Ganguly                | 75%

How should I proceed? (I have preprocessed the data sets to lowercase them and 
remove non-essential words like pronouns and acronyms like LTD or Co.)

Thanks
Aroop

RE: SOLR 7.2.1 on SLES 11?

2018-07-11 Thread Lichte, Lucas R - DHS (Tek Systems)
Thanks for the heads-up on that bug; it looks like we'll be doing some script 
editing either way.  I think 1 is the most popular with the team at this point, 
but I'll take the temperature and see how people feel.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, July 11, 2018 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 7.2.1 on SLES 11?

On 7/11/2018 12:09 PM, Lichte, Lucas R - DHS (Tek Systems) wrote:
> Hello, we're trying to get SOLR 7.2.1 running on SLES 11 but we hit issues 
> with BASH 3 and the ${distro_string,,} at the beginning of the 
> install_solr_service.sh.  We're just trying to get this upgraded without 
> tossing out the old DB servers so we can get the content team happy and move 
> on to redesigning the environment.  We're wondering if anyone else has hit 
> this, and if they have any lessons learned.
>
> As we see it, there's a few options:
>
> 1.   Install OpenSUSE BASH 4, maybe in /opt
>
> 2.   Update the lowercase method to something from BASH 3 ( pipe to tr?)
>
> 3.   Do this by hand without the install_solr_service.sh
>
> 4.   Build new Redhat servers, migrate the DB and nuke these things.

Both bash 4 and SLES 11 are more than nine years old.  Upgrades are
definitely recommended.

The option that might be fastest is the second one you've presented --
changing anything in the scripts that requires bash 4 so it's compatible
with bash 3.  If you're comfortable with modifying a shell script in
this way, this is a good option.

The first option is probably a little bit safer -- install bash 4, and
make sure that this is the version used when installing and when
starting Solr.  That could be a PATH adjustment, or changing the shebang
in each script.

There is another issue you're going to need to deal with on SLES.  A fix
for this issue has not been committed to the source repository:

https://issues.apache.org/jira/browse/SOLR-11853

Thanks,
Shawn




Re: SOLR 7.2.1 on SLES 11?

2018-07-11 Thread Shawn Heisey
On 7/11/2018 12:09 PM, Lichte, Lucas R - DHS (Tek Systems) wrote:
> Hello, we're trying to get SOLR 7.2.1 running on SLES 11 but we hit issues 
> with BASH 3 and the ${distro_string,,} at the beginning of the 
> install_solr_service.sh.  We're just trying to get this upgraded without 
> tossing out the old DB servers so we can get the content team happy and move 
> on to redesigning the environment.  We're wondering if anyone else has hit 
> this, and if they have any lessons learned.
>
> As we see it, there's a few options:
>
> 1.   Install OpenSUSE BASH 4, maybe in /opt
>
> 2.   Update the lowercase method to something from BASH 3 ( pipe to tr?)
>
> 3.   Do this by hand without the install_solr_service.sh
>
> 4.   Build new Redhat servers, migrate the DB and nuke these things.

Both bash 4 and SLES 11 are more than nine years old.  Upgrades are
definitely recommended.

The option that might be fastest is the second one you've presented --
changing anything in the scripts that requires bash 4 so it's compatible
with bash 3.  If you're comfortable with modifying a shell script in
this way, this is a good option.
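
A sketch of that bash 3 rewrite (the variable name follows the ${distro_string,,}
usage quoted above; the exact lines in install_solr_service.sh may differ):

# bash 4 only:  distro_string_lc="${distro_string,,}"
# bash 3 compatible equivalent:
distro_string_lc=$(echo "$distro_string" | tr '[:upper:]' '[:lower:]')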

The first option is probably a little bit safer -- install bash 4, and
make sure that this is the version used when installing and when
starting Solr.  That could be a PATH adjustment, or changing the shebang
in each script.

There is another issue you're going to need to deal with on SLES.  A fix
for this issue has not been committed to the source repository:

https://issues.apache.org/jira/browse/SOLR-11853

Thanks,
Shawn



Re: solr filter query on text field

2018-07-11 Thread Wei
btw, is there any difference if the fq field is a string field vs text
field?

On Wed, Jul 11, 2018 at 11:59 AM, Wei  wrote:

> Thanks Erick and Andrea!  If my default operator is OR,  fq=
> my_text_field:(Jurassic park the movie)  is equivalent to 
> my_text_field:(Jurassic
> OR park OR the OR movie)? That makes sense.
>
> On Wed, Jul 11, 2018 at 9:06 AM, Andrea Gazzarini 
> wrote:
>
>> The syntax is valid in all those three examples; the right one depends on
>> what you need.
>>
>> The first query executes a proximity search (you can think of it as a phrase
>> search, for simplicity) so it returns no result because probably you don't
>> have any matching docs with that whole literal.
>>
>> The second is querying the my_text_field for all terms which compose the
>> value between parentheses. You can think of a query where each term is an
>> optional clause, something like mytextfield:jurassic OR
>> mytextfield:park...
>> (it's not exactly an OR, but this could give you the idea)
>>
>> The third example is not doing what you think. My_text_field is used only
>> with the first term (Jurassic) while the others are using the default
>> field. Something like mytextfield:jurassic OR defaultfield:park OR
>> defaultfield:the. That's the reason you have so many results (I guess
>> the default field is a catch-all field)
>>
>> Sorry for typos I'm using my mobile
>>
>> Andrea
>>
>> On Wed 11 Jul 2018 at 17:54, Wei  wrote:
>>
>> > Hi,
>> >
>> > I am running filter query on a field of text_general type and see
>> > completely different results for the following queries:
>> >
>> >fq= my_text_field:"Jurassic park the movie"   returns 0
>> > result
>> >
>> >fq= my_text_field:(Jurassic park the movie)   returns 20
>> > result
>> >
>> >fq= my_text_field:Jurassic park the movie  returns
>> > thousands of results
>> >
>> >
>> > Which one is the correct syntax? I am confused why the first query
>> doesn't
>> > have any match at all.  I also thought 2 and 3 are the same, but turns
>> out
>> > quite different.
>> >
>> >
>> > Thanks,
>> > Wei
>> >
>>
>
>


Re: solr filter query on text field

2018-07-11 Thread Wei
Thanks Erick and Andrea!  If my default operator is OR,  fq=
my_text_field:(Jurassic park the movie)  is equivalent to
my_text_field:(Jurassic
OR park OR the OR movie)? That makes sense.

On Wed, Jul 11, 2018 at 9:06 AM, Andrea Gazzarini 
wrote:

> The syntax is valid in all those three examples; the right one depends on
> what you need.
>
> The first query executes a proximity search (you can think of it as a phrase
> search, for simplicity) so it returns no result because probably you don't
> have any matching docs with that whole literal.
>
> The second is querying the my_text_field for all terms which compose the
> value between parentheses. You can think of a query where each term is an
> optional clause, something like mytextfield:jurassic OR mytextfield:park...
> (it's not exactly an OR, but this could give you the idea)
>
> The third example is not doing what you think. My_text_field is used only
> with the first term (Jurassic) while the others are using the default
> field. Something like mytextfield:jurassic OR defaultfield:park OR
> defaultfield:the. That's the reason you have so many results (I guess
> the default field is a catch-all field)
>
> Sorry for typos I'm using my mobile
>
> Andrea
>
> On Wed 11 Jul 2018 at 17:54, Wei  wrote:
>
> > Hi,
> >
> > I am running filter query on a field of text_general type and see
> > completely different results for the following queries:
> >
> >fq= my_text_field:"Jurassic park the movie"   returns 0
> > result
> >
> >fq= my_text_field:(Jurassic park the movie)   returns 20
> > result
> >
> >fq= my_text_field:Jurassic park the movie  returns
> > thousands of results
> >
> >
> > Which one is the correct syntax? I am confused why the first query
> doesn't
> > have any match at all.  I also thought 2 and 3 are the same, but turns
> out
> > quite different.
> >
> >
> > Thanks,
> > Wei
> >
>


Re: Solr7.3.1 Installation

2018-07-11 Thread Erick Erickson
Gah! Jason is right. Sigh... That'll teach me to try to do two
things at once.

On Wed, Jul 11, 2018 at 11:02 AM, Jason Gerlowski  wrote:
> (I think Erick made a slight typo above: to disable "bad apple" tests,
> use the flag "-Dtests.badapples=false")
> On Wed, Jul 11, 2018 at 11:14 AM Erick Erickson  
> wrote:
>>
>> Note that the native test runs have the known-flaky tests _enabled_ by
>> default, run tests with
>>
>> -Dtests.badapples=true
>>
>> to disable them.
>>
>> Second possibility is to look at the tests that failed and if there is
>> an annotation
>> @BadApple
>> or
>> @AwaitsFix
>> ignore the failure if you can get the tests to pass when running 
>> individually.
>>
>> As Shawn says, this is a known issue that we're working on, but the
>> technical debt is such that it'll
>> be a long-term issue to fix.
>>
>> Best,
>> Erick
>>
>>
>>
>> On Wed, Jul 11, 2018 at 7:13 AM, Shawn Heisey  wrote:
>> > On 7/10/2018 11:20 PM, tapan1707 wrote:
>> >>
>> >> We are trying to install solr-7.3.1 into our existing system (We have also
>> >> made some changes by adding one custom query parser).
>> >>
>> >> I am having some build issues and it would be really helpful if someone
>> >> can
>> >> help.
>> >>
>> >> While running ant test(in the process of building the solr package), it
>> >> terminates because of failed tests.
>> >
>> >
>> > This is a known problem.  Solr's tests are not in a good state.
>> > Sometimes they pass, sometimes they fail.  Since there are so many tests 
>> > and
>> > a fair number of them do fail intermittently, this creates a situation 
>> > where
>> > on most test runs, there is at least one test failure.  Run the tests 
>> > enough
>> > times, and eventually they will all pass ... but this usually takes many
>> > runs.
>> >
>> > Looking at the commands you're using in your script:  After a user has run
>> > the "ant ivy-bootstrap" command once, ivy is downloaded into the user's 
>> > home
>> > directory and does not need to be downloaded again.  Only the "ant package"
>> > command (run in the "solr" subdirectory) is actually needed to build Solr.
>> > The rest of the commands are not needed.
>> >
>> > As Emir said, you don't need to build Solr at all, even when using custom
>> > plugins.  You can download and use the binary package.
>> >
>> > There is effort underway to solve the problem with Solr tests. The initial
>> > phase of that effort is to disable the tests that fail most frequently.  
>> > The
>> > second overlapping phase of the effort is to actually fix those tests so
>> > that they don't fail - either by fixing bugs in the tests themselves, or by
>> > fixing real bugs in Solr.
>> >
>> >> Also, does the ant version have any effect on the build?
>> >
>> >
>> > Ant 1.8 and 1.9 should work.  Versions 1.10.0, 1.10.1, as well as 1.10.3 
>> > and
>> > later should be fine, but 1.10.2 has a bug that results in the lucene-solr
>> > build failing:
>> >
>> > https://issues.apache.org/jira/browse/LUCENE-8189
>> >
>> >> At last, at present, we are using solr-6.4.2 which has zookeeper-3.4.6
>> >> dependency but for solr-7, the zookeeper dependency has been upgraded to
>> >> 3.4.10, so my question is: to what extent might this affect our system
>> >> performance? Can we use zookeeper-3.4.6 with solr-7?
>> >> (same with the jetty version)
>> >
>> >
>> > You should be able to use any ZK 3.4.x server version with any version of
>> > Solr.  Most versions of Solr should also work with 3.5.x (still in beta)
>> > servers.  Early 4.x versions shipped with ZK 3.3.x, and the ZK project 
>> > does
>> > not guarantee compatibility between 3.3.x and 3.5.x.
>> >
>> > I can't guarantee that you won't run into bugs, but ZK is generally a very
>> > stable piece of software.  Each new release of ZK includes a very large 
>> > list
>> > of bugfixes.  I have no idea what implications there are for performance.
>> > You would need to ask a ZK support resource that question.  The latest
>> > stable release that is compatible with your software is the recommended
>> > version.  Currently that is 3.4.12.  The 3.5.x releases are in beta.
>> >
>> > Thanks,
>> > Shawn
>> >
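
For reference, the minimal build sequence Shawn describes looks like this (run
from a lucene-solr source checkout, with ant 1.8 or 1.9 assumed):

ant ivy-bootstrap   # one-time: installs ivy into the user's home directory
cd solr
ant package         # builds the Solr distribution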


SOLR 7.2.1 on SLES 11?

2018-07-11 Thread Lichte, Lucas R - DHS (Tek Systems)
Hello, we're trying to get SOLR 7.2.1 running on SLES 11 but we hit issues with 
BASH 3 and the ${distro_string,,} at the beginning of the 
install_solr_service.sh.  We're just trying to get this upgraded without 
tossing out the old DB servers so we can get the content team happy and move on 
to redesigning the environment.  We're wondering if anyone else has hit this, 
and if they have any lessons learned.

As we see it, there's a few options:


1.Install OpenSUSE BASH 4, maybe in /opt

2.   Update the lowercase method to something from BASH 3 ( pipe to tr?)

3.   Do this by hand without the install_solr_service.sh

4.   Build new Redhat servers, migrate the DB and nuke these things.

Does anyone have any experience/suggestions here?  Are there any gotchas for 
SOLR 7.2.1 in SLES 11?

**
NOTICE: This email and any attachments may contain confidential information. 
Use and further disclosure of the information by the recipient must be 
consistent with applicable laws, regulations and agreements. If you received 
this email in error, please notify the sender; delete the email; and do not 
use, disclose or store the information it contains.



Re: Solr7.3.1 Installation

2018-07-11 Thread Jason Gerlowski
(I think Erick made a slight typo above: to disable "bad apple" tests,
use the flag "-Dtests.badapples=false")
On Wed, Jul 11, 2018 at 11:14 AM Erick Erickson  wrote:
>
> Note that the native test runs have the known-flaky tests _enabled_ by
> default, run tests with
>
> -Dtests.badapples=true
>
> to disable them.
>
> Second possibility is to look at the tests that failed and if there is
> an annotation
> @BadApple
> or
> @AwaitsFix
> ignore the failure if you can get the tests to pass when running individually.
>
> As Shawn says, this is a known issue that we're working on, but the
> technical debt is such that it'll
> be a long-term issue to fix.
>
> Best,
> Erick
>
>
>
> On Wed, Jul 11, 2018 at 7:13 AM, Shawn Heisey  wrote:
> > On 7/10/2018 11:20 PM, tapan1707 wrote:
> >>
> >> We are trying to install solr-7.3.1 into our existing system (We have also
> >> made some changes by adding one custom query parser).
> >>
> >> I am having some build issues and it would be really helpful if someone
> >> can
> >> help.
> >>
> >> While running ant test(in the process of building the solr package), it
> >> terminates because of failed tests.
> >
> >
> > This is a known problem.  Solr's tests are not in a good state.
> > Sometimes they pass, sometimes they fail.  Since there are so many tests and
> > a fair number of them do fail intermittently, this creates a situation where
> > on most test runs, there is at least one test failure.  Run the tests enough
> > times, and eventually they will all pass ... but this usually takes many
> > runs.
> >
> > Looking at the commands you're using in your script:  After a user has run
> > the "ant ivy-bootstrap" command once, ivy is downloaded into the user's home
> > directory and does not need to be downloaded again.  Only the "ant package"
> > command (run in the "solr" subdirectory) is actually needed to build Solr.
> > The rest of the commands are not needed.
> >
> > As Emir said, you don't need to build Solr at all, even when using custom
> > plugins.  You can download and use the binary package.
> >
> > There is effort underway to solve the problem with Solr tests. The initial
> > phase of that effort is to disable the tests that fail most frequently.  The
> > second overlapping phase of the effort is to actually fix those tests so
> > that they don't fail - either by fixing bugs in the tests themselves, or by
> > fixing real bugs in Solr.
> >
> >> Also, does the ant version have any effect on the build?
> >
> >
> > Ant 1.8 and 1.9 should work.  Versions 1.10.0, 1.10.1, as well as 1.10.3 and
> > later should be fine, but 1.10.2 has a bug that results in the lucene-solr
> > build failing:
> >
> > https://issues.apache.org/jira/browse/LUCENE-8189
> >
> >> At last, at present, we are using solr-6.4.2 which has zookeeper-3.4.6
> >> dependency but for solr-7, the zookeeper dependency has been upgraded to
> >> 3.4.10, so my question is: to what extent might this affect our system
> >> performance? Can we use zookeeper-3.4.6 with solr-7?
> >> (same with the jetty version)
> >
> >
> > You should be able to use any ZK 3.4.x server version with any version of
> > Solr.  Most versions of Solr should also work with 3.5.x (still in beta)
> > servers.  Early 4.x versions shipped with ZK 3.3.x, and the ZK project does
> > not guarantee compatibility between 3.3.x and 3.5.x.
> >
> > I can't guarantee that you won't run into bugs, but ZK is generally a very
> > stable piece of software.  Each new release of ZK includes a very large list
> > of bugfixes.  I have no idea what implications there are for performance.
> > You would need to ask a ZK support resource that question.  The latest
> > stable release that is compatible with your software is the recommended
> > version.  Currently that is 3.4.12.  The 3.5.x releases are in beta.
> >
> > Thanks,
> > Shawn
> >


Re: solr filter query on text field

2018-07-11 Thread Andrea Gazzarini
The syntax is valid in all those three examples; the right one depends on
what you need.

The first query executes a proximity search (you can think of it as a phrase
search, for simplicity) so it returns no result because probably you don't
have any matching docs with that whole literal.

The second is querying the my_text_field for all terms which compose the
value between parentheses. You can think of a query where each term is an
optional clause, something like mytextfield:jurassic OR mytextfield:park...
(it's not exactly an OR, but this could give you the idea)

The third example is not doing what you think. My_text_field is used only
with the first term (Jurassic) while the others are using the default
field. Something like mytextfield:jurassic OR defaultfield:park OR
defaultfield:the. That's the reason you have so many results (I guess
the default field is a catch-all field)

Sorry for typos I'm using my mobile

Andrea

On Wed 11 Jul 2018 at 17:54, Wei  wrote:

> Hi,
>
> I am running filter query on a field of text_general type and see
> completely different results for the following queries:
>
>fq= my_text_field:"Jurassic park the movie"   returns 0
> result
>
>fq= my_text_field:(Jurassic park the movie)   returns 20
> result
>
>fq= my_text_field:Jurassic park the movie  returns
> thousands of results
>
>
> Which one is the correct syntax? I am confused why the first query doesn't
> have any match at all.  I also thought 2 and 3 are the same, but turns out
> quite different.
>
>
> Thanks,
> Wei
>


Re: Regarding pdf indexing issue

2018-07-11 Thread Terry Steichen
Walter,

Well said.  (And I love the hamburger conversion analogy - very apt.)

The only thing I will add is that when you have a collection of similar
rich text documents, you might be able to construct queries to respect
internal structures within the documents.  If all/most of your documents
have a unique line like "subject:", you might be able to be selective.

Also, if your documents are organized on disk in some categorical way,
you can include in your query a reference to that categorical
information (via the id:*pattern* field).

Finally, there *might* be useful information in the metadata that you
can use in refining your searches.

Terry


On 07/11/2018 11:42 AM, Walter Underwood wrote:
> PDF is not a structured document format. It is a printer control format.
>
> PDF does not have a paragraph marker. Instead, it says to move
> to this spot on the page, choose this font, and print this letter. For a
> paragraph, it moves farther. For the next letter in a word, it moves a 
> little bit. Extracting paragraphs from that is a difficult pattern recognition
> problem.
>
> I worked with a PDF of a two-column magazine article that printed
> the first line of column 1, then the first line of column 2, then the 
> second line of column 1, and so on. If a line ended with a hyphenated
> word, too bad.
>
> Extracting structure from a PDF document is somewhere between 
> very hard and impossible. Someone I worked with said that getting
> structured text from PDF was like turning hamburger back into a cow.
>
> Since Acrobat 5, there is “tagged PDF”. I’m not sure how widely that
> is used. It appears to be an accessibility feature, so it still might not
> be useful for search.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Jul 11, 2018, at 8:07 AM, Erick Erickson  wrote:
>>
>> Solr will not do this automatically, the Extracting Request Handler
>> simply indexes the entire contents of the doc without regard to things
>> like paragraphs etc. Ditto with HTML. This is actually a task that
>> requires getting into Tika and using all the bells and whistles there.
>>
>> I'd recommend two things:
>>
>> 1> Take the PDF parsing offline, i.e. in a separate client. There are
>> many reasons for this, in particular you can attempt to do what you're
>> asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/
>>
>> 2> Talk to the Tika folks about the best ways to make Tika return the
>> information such that you can index them and get what you'd like.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
>>  wrote:
>>> Hello Team,
>>>
>>> I am using the Solr for indexing and searching for pdf document
>>>
>>> I have gone through your website documentation and installed Solr, but I am
>>> unable to index and search the document.
>>>
>>> For example: suppose we have a PDF file which has a number of paragraphs,
>>> each with a separate heading.
>>>
>>> So if I search for a title in the indexed PDF, the result should contain
>>> the paragraph where the title belongs.
>>>
>>> I am unable to perform this task.
>>>
>>> I have run the below command for upload the pdf
>>>
>>> *bin/post -c gettingstarted pdf-sample.pdf*
>>>
>>> and for searching I am running the command
>>>
>>> *curl http://localhost:8983/solr/gettingstarted/select?q='*
>>> >>
>>> Please suggest me anything and let me know if I am missing anything
>>>
>>> Thanks,
>>>
>>> Rahul
>
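
A bare-bones sketch of the client-side approach Erick suggests, using Tika to
extract the text and SolrJ to index it (collection name, field names, and file
path are illustrative; splitting the extracted text into paragraphs would be your
own logic):

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class PdfIndexer {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/gettingstarted").build();
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
    Metadata metadata = new Metadata();
    try (InputStream in = new FileInputStream("pdf-sample.pdf")) {
      parser.parse(in, handler, metadata); // Tika extracts the body text
    }
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "pdf-sample.pdf");
    doc.addField("content", handler.toString()); // paragraph splitting goes here
    client.add(doc);
    client.commit();
    client.close();
  }
}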



Re: solr filter query on text field

2018-07-11 Thread Erick Erickson
1> is looking for the _phrase_, so the four tokens "jurassic" "park"
"the" "movie" have to appear next to each other in that order.

2> is looking for the four tokens anywhere in the field. Whether they
_all_ must appear depends on whether the default operator (OR or AND).

3> is parsed as my_text_field:Jurassic default_text_field:park
default_text_field:the default_text_field:movie.

Adding debug=query to your query will show you what the parsed query
looks like and helps answer these kinds of questions.
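
For example (collection and field names are illustrative):

curl 'http://localhost:8983/solr/techproducts/select?q=*:*&fq=my_text_field:(Jurassic+park+the+movie)&debug=query'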

Best,
Erick

On Wed, Jul 11, 2018 at 8:54 AM, Wei  wrote:
> Hi,
>
> I am running filter query on a field of text_general type and see
> completely different results for the following queries:
>
>fq= my_text_field:"Jurassic park the movie"   returns 0
> result
>
>fq= my_text_field:(Jurassic park the movie)   returns 20
> result
>
>fq= my_text_field:Jurassic park the movie  returns
> thousands of results
>
>
> Which one is the correct syntax? I am confused why the first query doesn't
> have any match at all.  I also thought 2 and 3 are the same, but turns out
> quite different.
>
>
> Thanks,
> Wei


Re: Regarding pdf indexing issue

2018-07-11 Thread Shamik Sinha
You may try the tesseract tool to check data extraction from PDFs or
images and then go forward accordingly. As far as I understand, the PDF is
an image and not data. A searchable PDF actually overlays the selectable
text as hidden text over the PDF image. These PDFs can be indexed and
extracted. These are mostly supported in English and other Latin
derivatives. You may face problems extracting/indexing text in any other
language. Handwritten text converted to PDFs is next to impossible to
index/extract. Apache Tika may be the solution you are looking for.
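
A quick way to test OCR extraction, sketched with illustrative file names
(requires poppler-utils and tesseract):

pdftoppm -png scanned.pdf page       # rasterize the PDF pages to page-1.png, ...
tesseract page-1.png page-1 -l eng   # OCR one page into page-1.txt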
On Wed 11 Jul, 2018, 9:12 PM Walter Underwood, 
wrote:

> PDF is not a structured document format. It is a printer control format.
>
> PDF does not have a paragraph marker. Instead, it says to move
> to this spot on the page, choose this font, and print this letter. For a
> paragraph, it moves farther. For the next letter in a word, it moves a
> little bit. Extracting paragraphs from that is a difficult pattern
> recognition
> problem.
>
> I worked with a PDF of a two-column magazine article that printed
> the first line of column 1, then the first line of column 2, then the
> second line of column 1, and so on. If a line ended with a hyphenated
> word, too bad.
>
> Extracting structure from a PDF document is somewhere between
> very hard and impossible. Someone I worked with said that getting
> structured text from PDF was like turning hamburger back into a cow.
>
> Since Acrobat 5, there is “tagged PDF”. I’m not sure how widely that
> is used. It appears to be an accessibility feature, so it still might not
> be useful for search.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jul 11, 2018, at 8:07 AM, Erick Erickson 
> wrote:
> >
> > Solr will not do this automatically, the Extracting Request Handler
> > simply indexes the entire contents of the doc without regard to things
> > like paragraphs etc. Ditto with HTML. This is actually a task that
> > requires getting into Tika and using all the bells and whistles there.
> >
> > I'd recommend two things:
> >
> > 1> Take the PDF parsing offline, i.e. in a separate client. There are
> > many reasons for this, in particular you can attempt to do what you're
> > asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/
> >
> > 2> Talk to the Tika folks about the best ways to make Tika return the
> > information such that you can index them and get what you'd like.
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
> >  wrote:
> >> Hello Team,
> >>
> >> I am using the Solr for indexing and searching for pdf document
> >>
> >> I have gone through your website documentation and installed Solr, but I
> >> am unable to index and search the document.
> >>
> >> For example: suppose we have a PDF file which has a number of paragraphs,
> >> each with a separate heading.
> >>
> >> So if I search for a title in the indexed PDF, the result should contain
> >> the paragraph where the title belongs.
> >>
> >> I am unable to perform this task.
> >>
> >> I have run the below command for upload the pdf
> >>
> >> *bin/post -c gettingstarted pdf-sample.pdf*
> >>
> >> and for searching I am running the command
> >>
> >> *curl http://localhost:8983/solr/gettingstarted/select?q='*
> >>  >>
> >> Please suggest me anything and let me know if I am missing anything
> >>
> >> Thanks,
> >>
> >> Rahul
>
>


solr filter query on text field

2018-07-11 Thread Wei
Hi,

I am running filter query on a field of text_general type and see
completely different results for the following queries:

   fq= my_text_field:"Jurassic park the movie"   returns 0
result

   fq= my_text_field:(Jurassic park the movie)   returns 20
result

   fq= my_text_field:Jurassic park the movie  returns
thousands of results


Which one is the correct syntax? I am confused why the first query doesn't
have any match at all.  I also thought 2 and 3 are the same, but turns out
quite different.


Thanks,
Wei


Re: Regarding pdf indexing issue

2018-07-11 Thread Walter Underwood
PDF is not a structured document format. It is a printer control format.

PDF does not have a paragraph marker. Instead, it says to move
to this spot on the page, choose this font, and print this letter. For a
paragraph, it moves farther. For the next letter in a word, it moves a 
little bit. Extracting paragraphs from that is a difficult pattern recognition
problem.

I worked with a PDF of a two-column magazine article that printed
the first line of column 1, then the first line of column 2, then the 
second line of column 1, and so on. If a line ended with a hyphenated
word, too bad.

Extracting structure from a PDF document is somewhere between 
very hard and impossible. Someone I worked with said that getting
structured text from PDF was like turning hamburger back into a cow.

Since Acrobat 5, there is “tagged PDF”. I’m not sure how widely that
is used. It appears to be an accessibility feature, so it still might not
be useful for search.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 11, 2018, at 8:07 AM, Erick Erickson  wrote:
> 
> Solr will not do this automatically, the Extracting Request Handler
> simply indexes the entire contents of the doc without regard to things
> like paragraphs etc. Ditto with HTML. This is actually a task that
> requires getting into Tika and using all the bells and whistles there.
> 
> I'd recommend two things:
> 
> 1> Take the PDF parsing offline, i.e. in a separate client. There are
> many reasons for this, in particular you can attempt to do what you're
> asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/
> 
> 2> Talk to the Tika folks about the best ways to make Tika return the
> information such that you can index them and get what you'd like.
> 
> Best,
> Erick
> 
> On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
>  wrote:
>> Hello Team,
>> 
>> I am using the Solr for indexing and searching for pdf document
>> 
>> I have gone through your website documentation and installed Solr, but I am
>> unable to index and search the document.
>> 
>> For example: suppose we have a PDF file which has a number of paragraphs,
>> each with a separate heading.
>> 
>> So if I search for a title in the indexed PDF, the result should contain
>> the paragraph where the title belongs.
>> 
>> I am unable to perform this task.
>> 
>> I have run the below command for upload the pdf
>> 
>> *bin/post -c gettingstarted pdf-sample.pdf*
>> 
>> and for searching I am running the command
>> 
>> *curl http://localhost:8983/solr/gettingstarted/select?q='*
>> > 
>> Please suggest me anything and let me know if I am missing anything
>> 
>> Thanks,
>> 
>> Rahul



Re: Solr unable to start up after setting up SSL in Solr 7.4.0

2018-07-11 Thread Zheng Lin Edwin Yeo
Hi,

I found that if we replace the following files with the copies from Solr
7.3.1, SSL works:
- jetty.xml
- jetty-http.xml
- jetty-https.xml
- jetty-ssl.xml

But the copies that come with Solr 7.4.0 are not working.

I found some differences in the files, but I am not sure whether other
changes are required, or whether there are bugs in the copies in Solr 7.4.0.

Regards,
Edwin

On 4 July 2018 at 11:20, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> Would like to check, if there are any major changes in the way the SSL
> works for Solr 7.4.0?
>
> I have tried to set up with the same method that I used for Solr 7.3.1,
> but after setting it up, the Solr is unable to load.
>
> Below is the error message that I get.
>
> Caused by: java.security.PrivilegedActionException:
> java.lang.ClassNotFoundExcep
> tion: org.apache.solr.util.configuration.SSLConfigurationsFactory
> at java.security.AccessController.doPrivileged(Native Method)
> at org.eclipse.jetty.xml.XmlConfiguration.main(
> XmlConfiguration.java:150
> 8)
> ... 7 more
> Caused by: java.lang.ClassNotFoundException: org.apache.solr.util.
> configuration.
> SSLConfigurationsFactory
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at org.eclipse.jetty.util.Loader.loadClass(Loader.java:65)
> at org.eclipse.jetty.xml.XmlConfiguration$
> JettyXmlConfiguration.call(Xml
> Configuration.java:784)
> at org.eclipse.jetty.xml.XmlConfiguration$
> JettyXmlConfiguration.configur
> e(XmlConfiguration.java:469)
> at org.eclipse.jetty.xml.XmlConfiguration$
> JettyXmlConfiguration.configur
> e(XmlConfiguration.java:410)
> at org.eclipse.jetty.xml.XmlConfiguration.configure(
> XmlConfiguration.jav
> a:308)
> at org.eclipse.jetty.xml.XmlConfiguration$1.run(
> XmlConfiguration.java:15
> 55)
> at org.eclipse.jetty.xml.XmlConfiguration$1.run(
> XmlConfiguration.java:15
> 09)
> ... 9 more
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at org.eclipse.jetty.start.Main.invokeMain(Main.java:220)
> at org.eclipse.jetty.start.Main.start(Main.java:486)
> at org.eclipse.jetty.start.Main.main(Main.java:77)
> Caused by: java.security.PrivilegedActionException:
> java.lang.ClassNotFoundExcep
> tion: org.apache.solr.util.configuration.SSLConfigurationsFactory
> at java.security.AccessController.doPrivileged(Native Method)
> at org.eclipse.jetty.xml.XmlConfiguration.main(
> XmlConfiguration.java:150
> 8)
> ... 7 more
> Caused by: java.lang.ClassNotFoundException: org.apache.solr.util.
> configuration.
> SSLConfigurationsFactory
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at org.eclipse.jetty.util.Loader.loadClass(Loader.java:65)
> at org.eclipse.jetty.xml.XmlConfiguration$
> JettyXmlConfiguration.call(Xml
> Configuration.java:784)
> at org.eclipse.jetty.xml.XmlConfiguration$
> JettyXmlConfiguration.configur
> e(XmlConfiguration.java:469)
> at org.eclipse.jetty.xml.XmlConfiguration$
> JettyXmlConfiguration.configur
> e(XmlConfiguration.java:410)
> at org.eclipse.jetty.xml.XmlConfiguration.configure(
> XmlConfiguration.jav
> a:308)
> at org.eclipse.jetty.xml.XmlConfiguration$1.run(
> XmlConfiguration.java:15
> 55)
> at org.eclipse.jetty.xml.XmlConfiguration$1.run(
> XmlConfiguration.java:15
> 09)
> ... 9 more
>
> Usage: java -jar $JETTY_HOME/start.jar [options] [properties] [configs]
>java -jar $JETTY_HOME/start.jar --help  # for more information
>
>
> Regards,
> Edwin
>


Re: Number of fields in a solrCloud collection config

2018-07-11 Thread Erick Erickson
As Shawn says, there's no hard limit, but having that many fields
almost always indicates a flawed design.

I'd pretty strongly suggest that you reconsider your design with an
eye towards reducing the field count to
something less than, say, 1,000. That number is fairly arbitrary, and
ideally I'd want it even smaller
but it's not a bad target.

And do note that if you elect to do this, the meta-data will still be
hanging around and you must
create a new collection and re-index from scratch to purge it.

Best,
Erick

On Wed, Jul 11, 2018 at 6:55 AM, Shawn Heisey  wrote:
> On 7/11/2018 2:05 AM, Sharif Shahriar wrote:
>>
>> Is there any limitation on how many fields can be added in a SolrCloud
>> collection configset?
>> After adding 24,520 fields, when I want to add new fields, it shows
>> -"Error persisting managed schema at /configs/*/managed-schema"
>> -"zkClient has disconnected"
>
>
> There is no hard limit on the number of fields.  It is likely that you are
> running into something else.   Without detailed logs it's difficult to know
> for sure what the problem is, but I do have one idea:
>
> The most likely limit that I can think of that you'd be running into is the
> limit on the size of an individual entry in the ZooKeeper database -- your
> schema file is getting too big.  By default this limit is right around one
> megabyte.  Changing it is possible, but it must be done on all ZK servers
> and every single ZK client (which includes Solr).
>
> https://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html#Experimental+Options%2FFeatures
>
> If that's not the problem, then we will need to see your logs to figure out
> what the problem might be.  To properly interpret the log, the exact Solr
> version must be known.
>
> Thanks,
> Shawn
>


LTR features not found in Solr after uploading

2018-07-11 Thread Zheng Lin Edwin Yeo
Hi,

I am setting up the Learning to Rank (LTR) in Solr 7.4.0, and I am following
the steps from the guide from
https://lucene.apache.org/solr/guide/7_4/learning-to-rank.html

However, after uploading the features file (myFeatures.json) using curl
with the same structure as the myFeatures.json in the example, I am not
able to find any of the features when I tried to run the URL
http://localhost:8983/solr/techproducts/schema/feature-store
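For reference, the upload step documented in that guide is a PUT to the
feature-store endpoint; a sketch (the file path is illustrative):

curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' \
  --data-binary "@/path/myFeatures.json" \
  -H 'Content-type:application/json'

If the PUT itself reports an error (e.g. a JSON parse problem), the store
list may keep showing only _DEFAULT_, so the PUT response body is the first
thing to check.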

Any idea what could be the issue here? There is no error message or
anything.

Regards,
Edwin


Re: Solr7.3.1 Installation

2018-07-11 Thread Erick Erickson
Note that the native test runs have the known-flaky tests _enabled_ by
default; run tests with

-Dtests.badapples=false

to disable them.

Second possibility is to look at the tests that failed and if there is
an annotation
@BadApple
or
@AwaitsFix
ignore the failure if you can get the tests to pass when running individually.

As Shawn says, this is a known issue that we're working on, but the
technical debt is such that it'll
be a long-term issue to fix.

Best,
Erick



On Wed, Jul 11, 2018 at 7:13 AM, Shawn Heisey  wrote:
> On 7/10/2018 11:20 PM, tapan1707 wrote:
>>
>> We are trying to install solr-7.3.1 into our existing system (We have also
>> made some changes by adding one custom query parser).
>>
>> I am having some build issues and it would be really helpful if someone
>> can
>> help.
>>
>> While running ant test(in the process of building the solr package), it
>> terminates because of failed tests.
>
>
> This is a known problem.  Solr's tests are not in a good state.
> Sometimes they pass, sometimes they fail.  Since there are so many tests and
> a fair number of them do fail intermittently, this creates a situation where
> on most test runs, there is at least one test failure.  Run the tests enough
> times, and eventually they will all pass ... but this usually takes many
> runs.
>
> Looking at the commands you're using in your script:  After a user has run
> the "ant ivy-bootstrap" command once, ivy is downloaded into the user's home
> directory and does not need to be downloaded again.  Only the "ant package"
> command (run in the "solr" subdirectory) is actually needed to build Solr.
> The rest of the commands are not needed.
>
> As Emir said, you don't need to build Solr at all, even when using custom
> plugins.  You can download and use the binary package.
>
> There is effort underway to solve the problem with Solr tests. The initial
> phase of that effort is to disable the tests that fail most frequently.  The
> second overlapping phase of the effort is to actually fix those tests so
> that they don't fail - either by fixing bugs in the tests themselves, or by
> fixing real bugs in Solr.
>
>> Also, does the Ant version have any effect on the build?
>
>
> Ant 1.8 and 1.9 should work.  Versions 1.10.0, 1.10.1, as well as 1.10.3 and
> later should be fine, but 1.10.2 has a bug that results in the lucene-solr
> build failing:
>
> https://issues.apache.org/jira/browse/LUCENE-8189
>
>> At last, at present, we are using solr-6.4.2 which has zookeeper-3.4.6
>> dependency but for solr-7, the zookeeper dependency has been upgraded to
>> 3.4.10, so my question is: to what extent might this affect our system
>> performance? Can we use zookeeper-3.4.6 with solr-7?
>> (same with the jetty version)
>
>
> You should be able to use any ZK 3.4.x server version with any version of
> Solr.  Most versions of Solr should also work with 3.5.x (still in beta)
> servers.  Early 4.x versions shipped with ZK 3.3.x, and the ZK project does
> not guarantee compatibility between 3.3.x and 3.5.x.
>
> I can't guarantee that you won't run into bugs, but ZK is generally a very
> stable piece of software.  Each new release of ZK includes a very large list
> of bugfixes.  I have no idea what implications there are for performance.
> You would need to ask a ZK support resource that question.  The latest
> stable release that is compatible with your software is the recommended
> version.  Currently that is 3.4.12.  The 3.5.x releases are in beta.
>
> Thanks,
> Shawn
>


Re: Regarding pdf indexing issue

2018-07-11 Thread Erick Erickson
Solr will not do this automatically; the Extracting Request Handler
simply indexes the entire contents of the doc without regard to things
like paragraphs etc. Ditto with HTML. This is actually a task that
requires getting into Tika and using all the bells and whistles there.

I'd recommend two things:

1> Take the PDF parsing offline, i.e. in a separate client. There are
many reasons for this, in particular you can attempt to do what you're
asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/

2> Talk to the Tika folks about the best ways to make Tika return the
information such that you can index them and get what you'd like.
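As a quick way to see what Tika extracts before writing that client (a
sketch; the tika-app version is illustrative):

# dump the extracted plain text and metadata from the PDF; splitting the
# text into paragraph-level Solr documents is then up to your client code
java -jar tika-app-1.18.jar --text pdf-sample.pdf > pdf-sample.txt
java -jar tika-app-1.18.jar --metadata pdf-sample.pdf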

Best,
Erick

On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
 wrote:
> Hello Team,
>
> I am using Solr for indexing and searching PDF documents.
>
> I have gone through your website documentation and installed Solr, but I am
> unable to index and search the document.
>
> For example: suppose we have a PDF file which has a number of paragraphs,
> each with a separate heading.
>
> So if I search for a heading in the indexed PDF, the result should contain
> the paragraph that the heading belongs to.
>
> I am unable to perform this task.
>
> I have run the below command for upload the pdf
>
> *bin/post -c gettingstarted pdf-sample.pdf*
>
> and for searching I am running the command
>
> *curl http://localhost:8983/solr/gettingstarted/select?q='*
> 
> Please suggest anything, and let me know if I am missing anything.
>
> Thanks,
>
> Rahul


Regarding pdf indexing issue

2018-07-11 Thread Rahul Prasad Dwivedi
Hello Team,

I am using Solr for indexing and searching PDF documents.

I have gone through your website documentation and installed Solr, but I am
unable to index and search the document.

For example: suppose we have a PDF file which has a number of paragraphs,
each with a separate heading.

So if I search for a heading in the indexed PDF, the result should contain
the paragraph that the heading belongs to.

I am unable to perform this task.

I have run the below command for upload the pdf

*bin/post -c gettingstarted pdf-sample.pdf*

and for searching I am running the command

*curl http://localhost:8983/solr/gettingstarted/select?q='*


Re: Sum and aggregation on nested documents field

2018-07-11 Thread jeebix
My apologies Mikhail, I'll try to explain it better:

This is actually what I get from SOLR with the query you helped me to build:

"responseHeader":{
"status":0,
"QTime":15,
"params":{
  "q":"{!parent which=object_type_s:contact score=max v=$chq}",
  "indent":"on",
  "fl":"*,score,[child parentFilter=object_type_s:contact]",
  "fq":["_query_:\"{!parent which=object_type_s:contact} (enseigne_s:SAV
AND (type_cde_s:CDE OR type_cde_s:REASSORT) AND campagne_s:G)\"",
"-_query_:\"{!parent which=object_type_s:contact} (enseigne_s:SAV
AND type_cde_s:KIT AND campagne_s:I)\"",
"{!frange l=3}{!parent which=object_type_s:contact score=total
v=$chq}"],
  "chq":"+object_type:order +campagne_s:G +enseigne_s:SAV
+type_cde_s:(CDE OR REASSORT) {!func}TTC_i",
  "wt":"json",
  "debugQuery":"on",
  "_":"1531145722719"}},
  "response":{"numFound":2,"start":0,"maxScore":31640.16,"docs":[
  {
"id":"94000.94001.1117636",
"parent_i":94000,
"asso_i":94001,
"personne_i":1117636,
"object_type_s":"contact",
"date_derni_re_commande_dt":"2017-11-13T00:00:00Z",
"_version_":160558310031362,
*"score":31640.16*,
"_childDocuments_":[
{
  "object_type":["order"],
  "TTC_i":0,
  "type_cde_s":"KIT",
  "campagne_s":"G",
  "enseigne_s":"SAV"
 },
{
  "object_type":["order"],
  "TTC_i":31636,
  "type_cde_s":"CDE",
  "campagne_s":"G",
  "enseigne_s":"SAV"
 }

I already get the TTC_i sum by parent document with the "score" parameter.
If I understand correctly, the {!frange} allows me to filter on that
"score", so I can get the answer to this question: "get the parents whose
sum of TTC_i, with campagne_s:G and enseigne_s:SAV and type_cde_s:CDE or
type_cde_s:REASSORT, is higher than 3".

If you think it's OK, I get the results I wanted. Then my goal is to build a
facet to get the number of parent docs which match the query, with
other constraints (like only one campagne_s, etc.).
I think I have to use the json facet API, with facet.pivot for example.
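For the counting part, a minimal sketch with the JSON Facet API (collection
name and facet field are illustrative): once the {!frange} fq has reduced the
domain to qualifying parents, a plain terms facet counts parent docs per
bucket:

curl http://localhost:8983/solr/<collection>/query \
  --data-urlencode 'q={!parent which=object_type_s:contact score=max v=$chq}' \
  --data-urlencode 'chq=+object_type:order +campagne_s:G +enseigne_s:SAV +type_cde_s:(CDE OR REASSORT) {!func}TTC_i' \
  --data-urlencode 'fq={!frange l=3}{!parent which=object_type_s:contact score=total v=$chq}' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'json.facet={parents_per_region:{type:terms,field:region_s}}'

numFound then gives the total number of qualifying parents, and each facet
bucket counts parents rather than orders, because the frange filter already
operates on the parent domain.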

Do you think I'm on the right track?

Best

JB



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sorting and pagination in Solr json range facet

2018-07-11 Thread simon
Looking carefully at the documentation for JSON facets, it looks as though
the offset parameter is not supported for range facets, only for term
facets.  You'd have to do pagination in your application.

-Simon
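For contrast, a terms facet does honor sort/offset/limit, so server-side
paging looks like the sketch below; there is no range-facet equivalent,
hence the client-side slicing:

json.facet={
  daily_totals: {
    type: terms,
    field: daily_window,
    sort: "daily_total desc",
    offset: 20,
    limit: 10,
    facet: { daily_total: "sum(daily_views)" }
  }
}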

On Tue, Jul 10, 2018 at 11:45 AM, Anil  wrote:

> HI Eric,
>
> I mean pagination as offset and limit for facet results. Basically I am
> trying to sort the daily totals (from the json facet field) and apply offset
> and limit to the buckets.
>
> json.facet=
> {
> daily_totals: {
> type: range,
> field: daily_window,
> start : "2017-11-01T00:00:00Z",
> end : "2018-03-14T00:00:00Z",
> gap:"%+1DAY",
> sort: daily_total,
> mincount:1,
> facet: {
> daily_total: "sum(daily_views)"
> }
> }
> }
>
> please let me know if you have any questions. thanks.
>
> Regards,
> Anil
>
> On 10 July 2018 at 20:22, Erick Erickson  wrote:
>
> > What exactly do you mean by "pagination" here? Facets are computed over
> > the entire result set. That is, if the number of documents found for the
> > query
> > is 1,000,000, the facets are returned counted over all 1M docs, even if
> > your
> > rows parameter is 10. The same numbers will be returned for facets
> > regardless of the start and rows parameters.
> >
> > This feels like an XY problem, you're asking how to do X (paginate
> facets)
> > to solve problem Y, but haven't stated what Y is. What's the use-case
> here?
> >
> > Best,
> > Erick
> >
> >
> >
> > On Tue, Jul 10, 2018 at 5:36 AM, Anil  wrote:
> > > Hi,
> > >
> > > Good Morning.
> > >
> > > I am trying solr json facet features. sort, offset, limit fields are
> not
> > > working for Range facet.
> > >
> > > and could not find the support in the documentation. is there any way
> to
> > > achieve sort and pagination for Range facet ? please help.
> > >
> > > Documentation of range facet says -
> > >
> > > Parameters:
> > >
> > >- field – The numeric field or date field to produce range buckets
> > from
> > >- mincount – Minimum document count for the bucket to be included in
> > the
> > >response. Defaults to 0.
> > >- start – Lower bound of the ranges
> > >- end – Upper bound of the ranges
> > >- gap – Size of each range bucket produced
> > >- hardend – A boolean, which if true means that the last bucket will
> > end
> > >at “end” even if it is less than “gap” wide. If false, the last
> > bucket will
> > >be “gap” wide, which may extend past “end”.
> > >- other – This param indicates that in addition to the counts for
> each
> > >range constraint between facet.range.start and facet.range.end,
> counts
> > >should also be computed for…
> > >   - "before" all records with field values lower then lower bound
> of
> > >   the first range
> > >   - "after" all records with field values greater then the upper
> > bound
> > >   of the last range
> > >   - "between" all records with field values between the start and
> end
> > >   bounds of all ranges
> > >   - "none" compute none of this information
> > >   - "all" shortcut for before, between, and after
> > >- include – By default, the ranges used to compute range faceting
> > >between facet.range.start and facet.range.end are inclusive of their
> > lower
> > >bounds and exclusive of the upper bounds. The “before” range is
> > exclusive
> > >and the “after” range is inclusive. This default, equivalent to
> lower
> > >below, will not result in double counting at the boundaries. This
> > behavior
> > >can be modified by the facet.range.include param, which can be any
> > >combination of the following options…
> > >   - "lower" all gap based ranges include their lower bound
> > >   - "upper" all gap based ranges include their upper bound
> > >   - "edge" the first and last gap ranges include their edge bounds
> > (ie:
> > >   lower for the first one, upper for the last one) even if the
> > > corresponding
> > >   upper/lower option is not specified
> > >   - "outer" the “before” and “after” ranges will be inclusive of
> > their
> > >   bounds, even if the first or last ranges already include those
> > boundaries.
> > >   - "all" shorthand for lower, upper, edge, outer
> > >
> > >
> > >
> > >  Thanks,
> > > Anil
> >
>


Re: Solr7.3.1 Installation

2018-07-11 Thread Shawn Heisey

On 7/10/2018 11:20 PM, tapan1707 wrote:

We are trying to install solr-7.3.1 into our existing system (We have also
made some changes by adding one custom query parser).

I am having some build issues and it would be really helpful if someone can
help.

While running ant test(in the process of building the solr package), it
terminates because of failed tests.


This is a known problem.  Solr's tests are not in a good state.
Sometimes they pass, sometimes they fail.  Since there are so many tests 
and a fair number of them do fail intermittently, this creates a 
situation where on most test runs, there is at least one test failure.  
Run the tests enough times, and eventually they will all pass ... but 
this usually takes many runs.


Looking at the commands you're using in your script:  After a user has 
run the "ant ivy-bootstrap" command once, ivy is downloaded into the 
user's home directory and does not need to be downloaded again.  Only 
the "ant package" command (run in the "solr" subdirectory) is actually 
needed to build Solr.  The rest of the commands are not needed.
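In script form, the minimal build then boils down to something like this (a
sketch; paths assume a standard lucene-solr checkout):

cd lucene-solr
ant ivy-bootstrap        # one-time: installs ivy into ~/.ant/lib
cd solr
ant package              # produces the distributable archives under solr/package/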


As Emir said, you don't need to build Solr at all, even when using 
custom plugins.  You can download and use the binary package.


There is effort underway to solve the problem with Solr tests. The 
initial phase of that effort is to disable the tests that fail most 
frequently.  The second overlapping phase of the effort is to actually 
fix those tests so that they don't fail - either by fixing bugs in the 
tests themselves, or by fixing real bugs in Solr.



Also, does the Ant version have any effect on the build?


Ant 1.8 and 1.9 should work.  Versions 1.10.0, 1.10.1, as well as 1.10.3 
and later should be fine, but 1.10.2 has a bug that results in the 
lucene-solr build failing:


https://issues.apache.org/jira/browse/LUCENE-8189


At last, at present, we are using solr-6.4.2 which has zookeeper-3.4.6
dependency but for solr-7, the zookeeper dependency has been upgraded to
3.4.10, so my question is: to what extent might this affect our system
performance? Can we use zookeeper-3.4.6 with solr-7?
(same with the jetty version)


You should be able to use any ZK 3.4.x server version with any version 
of Solr.  Most versions of Solr should also work with 3.5.x (still in 
beta) servers.  Early 4.x versions shipped with ZK 3.3.x, and the ZK 
project does not guarantee compatibility between 3.3.x and 3.5.x.


I can't guarantee that you won't run into bugs, but ZK is generally a 
very stable piece of software.  Each new release of ZK includes a very 
large list of bugfixes.  I have no idea what implications there are for 
performance.  You would need to ask a ZK support resource that 
question.  The latest stable release that is compatible with your 
software is the recommended version.  Currently that is 3.4.12.  The 
3.5.x releases are in beta.


Thanks,
Shawn



Re: Number of fields in a solrCloud collection config

2018-07-11 Thread Shawn Heisey

On 7/11/2018 2:05 AM, Sharif Shahriar wrote:

Is there any limitation on how many fields can be added in a solrcloud
collection configset?
After adding 24,520 fields, when I want to add new fields, it shows
-"Error persisting managed schema at /configs/*/managed-schema"
-"zkClient has disconnected"


There is no hard limit on the number of fields.  It is likely that you 
are running into something else.   Without detailed logs it's difficult 
to know for sure what the problem is, but I do have one idea:


The most likely limit that I can think of that you'd be running into is 
the limit on the size of an individual entry in the ZooKeeper database 
-- your schema file is getting too big.  By default this limit is right 
around one megabyte.  Changing it is possible, but it must be done on 
all ZK servers and every single ZK client (which includes Solr).


https://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html#Experimental+Options%2FFeatures
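For illustration, the setting in question is the jute.maxbuffer system
property, and it must carry the same value everywhere; a sketch with an
arbitrary 10 MB limit:

# conf/zookeeper-env.sh on every ZooKeeper server:
SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=10485760"

# bin/solr.in.sh on every Solr node, so the client side matches:
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=10485760"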

If that's not the problem, then we will need to see your logs to figure 
out what the problem might be.  To properly interpret the log, the exact 
Solr version must be known.


Thanks,
Shawn



Re: Hi, happy to join this solr party.

2018-07-11 Thread Steve Rowe
Welcome!

To subscribe, send an email to solr-user-subscr...@lucene.apache.org .

More info here: http://lucene.apache.org/solr/community.html#mailing-lists-irc

--
Steve
www.lucidworks.com

> On Jul 10, 2018, at 6:07 AM, zhenyuan wei  wrote:
> 
> I'd like to subscribe to this mailing list, thanks.



Re: Sum and aggregation on nested documents field

2018-07-11 Thread Mikhail Khludnev
  "to facet the results or to be able to filter on the score returned"
This is not clear; you will need to clarify it before it can be answered.


On Wed, Jul 11, 2018 at 2:48 AM jeebix  wrote:

> Hello Mikhail,
>
> First thanks a lot for your answers which are very useful for me... Then, I
> tried the query with the '$' parameter, and I get some great results like
> this:
>
> "id":"693897",
> "asso_i":693897,
> "etat_technique_s":"avec_documents",
> "etat_marketing_s":"actif",
> "type_parent_s":"Société",
> "groupe_type_parent_s":"SOCIETE",
> "nombre_commandes_brut_i":121,
> "nombre_commandes_i":101,
> "nombre_kits_saveur_i":0,
> "ca_periode_i":60524,
> "ca_periode_fleur_i":0,
> "ca_periode_saveur_i":58148,
> "zone_scolaire_s":"B",
> "territoire_s":"France Métropolitaine",
> "region_s":"CENTRE VAL DE LOIRE",
> "departement_s":"45 LOIRET",
> "postal_country_s":"FR",
> "asso_country_s":"FRANCE",
> "object_type_s":"contact",
> "date_derni_re_commande_dt":"2016-04-21T00:00:00Z",
> "_version_":1605523881394177,
> *"score":308940.0*,
> "_childDocuments_":[
> {
>   "fixe_facturation":["0238756400"],
>   "object_type":["order"],
>   "TTC_i":29120,
>   "kit_sans_suite":["false"],
>   "fixe_livraison":["0238756400"],
>   "type_cde_s":"CDE",
>   "statut_s":"V",
>   "campagne_s":"A",
>   "date_dt":"2016-04-19T00:00:00Z",
>   "id":"A22058",
>   "enseigne_s":"SAV",
>   "gamme":["CHOCOLAT > Assortiment",
> "CHOCOLAT > Mono-produit",
> "EQUIPEMENT MAISON > Contenant pour liquide",
> "SAVEURS > Pâtisserie"]},
> {
>   "fixe_facturation":["0238765400"],
>   "object_type":["order"],
>   "TTC_i":429,
>   "kit_sans_suite":["false"],
>   "fixe_livraison":["0238756400"],
>   "type_cde_s":"CDE",
>   "statut_s":"V",
>   "campagne_s":"A",
>   "date_dt":"2016-04-21T00:00:00Z",
>   "id":"A22511",
>   "enseigne_s":"BRI",
>   "gamme":["SAVEURS > Pâtisserie"]}
>
> The query looks like this:
>
> /solr//select?chq=%2Bobject_type:order%20{!func}TTC_i&indent=on&fl=*,score,[child%20parentFilter=object_type_s:contact]&debugQuery=on&q={!parent%20which=object_type_s:contact%20score=total%20v=$chq}&wt=json
>
> I also test the queries in the SOLR admin interface, and when I get what I
> expect, I work on the development of the UI (IHM) which provides facet
> counting and filtering for users.
>
> The total score of TTC is a great advance for me, but I now need to facet
> the results or to be able to filter on the score returned.
> I already tried using frange in filter query : {!frange l=0 u=1
> inclusive=true}$chq
> I also tried : {!frange l=0 u=1 inclusive=true}TTC_i
> With inclusive = false... Without any result.
>
> I know I'm not far off; could you put me on the right track if you have an
> idea?
>
> Thanks for your time,
>
> Best
>
> JB
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


solr config ganglia report class not found exception

2018-07-11 Thread zhenyuan wei
Hi all,

My Solr version is release 7.3.1, and I followed the Solr 7.3.0 ref guide to
configure the Ganglia reporter in solr.xml as below:


<solr>
  ..
  <metrics>
    <reporter name="ganglia" class="org.apache.solr.metrics.reporters.SolrGangliaReporter">
      <str name="host">emr-header-1</str>
      <int name="port">8649</int>
    </reporter>
  </metrics>
</solr>

then I started the Solr service and encountered the exception below:

2018-07-11 17:47:31.246 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not start Solr. Check solr/home property and the logs
2018-07-11 17:47:31.266 ERROR (main) [   ] o.a.s.c.SolrCore null:java.lang.NoClassDefFoundError: org/acplt/oncrpc/XdrEncodingStream
at info.ganglia.gmetric4j.gmetric.GMetric.<init>(GMetric.java:82)
at info.ganglia.gmetric4j.gmetric.GMetric.<init>(GMetric.java:58)
at info.ganglia.gmetric4j.gmetric.GMetric.<init>(GMetric.java:40)
at org.apache.solr.metrics.reporters.SolrGangliaReporter.lambda$start$0(SolrGangliaReporter.java:106)
at org.apache.solr.metrics.reporters.ReporterClientCache.getOrCreate(ReporterClientCache.java:59)
at org.apache.solr.metrics.reporters.SolrGangliaReporter.start(SolrGangliaReporter.java:106)
at org.apache.solr.metrics.reporters.SolrGangliaReporter.doInit(SolrGangliaReporter.java:85)
at org.apache.solr.metrics.SolrMetricReporter.init(SolrMetricReporter.java:70)
at org.apache.solr.metrics.SolrMetricManager.loadReporter(SolrMetricManager.java:881)
at org.apache.solr.metrics.SolrMetricManager.loadReporters(SolrMetricManager.java:817)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:546)
at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:263)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:183)
at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:139)
at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:741)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:348)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1515)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1477)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:785)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:545)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:502)
at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:150)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:453)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:564)
at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:239)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:133)
at org.eclipse.jetty.server.Server.start(Server.java:418)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:115)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.Server.doStart(Server.java:385)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1584)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1508)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1507)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
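The missing class here comes from the remotetea "oncrpc" library, a runtime
dependency of gmetric4j that does not ship with Solr. A sketch of one way to
supply it (jar version and install path are illustrative):

# drop the oncrpc jar where Solr's root classloader picks it up,
# e.g. server/lib/ext/ in the Solr install, then restart Solr:
cp oncrpc-1.0.7.jar /opt/solr/server/lib/ext/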

Number of fields in a solrCloud collection config

2018-07-11 Thread Sharif Shahriar
Is there any limitation on how many fields can be added in a solrcloud
collection configset?
After adding 24,520 fields, when I want to add new fields, it shows
-"Error persisting managed schema at /configs/*/managed-schema"
-"zkClient has disconnected" 

Thank you,
Sharif



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr7.3.1 Installation

2018-07-11 Thread tapan1707
Hi Emir,

Thanks for your reply.

Here, building Solr has nothing to do with the custom query parser.
Our system has been designed in such a way that the package is created by
running the following commands (there are other commands too, but they are
not related to Solr, hence I omit them):

${ANT} -buildfile ${SOLR_DIR}/build.xml ivy-bootstrap
${ANT} -buildfile ${SOLR_DIR}/build.xml compile
${ANT} -buildfile ${SOLR_DIR}/build.xml test
${ANT} -buildfile ${SOLR_DIR}/solr/build.xml dist
${ANT} -buildfile ${SOLR_DIR}/solr/build.xml server
${ANT} -buildfile ${SOLR_DIR}/lucene/build.xml package

And while building solr package, all are failing at *ant test* command.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr7.3.1 Installation

2018-07-11 Thread Emir Arnautović
Hi,
Why are you building Solr? Because you added your custom query parser? If 
that’s the case, then it is not the way to do it. You should set up separate 
project for your query parser, build it and include jar in your Solr setup.
It is not query parser, but here is blog/code for simple update processor: 
https://www.od-bits.com/2018/02/solr-docvalues-on-analysed-field.html 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 11 Jul 2018, at 07:20, tapan1707  wrote:
> 
> We are trying to install solr-7.3.1 into our existing system (We have also
> made some changes by adding one custom query parser).
> 
> I am having some build issues and it would be really helpful if someone can
> help.
> 
> While running ant test(in the process of building the solr package), it
> terminates because of failed tests.
> At first time (build with ant-1.9)
> Tests with failures [seed: C2C0D761AEAAE8A4] (first 10 out of 23):
> 21:25:20[junit4]   -
> org.apache.solr.client.solrj.response.TestSuggesterResponse (suite)
> 21:25:20[junit4]   -
> org.apache.solr.client.solrj.response.TermsResponseTest (suite)
> 21:25:20[junit4]   - org.apache.solr.client.solrj.TestSolrJErrorHandling
> (suite)
> 21:25:20[junit4]   - org.apache.solr.client.solrj.GetByIdTest (suite)
> 21:25:20[junit4]   -
> org.apache.solr.client.solrj.response.TestSpellCheckResponse (suite)
> 21:25:20[junit4]   -
> org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest (suite)
> 21:25:20[junit4]   -
> org.apache.solr.client.solrj.embedded.JettyWebappTest.testAdminUI
> 21:25:20[junit4]   -
> org.apache.solr.client.solrj.embedded.SolrExampleStreamingBinaryTest (suite)
> 21:25:20[junit4]   - org.apache.solr.client.solrj.SolrExampleBinaryTest
> (suite)
> 21:25:20[junit4]   -
> org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest (suite)
> 
> Running the same ant test command without doing any changes (build with
> ant-1.10)
> Tests with failures [seed: 7E004642A6008D89]:
> 11:30:57[junit4]   -
> org.apache.solr.cloud.MoveReplicaHDFSTest.testFailedMove  
> 
> Thirds time (build with ant 1.10)
> [junit4] Tests with failures [seed: EFD939D82A6EC707]:
> [junit4]   - org.apache.solr.cloud.autoscaling.SystemLogListenerTest.test
> 
> Even though I'm not making any changes, the build is failing with different
> failed tests. Can anyone help me with this? I mean, if there is a problem
> with the code then shouldn't it fail with the same test cases?
> Also, all above-mentioned test cases work fine if I check them individually.
> (using ant test -Dtests.class=)
> 
> Also, does the Ant version have any effect on the build?
> 
> At last, at present, we are using solr-6.4.2 which has zookeeper-3.4.6
> dependency but for solr-7, the zookeeper dependency has been upgraded to
> 3.4.10, so my question is: to what extent might this affect our system
> performance? Can we use zookeeper-3.4.6 with solr-7?
> (same with the jetty version) 
> 
> Thanks in advance
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Solr facet division of 2 aggregation function

2018-07-11 Thread hossein nasr esfahani
I'm storing 2 types of data for each record in my Solr core. The first one is
the total number of tasks in each day (total) and the second one is the
total number of *finished* tasks in each day (finished). I want to compute
what percentage of tasks was finished in each *month* using a solr json
facet query. Something like this:

{
  finishedRate : {
type : range,
field : date,
gap : "+1MONTH"
facet: "div(sum(finished),sum(total))"
  }
}

but it said

org.apache.solr.search.SyntaxError: Unknown aggregation agg_div

I tested the following query to work around the SyntaxError, but it gives me the
maximum percentage of tasks that finished in a day for each month:

{
  finishedRate : {
type : range,
field : date,
gap : "+1MONTH"
facet: "max(div(sum(finished),sum(total)))"
  }
}

How can I implement this query?

(Just copied from my question on StackOverflow.)
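One workaround, as a sketch: since a per-bucket div() aggregation is not
available here, return both sums per month and do the division in the
application:

json.facet={
  finishedRate : {
    type : range,
    field : date,
    start : "2018-01-01T00:00:00Z",
    end : "2019-01-01T00:00:00Z",
    gap : "+1MONTH",
    facet : {
      finished_sum : "sum(finished)",
      total_sum : "sum(total)"
    }
  }
}

(The start/end bounds are illustrative.) Each bucket then carries
finished_sum and total_sum, and finished_sum/total_sum is computed
client-side.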


Hi, happy to join this solr party.

2018-07-11 Thread zhenyuan wei
I'd like to subscribe to this mailing list, thanks.


Re: Sum and aggregation on nested documents field

2018-07-11 Thread jeebix
Hello Mikhail,

First thanks a lot for your answers which are very useful for me... Then, I
tried the query with the '$' parameter, and I get some great results like
this:

"id":"693897",
"asso_i":693897,
"etat_technique_s":"avec_documents",
"etat_marketing_s":"actif",
"type_parent_s":"Société",
"groupe_type_parent_s":"SOCIETE",
"nombre_commandes_brut_i":121,
"nombre_commandes_i":101,
"nombre_kits_saveur_i":0,
"ca_periode_i":60524,
"ca_periode_fleur_i":0,
"ca_periode_saveur_i":58148,
"zone_scolaire_s":"B",
"territoire_s":"France Métropolitaine",
"region_s":"CENTRE VAL DE LOIRE",
"departement_s":"45 LOIRET",
"postal_country_s":"FR",
"asso_country_s":"FRANCE",
"object_type_s":"contact",
"date_derni_re_commande_dt":"2016-04-21T00:00:00Z",
"_version_":1605523881394177,
*"score":308940.0*,
"_childDocuments_":[
{
  "fixe_facturation":["0238756400"],
  "object_type":["order"],
  "TTC_i":29120,
  "kit_sans_suite":["false"],
  "fixe_livraison":["0238756400"],
  "type_cde_s":"CDE",
  "statut_s":"V",
  "campagne_s":"A",
  "date_dt":"2016-04-19T00:00:00Z",
  "id":"A22058",
  "enseigne_s":"SAV",
  "gamme":["CHOCOLAT > Assortiment",
"CHOCOLAT > Mono-produit",
"EQUIPEMENT MAISON > Contenant pour liquide",
"SAVEURS > Pâtisserie"]},
{
  "fixe_facturation":["0238765400"],
  "object_type":["order"],
  "TTC_i":429,
  "kit_sans_suite":["false"],
  "fixe_livraison":["0238756400"],
  "type_cde_s":"CDE",
  "statut_s":"V",
  "campagne_s":"A",
  "date_dt":"2016-04-21T00:00:00Z",
  "id":"A22511",
  "enseigne_s":"BRI",
  "gamme":["SAVEURS > Pâtisserie"]}

The query looks like this:
/solr//select?chq=%2Bobject_type:order%20{!func}TTC_i&indent=on&fl=*,score,[child%20parentFilter=object_type_s:contact]&debugQuery=on&q={!parent%20which=object_type_s:contact%20score=total%20v=$chq}&wt=json

I also test the queries in the SOLR admin interface, and when I get what I
expect, I work on the development of the UI (IHM) which provides facet
counting and filtering for users.

The total score of TTC is a great advance for me, but I now need to facet
the results or to be able to filter on the score returned.
I already tried using frange in filter query : {!frange l=0 u=1
inclusive=true}$chq
I also tried : {!frange l=0 u=1 inclusive=true}TTC_i
With inclusive = false... Without any result.

I know I'm not far off; could you put me on the right track if you have an
idea?

Thanks for your time,

Best

JB




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr OpenNLP named entity extraction

2018-07-11 Thread Jerome Yang
Thanks a lot Steve!

On Wed, Jul 11, 2018 at 10:24 AM Steve Rowe  wrote:

> Hi Jerome,
>
> I was able to set up a configset to perform OpenNLP NER, loading the model
> files from local storage.
>
> There is a trick though[1]: the model files must be located *in a jar* or
> *in a subdirectory* under ${solr.solr.home}/lib/ or under a directory
> specified via a solrconfig.xml <lib> directive.
>
> I tested with the bin/solr cloud example, and put model files under the
> two solr home directories, at example/cloud/node1/solr/lib/opennlp/ and
> example/cloud/node2/solr/lib/opennlp/.  The “opennlp/” subdirectory is
> required, though its name can be anything else you choose.
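For illustration, the resulting on-disk layout would be (paths from the
bin/solr cloud example; any subdirectory name works in place of opennlp/):

example/cloud/node1/solr/lib/opennlp/en-ner-person.bin
example/cloud/node2/solr/lib/opennlp/en-ner-person.bin

with the processor's modelFile referring to it as opennlp/en-ner-person.bin.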
>
> [1] As you noted, ZkSolrResourceLoader delegates to its parent classloader
> when it can’t find resources in a configset, and the parent classloader is
> set up to load from subdirectories and jar files under
> ${solr.solr.home}/lib/ or under a directory specified via a solrconfig.xml
> <lib> directive.  These directories themselves are not included in the set
> of directories from which resources are loaded; only their children are.
>
> --
> Steve
> www.lucidworks.com
>
> > On Jul 9, 2018, at 10:10 PM, Jerome Yang  wrote:
> >
> > Hi Steve,
> >
> > Putting models under "${solr.solr.home}/lib/" is not working.
> > I checked "ZkSolrResourceLoader"; it seems it will first try to find models
> > in the config set.
> > If not found, then it uses the class loader to load from resources.
> >
> > Regards,
> > Jerome
> >
> > On Tue, Jul 10, 2018 at 9:58 AM Jerome Yang  wrote:
> >
> >> Thanks Steve!
> >>
> >>
> >> On Tue, Jul 10, 2018 at 5:20 AM Steve Rowe  wrote:
> >>
> >>> Hi Jerome,
> >>>
> >>> See the ref guide[1] for a writeup of how to enable uploading files
> >>> larger than 1MB into ZooKeeper.
> >>>
> >>> Local storage should also work - have you tried placing OpenNLP model
> >>> files in ${solr.solr.home}/lib/ ? - make sure you do the same on each
> node.
> >>>
> >>> [1]
> >>>
> https://lucene.apache.org/solr/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit
> >>>
> >>> --
> >>> Steve
> >>> www.lucidworks.com
> >>>
>  On Jul 9, 2018, at 12:50 AM, Jerome Yang  wrote:
> 
>  Hi guys,
> 
>  In Solrcloud mode, where to put the OpenNLP models?
>  Upload to zookeeper?
>  As I test on solr 7.3.1, seems absolute path on local host is not
> >>> working.
>  And can not upload into zookeeper if the model size exceed 1M.
> 
>  Regards,
>  Jerome
> 
>  On Wed, Apr 18, 2018 at 9:54 AM Steve Rowe  wrote:
> 
> > Hi Alexey,
> >
> > First, thanks for moving the conversation to the mailing list.
> >>> Discussion
> > of usage problems should take place here rather than in JIRA.
> >
> > I locally set up Solr 7.3 similarly to you and was able to get things
> >>> to
> > work.
> >
> > Problems with your setup:
> >
> > 1. Your update chain is missing the Log and Run update processors at
> >>> the
> > end (I see these are missing from the example in the javadocs for the
> > OpenNLP NER update processor; I’ll fix that):
> >
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> >  The Log update processor isn’t strictly necessary, but, from <
> >
> >>>
> https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain
> >> :
> >
> >  Do not forget to add RunUpdateProcessorFactory at the end of any
> >  chains you define in solrconfig.xml. Otherwise update requests
> >  processed by that chain will not actually affect the indexed
> >>> data.
> >
> > 2. Your example document is missing an “id” field.
> >
> > 3. For whatever reason, the pre-trained model "en-ner-person.bin"
> >>> doesn’t
> > extract anything from text “This is Steve Jobs 2”.  It will extract
> >>> “Steve
> > Jobs” from text “This is Steve Jobs in white” e.g. though.
> >
> > 4. (Not a problem necessarily) You may want to use a multi-valued
> >>> “string”
> > field for the “dest” field in your update chain, e.g. “people_str”
> >>> (“*_str”
> > in the default configset is so configured).
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> >> On Apr 17, 2018, at 8:23 AM, Alexey Ponomarenko <
> >>> alex1989s...@gmail.com>
> > wrote:
> >>
> >> Hi, once more I am trying to implement named entities extraction using
> >> this manual:
> >>
> >
> >>>
> https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html
> >>
> >> I modified solrconfig.xml like this:
> >>
> >> <updateRequestProcessorChain name="…">
> >>   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
> >>     <str name="modelFile">opennlp/en-ner-person.bin</str>
> >>     <str name="analyzerFieldType">text_opennlp</str>
> >>     <str name="source">description_en</str>
> >>     <str name="dest">content</str>
> >>   </processor>
> >> </updateRequestProcessorChain>
> >> 
> >>
> >> But when I was trying to add data using:

Sum and aggregation on nested documents field

2018-07-11 Thread jeebix
Hello everybody,

I have a question about how to retrieve results from SOLR with some
aggregation (like sum in my case...) on the nested documents.

First, here is the data SOLR returned with a standard query:

{
"id":"3911.3912.1278",
"parent_i":3191,
"asso_i":3112,
"personne_i":16278,
"etat_technique_s":"avec_documents",
"etat_marketing_s":"actif",
"type_parent_s":"Ecole élémentaire publique",
"type_asso_s":"APE (association de parents d'élèves)",
"groupe_type_parent_s":"ENSEIGNEMENT_PRIMAIRE",
"groupe_type_asso_s":"ASSOCIATION_DE_PARENTS",
"nombre_commandes_brut_i":8,
"nombre_commandes_i":3,
"nombre_kits_saveur_i":4,
"ca_periode_i":2977,
"ca_periode_fleur_i":0,
"ca_periode_saveur_i":2977,
"zone_scolaire_s":"C",
"territoire_s":"France Métropolitaine",
"region_s":"LANGUEDOC-ROUSSILLON MIDI-PYRENEES",
"departement_s":"30 GARD",
"postal_country_s":"FR",
"asso_country_s":"FRANCE",
"object_type_s":"contact",
"kits_sans_suite_ss":["Initiatives Saveurs"],
"date_derni_re_commande_dt":"2017-11-14T00:00:00Z",
"_version_":1605492468379287553,
"_childDocuments_":[
{
  "fixe_facturation":["0465221792"],
  "object_type":["order"],
  "TTC_i":1200,
  "mobile_livraison":["0672655536"],
  "kit_sans_suite":["true"],
  "fixe_livraison":["0466421792"],
  "type_cde_s":"KIT",
  "statut_s":"V",
  "mobile_facturation":["0675255536"],
  "campagne_s":"A",
  "date_dt":"2016-01-24T00:00:00Z",
  "id":"A04520",
  "enseigne_s":"SAV",
  "gamme":["KITS > Kits Saveurs"]},
{
  "fixe_facturation":["0466521792"],
  "object_type":["order"],
  "TTC_i":15,
  "mobile_livraison":["0672655536"],
  "kit_sans_suite":["false"],
  "fixe_livraison":["0464221792"],
  "type_cde_s":"DOCUMENTATION",
  "statut_s":"V",
  "mobile_facturation":["0672655536"],
  "campagne_s":"B",
  "date_dt":"2016-09-29T00:00:00Z",
  "id":"B15755",
  "enseigne_s":"INI",
  "gamme":["CATALOGUES > Catalogues Brioche",
"CATALOGUES > Catalogues Fleurs et Nature"]},
{
  "fixe_facturation":["0465221792"],
  "object_type":["order"],
  "TTC_i":156,
  "mobile_livraison":["0672655536"],
  "kit_sans_suite":["false"],
  "fixe_livraison":["0466221492"],
  "type_cde_s":"KIT",
  "statut_s":"V",
  "mobile_facturation":["0672245536"],
  "campagne_s":"B",
  "date_dt":"2016-09-29T00:00:00Z",
  "id":"B15769",
  "enseigne_s":"SAV",
  "gamme":["KITS > Kits Saveurs"]}

My goal is to get, with one SOLR query, the sum of TTC_i by parent document...
I tried with facet.pivot, stats, and group, with no results...

Thanks for your advice

Best
JB



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html