ttl on merge-time possible somehow ?

2016-12-15 Thread Dorian Hoxha
Hello searchers,

I did some searching for TTL on Solr, and found only a way to do it with a
delete-query. But that ~sucks, because on top of a lot of inserts you also
have to run the delete queries.

The other (kinda better) way to do it is to set a collection-level TTL, so
that when segments are merged, the documents that have expired are dropped
from the new merged segment. On the client, I will make sure to do date-range
queries so I don't get back expired documents.

So:
1. is there a way to easily modify the segment merger (or is there a better
way?) to do that?
2. is there a way to support this also on get? It looks like I can use
realtime get + a filter query, and it should work based on the documentation.

Thank You
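
For what it's worth: since Solr 4.8, DocExpirationUpdateProcessorFactory
(SOLR-5795) covers most of this out of the box. It computes a per-document
expiration date from a TTL and periodically runs the delete in the
background. For the client side of question 2, a minimal SolrJ sketch of the
date-range filtering described above; the collection name and the expire_at
field are hypothetical, not anything Solr defines:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TtlFilterExample {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrQuery q = new SolrQuery("*:*");
      // Hide documents whose (hypothetical) expire_at date is in the past;
      // expired documents still occupy the index until a merge or a
      // delete-by-query physically purges them.
      q.addFilterQuery("expire_at:[NOW TO *]");
      long live = solr.query(q).getResults().getNumFound();
      System.out.println("live docs: " + live);
    }
  }
}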


Re: Solr has a CPU% spike when indexing a batch of data

2016-12-15 Thread forest_soup
Thanks a lot, Shawn.

We'll consider your suggestion to tune our solr servers. Will let you know
the result. 

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-has-a-CPU-spike-when-indexing-a-batch-of-data-tp4309529p4310002.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Reth RM
The primary difference in later versions has been the move from standalone
Solr to SolrCloud, starting from Solr 4.0. What happens if you try starting
Solr in standalone mode? SolrCloud does not use 'core' anymore; it takes
'collection' as the param.


On Thu, Dec 15, 2016 at 11:05 PM, Manan Sheth 
wrote:

> Thanks Reth. As noted, this is the same map-reduce-based indexer tool that
> comes shipped with the Solr distribution by default.
>
> It only takes the zk_host details and extracts all required information
> from there. It does not have core-specific configuration. The same tool
> released with the Solr 4.10 distro works correctly; it seems to be some
> issue/change from Solr 5 onwards. I have tested both Solr 5.5 and Solr
> 6.2.1, and the behaviour is the same for both.
>
> Thanks,
> Manan Sheth
> 
> From: Reth RM 
> Sent: Friday, December 16, 2016 12:21 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr MapReduce Indexer Tool is failing for empty core name.
>
> It looks like the command-line tool that you are using to initiate the
> index process is expecting a name for the Solr core via the respective
> command-line param. Use -help on the command-line tool that you are using,
> check the solr-core-name parameter key, and pass that with a value as well.
>
>
> On Tue, Dec 13, 2016 at 5:44 AM, Manan Sheth 
> wrote:
>
> > Hi All,
> >
> >
> > While working on a migration project from Solr 4 to Solr 6, I need to
> > reindex my data using Solr map reduce Indexer tool in offline mode with
> > avro data.
> >
> > While executing the map reduce indexer tool shipped with Solr 6.2.1, it
> > throws the error "cannot create core with empty name value". The Solr
> > instances are running fine, with new indexes being added and modified
> > correctly. Below is the command that was fired:
> >
> >
> > hadoop --config /etc/hadoop/conf jar /home/impadmin/solr-6.2.1/dist/solr-map-reduce-*.jar \
> >    -D 'mapred.child.java.opts=-Xmx500m' \
> >    -libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'` \
> >    --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
> >    --zk-host 172.26.45.71:9984 \
> >    --output-dir hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/output5 \
> >    --collection app.quotes --log4j src/test/resources/log4j.properties --verbose \
> >    "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"
> >
> >
> > Below is the complete snapshot of error trace:
> >
> >
> > Failed to initialize record writer for org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, attempt_1479795440861_0343_r_00_0
> >         at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:128)
> >         at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
> >         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
> >         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
> >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> >         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:422)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> >         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> > Caused by: org.apache.solr.common.SolrException: Cannot create core with empty name value
> >         at org.apache.solr.core.CoreDescriptor.checkPropertyIsNotEmpty(CoreDescriptor.java:280)
> >         at org.apache.solr.core.CoreDescriptor.<init>(CoreDescriptor.java:191)
> >         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:754)
> >         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
> >         at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(SolrRecordWriter.java:163)
> >         at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:121)
> > ... 9 more
> >
> > Additional points to note:
> >
> >
> >   *   The solrconfig and schema files are copied as-is from Solr 4.
> >   *   Once the collection is deployed, users can perform all operations
> >       on the collection without any issue.
> >   *   The indexing process works fine with the same tool on Solr 4.
> >
> > Please help.
> >
> >
> > Thanks,
> >
> > Manan Sheth
> >

Re: Solr - Amazon like search

2016-12-15 Thread Reth RM
There's an e-commerce feature checklist covering what Solr can do, listed here:
https://lucidworks.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/

That should be a good start, and there are some more reference links listed
below. I would try all of those features, and if there's a more specific
feature that you are looking for and can't find how to incorporate in Solr,
ask about it as a new question.

link2

link3



On Wed, Dec 14, 2016 at 5:48 AM, Shawn Heisey  wrote:

> On 12/13/2016 10:55 PM, vasanth vijayaraj wrote:
> > We are building an e-commerce mobile app. I have implemented Solr search
> and autocomplete.
> > But we like the Amazon search and are trying to implement something like
> that. Attached a screenshot
> > of what has been implemented so far
> >
> > The search/suggest should sort the list of products based on popularity,
> > document hits and more.
> > How do we achieve this? Please help us out here.
>
> Your attachment didn't make it to the list.  They rarely do.  We can't
> see whatever it is you were trying to include.
>
> Sorting on things like popularity and hits requires putting that
> information into the index so that each document has fields that encode
> this information, allowing you to use Solr's standard sorting
> functionality with those fields.  You also need a process to update that
> information when there's a new hit.  It's possible, but you have to
> write this into your indexing system.
>
> Solr doesn't include special functionality for this.  It would be hard
> to generalize, and it can all be done without special functionality.
>
> Thanks,
> Shawn
>
>
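
A minimal SolrJ sketch of what Shawn describes -- a hypothetical stored
numeric "hits" field on a hypothetical "products" collection, incremented by
your click-tracking code and used for sorting (atomic updates carry the
usual stored/docValues field requirements):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PopularitySketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/products").build()) {
      // Record a hit: an atomic update that increments the counter in place.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "SKU-123");
      doc.addField("hits", Collections.singletonMap("inc", 1));
      solr.add(doc); // rely on autoCommit rather than committing per hit

      // Search: most popular first, ties broken by relevance score.
      SolrQuery q = new SolrQuery("laptop");
      q.setSort("hits", SolrQuery.ORDER.desc);
      q.addSort("score", SolrQuery.ORDER.desc);
      solr.query(q);
    }
  }
}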


Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Manan Sheth
Thanks Reth. As noted, this is the same map-reduce-based indexer tool that
comes shipped with the Solr distribution by default.

It only takes the zk_host details and extracts all required information from
there. It does not have core-specific configuration. The same tool released
with the Solr 4.10 distro works correctly; it seems to be some issue/change
from Solr 5 onwards. I have tested both Solr 5.5 and Solr 6.2.1, and the
behaviour is the same for both.

Thanks,
Manan Sheth

From: Reth RM 
Sent: Friday, December 16, 2016 12:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr MapReduce Indexer Tool is failing for empty core name.

It looks like the command-line tool that you are using to initiate the index
process is expecting a name for the Solr core via the respective command-line
param. Use -help on the command-line tool that you are using, check the
solr-core-name parameter key, and pass that with a value as well.


On Tue, Dec 13, 2016 at 5:44 AM, Manan Sheth 
wrote:

> Hi All,
>
>
> While working on a migration project from Solr 4 to Solr 6, I need to
> reindex my data using Solr map reduce Indexer tool in offline mode with
> avro data.
>
> While executing the map reduce indexer tool shipped with Solr 6.2.1, it
> throws the error "cannot create core with empty name value". The Solr
> instances are running fine, with new indexes being added and modified
> correctly. Below is the command that was fired:
>
>
> hadoop --config /etc/hadoop/conf jar /home/impadmin/solr-6.2.1/dist/solr-map-reduce-*.jar \
>    -D 'mapred.child.java.opts=-Xmx500m' \
>    -libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'` \
>    --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
>    --zk-host 172.26.45.71:9984 \
>    --output-dir hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/output5 \
>    --collection app.quotes --log4j src/test/resources/log4j.properties --verbose \
>    "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"
>
>
> Below is the complete snapshot of error trace:
>
>
> Failed to initialize record writer for org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, attempt_1479795440861_0343_r_00_0
>         at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:128)
>         at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: org.apache.solr.common.SolrException: Cannot create core with empty name value
>         at org.apache.solr.core.CoreDescriptor.checkPropertyIsNotEmpty(CoreDescriptor.java:280)
>         at org.apache.solr.core.CoreDescriptor.<init>(CoreDescriptor.java:191)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:754)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
>         at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(SolrRecordWriter.java:163)
>         at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:121)
> ... 9 more
>
> Additional points to note:
>
>
>   *   The solrconfig and schema files are copied as-is from Solr 4.
>   *   Once the collection is deployed, users can perform all operations on
>       the collection without any issue.
>   *   The indexing process works fine with the same tool on Solr 4.
>
> Please help.
>
>
> Thanks,
>
> Manan Sheth
>
> 
>
>
>
>
>
>
>









Re: (Newbie Help!) Seeking guidance in regards to Solr's suggestor and others

2016-12-15 Thread Reth RM
This issue is in the Solarium client PHP code, which is likely not traversing
far enough to pick up results from the collation tag of the Solr response.
See line 190:
https://github.com/solariumphp/solarium/blob/master/library/Solarium/QueryType/Suggester/Result/Result.php#L190
Verify whether this is the issue and open a pull request; the Solarium
contributors might fix it.





On Mon, Dec 12, 2016 at 5:23 PM, KV Medmeme  wrote:

> Hi Friends,
>
> I'm new to Solr, and I've been working on it for the past 2-3 months trying
> to really get my feet wet with it so that I can transition the current
> search engine at my current job to Solr. (Eww sphinx, haha.) Anyway, I was
> running around the net getting my suggester working, and I'm stuck and need
> some help. This is what I have so far (I will explain after I've posted
> links to the config files):
>
> here is a link to my managed-schema.xml
> http://pastebin.com/MiEWwESP
>
> solr config.xml
> http://pastebin.com/fq2yxbvp
>
> I am currently using Solr 6.2.1, my issue is..
>
> I am trying to build a suggester that builds search terms or phrases based
> off of the index that is in memory. I was playing around with the analyzers
> and the tokenizers, as well as reading some very old books that touch on
> solr 4, and I came up with this set of tokenizers and analyzer chain.
> Please correct it if it's wrong. My index contains medical abstracts
> published by doctors, and the terms that I would really need to search for
> are "brain cancer", "anti-inflammatory", "hiv-1" -- kinda see where I'm
> going with this? So I need to sorta preserve the whitespace and some sort
> of hyphen delimiter. After I discovered that (now here comes the fun part),
>
> I type in the url:
>
> http://localhost:8983/solr/AbstractSuggest/suggest/?spellcheck.build=true
>
> then after when its built I query,
>
> http://localhost:8983/solr/AbstractSuggest/suggest/?spellcheck.q=suggest_field:%22anti-infl%22
>
> Which is perfectly fine -- it works great. I can see the collations, so
> that in my dropdown search bar, clients searching these medical articles
> can see these terms. Now, in regards to PHP (the Solarium API to talk to
> Solr): since this is a website and I intend on making an AJAX call to PHP,
> I cannot see the collation list. Solarium fails on hyphenated terms, as
> well as failing to build the collations list. For example, if I type in
>
> "brain canc" (I want to search "brain cancer"),
>
> it auto-suggests "brain", then "cancer", but nothing is shown in
> collations. If I send this to the URL (a localhost URL, which will soon
> change when moved to the prod environment), I can see the collations.
> Screenshots are here:
>
> brain can (url) -> https://gyazo.com/30a9d11e4b9b73b0768a12d342223dc3
>
> bran canc(solarium) -> https://gyazo.com/507b02e50d0e39d7daa96655dff83c76
> php code ->https://gyazo.com/1d2b8c90013784d7cde5301769cd230c
>
> So here is where I am. The ideal goal is to have the PHP API produce the
> same results as the URL, so that when users type into a search bar they
> can see the collations.
>
> Can someone please help? I'm looking to the community as the savior to
> all my problems. I want to learn about Solr at the same time, so if future
> problems pop up I can solve them accordingly.
>
> Thanks!
> Happy Holidays
> Kevin.
>


Re: error diagnosis help.

2016-12-15 Thread Reth RM
Are you indexing XML files through Nutch? This exception looks purely like
the processing of an incorrectly formatted XML file.
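
"In epilog" means the parser had already seen the close of the root element:
a minimal, hypothetical way to reproduce the message is an update body like
<add><doc>...</doc></add>&junk. Any literal ampersand in the XML must be
escaped as &amp;, so look for a stray '&' appended after (or inside) the
document Nutch sends.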

On Mon, Dec 12, 2016 at 11:53 AM, KRIS MUSSHORN 
wrote:

> I've scoured my Nutch and Solr config files and I can't find any cause.
> Suggestions?
> Monday, December 12, 2016 2:37:13 PM   ERROR   null   RequestHandlerBase
> org.apache.solr.common.SolrException: Unexpected character '&' (code 38) in epilog; expected '<'
> org.apache.solr.common.SolrException: Unexpected character '&' (code 38) in epilog; expected '<'
>  at [row,col {unknown-source}]: [1,36]
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:180)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:95)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:70)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>         at org.eclipse.jetty.server.Server.handle(Server.java:499)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>         at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>         at java.lang.Thread.run(Thread.java:745)
>
>


Re: Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-15 Thread Reth RM
It looks like the command-line tool that you are using to initiate the index
process is expecting a name for the Solr core via the respective command-line
param. Use -help on the command-line tool that you are using, check the
solr-core-name parameter key, and pass that with a value as well.


On Tue, Dec 13, 2016 at 5:44 AM, Manan Sheth 
wrote:

> Hi All,
>
>
> While working on a migration project from Solr 4 to Solr 6, I need to
> reindex my data using Solr map reduce Indexer tool in offline mode with
> avro data.
>
> While executing the map reduce indexer tool shipped with Solr 6.2.1, it
> throws the error "cannot create core with empty name value". The Solr
> instances are running fine, with new indexes being added and modified
> correctly. Below is the command that was fired:
>
>
> hadoop --config /etc/hadoop/conf jar /home/impadmin/solr-6.2.1/dist/solr-map-reduce-*.jar \
>    -D 'mapred.child.java.opts=-Xmx500m' \
>    -libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'` \
>    --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
>    --zk-host 172.26.45.71:9984 \
>    --output-dir hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/output5 \
>    --collection app.quotes --log4j src/test/resources/log4j.properties --verbose \
>    "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"
>
>
> Below is the complete snapshot of error trace:
>
>
> Failed to initialize record writer for org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, attempt_1479795440861_0343_r_00_0
>         at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:128)
>         at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: org.apache.solr.common.SolrException: Cannot create core with empty name value
>         at org.apache.solr.core.CoreDescriptor.checkPropertyIsNotEmpty(CoreDescriptor.java:280)
>         at org.apache.solr.core.CoreDescriptor.<init>(CoreDescriptor.java:191)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:754)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
>         at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(SolrRecordWriter.java:163)
>         at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:121)
> ... 9 more
>
> Additional points to note:
>
>
>   *   The solrconfig and schema files are copied as-is from Solr 4.
>   *   Once the collection is deployed, users can perform all operations on
>       the collection without any issue.
>   *   The indexing process works fine with the same tool on Solr 4.
>
> Please help.
>
>
> Thanks,
>
> Manan Sheth
>
> 
>
>
>
>
>
>
>


Re: Solr on HDFS: increase in query time with increase in data

2016-12-15 Thread Reth RM
I think the shard index size is huge and the shards should be split.
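
For example, with the Collections API (collection/shard names here are
placeholders):

http://host:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1

SPLITSHARD divides the shard's hash range in two and builds two sub-shards;
once they are active, the parent shard goes inactive, and the new shards'
replicas can be spread over additional nodes.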

On Wed, Dec 14, 2016 at 10:58 AM, Chetas Joshi 
wrote:

> Hi everyone,
>
> I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have
> the following config.
> maxShardsPerNode: 1
> replicationFactor: 1
>
> I have been ingesting data into Solr for the last 3 months. With increase
> in data, I am observing increase in the query time. Currently the size of
> my indices is 70 GB per shard (i.e. per node).
>
> I am using the cursor approach (/export handler) with the SolrJ client to
> get results back from Solr. All the fields I am querying on, and all the
> fields that I get back from Solr, are indexed and have docValues enabled as
> well. What could be the reason behind the increase in query time?
>
> Has this got something to do with the OS disk cache that is used for
> loading the Solr indices? When a query is fired, will Solr wait for all
> (70 GB) of the disk cache to be available so that it can load the index
> file?
>
> Thanks!
>


Re: CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Xie, Sean
Thanks for pointing out java.lang.Character. I did find
org.apache.lucene.analysis.CharacterUtils, but I was not able to find the
needed methods in it.

Sean


On 12/15/16, 8:58 PM, "Shawn Heisey"  wrote:

On 12/15/2016 6:20 PM, Xie, Sean wrote:
> We have implemented some customized filter/tokenizer, that is using
> org.apache.lucene.analysis.util.CharacterUtils. After upgrading to
> Solr 6.3, the class is no longer available. Is there any reason the
> utility class is removed? 

This is not really a good question for this list.  It would be more at
home on the java-user mailing list for Lucene.

With a little bit of research, I was able to determine that this class
moved.  It is now:

org.apache.lucene.analysis.CharacterUtils

Some of the functionality that used to be in the old CharacterUtils is
available in java.lang.Character -- part of Java itself.

Thanks,
Shawn






Re: Stemming with SOLR

2016-12-15 Thread Alexandre Rafalovitch
If you need a full-fidelity solution taking care of multiple edge cases, it
could be worth looking at commercial offerings.

http://www.basistech.com/ has one, including a free-tier SaaS plan.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced
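
Related, and cheaper: for the handful of irregular forms driving this thread
(caught/catch, ran/run), a small override dictionary may already be enough,
e.g. <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt"/>
in the analyzer chain, where stemdict.txt holds tab-separated pairs such as
"caught<TAB>catch" -- see also Susheel's suggestion quoted below.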


On 15 December 2016 at 21:28, Lasitha Wattaladeniya  wrote:
> Hi all,
>
> Thanks for the replies,
>
> @eric, ahmet: since those stemmers are logical (rule-based) stemmers, they
> won't work on irregular words such as caught, ran and so on. So in our case
> they won't work.
>
> @susheel: Yes, I thought about it, but the problem we have is that the
> documents we index are somewhat large texts, so copyFielding these into
> duplicate fields will affect the index time (we have jobs to index data
> periodically) and the query time. I wonder why there isn't a proper
> solution to this.
>
> Regards,
> Lasitha
>
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>
> On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar 
> wrote:
>
>> We did an extensive comparison in the past of Snowball, KStem and
>> Hunspell, and there are cases where one of them works better but not the
>> others, or vice-versa. You may utilise all three of them by having 3
>> different fields (fieldTypes) and, during query, searching in all of them.
>>
>> For some of the cases where none of them works (e.g. wolves, wolf etc.),
>> use StemmerOverrideFilterFactory.
>>
>> HTH.
>>
>> Thanks,
>> Susheel
>>
>> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan 
>> wrote:
>>
>> > Hi,
>> >
>> > KStemFilter returns legitimate English words, please use it.
>> >
>> > Ahmet
>> >
>> >
>> >
>> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
>> > watt...@gmail.com> wrote:
>> > Hello devs,
>> >
>> > I'm trying to develop an indexing and querying flow that converts words
>> > to their original form (lemmatization). I was doing a bit of research
>> > lately, but the information on the internet is very limited. I tried
>> > using HunspellFactory, but it doesn't convert a word to its original
>> > form; instead it gives suggestions for some words (Hunspell works
>> > correctly for some English words, but for others it gives multiple
>> > suggestions or no suggestions; I used the en_us.dic provided by
>> > OpenOffice).
>> >
>> > I know this is a generic problem in searching, so is there anyone who
>> > can point me in the right direction or to some information? :)
>> >
>> > Best regards,
>> > Lasitha Wattaladeniya
>> > Software Engineer
>> >
>> > Mobile : +6593896893
>> > Blog : techreadme.blogspot.com
>> >
>>


Re: Stemming with SOLR

2016-12-15 Thread Lasitha Wattaladeniya
Hi all,

Thanks for the replies,

@eric, ahmet: since those stemmers are logical (rule-based) stemmers, they
won't work on irregular words such as caught, ran and so on. So in our case
they won't work.

@susheel: Yes, I thought about it, but the problem we have is that the
documents we index are somewhat large texts, so copyFielding these into
duplicate fields will affect the index time (we have jobs to index data
periodically) and the query time. I wonder why there isn't a proper solution
to this.

Regards,
Lasitha

Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com

On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar 
wrote:

> We did an extensive comparison in the past of Snowball, KStem and Hunspell,
> and there are cases where one of them works better but not the others, or
> vice-versa. You may utilise all three of them by having 3 different fields
> (fieldTypes) and, during query, searching in all of them.
>
> For some of the cases where none of them works (e.g. wolves, wolf etc.),
> use StemmerOverrideFilterFactory.
>
> HTH.
>
> Thanks,
> Susheel
>
> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > KStemFilter returns legitimate English words, please use it.
> >
> > Ahmet
> >
> >
> >
> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
> > watt...@gmail.com> wrote:
> > Hello devs,
> >
> > I'm trying to develop an indexing and querying flow that converts words
> > to their original form (lemmatization). I was doing a bit of research
> > lately, but the information on the internet is very limited. I tried
> > using HunspellFactory, but it doesn't convert a word to its original
> > form; instead it gives suggestions for some words (Hunspell works
> > correctly for some English words, but for others it gives multiple
> > suggestions or no suggestions; I used the en_us.dic provided by
> > OpenOffice).
> >
> > I know this is a generic problem in searching, so is there anyone who
> > can point me in the right direction or to some information? :)
> >
> > Best regards,
> > Lasitha Wattaladeniya
> > Software Engineer
> >
> > Mobile : +6593896893
> > Blog : techreadme.blogspot.com
> >
>


Re: CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Shawn Heisey
On 12/15/2016 6:20 PM, Xie, Sean wrote:
> We have implemented some customized filter/tokenizer, that is using
> org.apache.lucene.analysis.util.CharacterUtils. After upgrading to
> Solr 6.3, the class is no longer available. Is there any reason the
> utility class is removed? 

This is not really a good question for this list.  It would be more at
home on the java-user mailing list for Lucene.

With a little bit of research, I was able to determine that this class
moved.  It is now:

org.apache.lucene.analysis.CharacterUtils

Some of the functionality that used to be in the old CharacterUtils is
available in java.lang.Character -- part of Java itself.

Thanks,
Shawn
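
For the common case -- the old CharacterUtils.toLowerCase(char[], offset,
limit) -- a sketch of an equivalent built only on java.lang.Character (the
class name here is ours, not a Lucene API):

public final class CharUtilsShim {
  // Walk the buffer code point by code point so that supplementary
  // characters (surrogate pairs) are lowercased correctly.
  public static void toLowerCase(char[] buffer, int offset, int limit) {
    for (int i = offset; i < limit; ) {
      int lower = Character.toLowerCase(Character.codePointAt(buffer, i, limit));
      i += Character.toChars(lower, buffer, i);
    }
  }
}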



CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Xie, Sean
Dear user group,

We have implemented some customized filters/tokenizers that use
org.apache.lucene.analysis.util.CharacterUtils. After upgrading to Solr 6.3,
the class is no longer available. Is there any reason the utility class was
removed?

What I had to do is copy the class implementation into our class lib as a
workaround. Is there any other way to deal with it?

Thanks
Sean



Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Erick Erickson
bq: shouldn't the two replicas have the same number of deletions

Not necessarily. We're back to the fact that commits on the replicas in
a single shard fire at different wall clock times. Plus, when segments
are merged, the deleted docs are purged. So it's quite common that
two replicas in the same shard do _not_ have the same deleted doc
count and will also have different maxDoc counts.

The fact that they aren't showing the same numDocs is the only part
of this that "shouldn't be happening"...

Best,
Erick
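
One way to pin this down: query each core directly with distrib=false and
compare, e.g.

http://ae1b-ecom-msc02:8983/solr/sial-catalog-material_shard2_replica1/select?q=*:*&rows=0&distrib=false

Run the same against replica2; if numFound still differs once commits have
fired on both and no indexing is in flight, the replicas really have
diverged.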

On Thu, Dec 15, 2016 at 11:41 AM, Webster Homer  wrote:
> Something I hadn't known until now: the source CDCR collection has 2 shards
> with 1 replica; our target cloud has 2 shards with 2 replicas.
> Both source and target have indexes that are not current.
>
> Also we have set all of our collections to ignore external commits
>
> On Thu, Dec 15, 2016 at 1:31 PM, Webster Homer 
> wrote:
>
>> Looking through our replicas I noticed that in one of our shards (each
>> shard has 2 replicas)
>> 1 replica shows:
>> "replicas": [
>>
>> {
>> "name": "core_node1",
>> "core": "sial-catalog-material_shard2_replica2",
>> "baseUrl": "http://ae1b-ecom-msc04:8983/solr;,
>> "nodeName": "ae1b-ecom-msc04:8983_solr",
>> "state": "active",
>> "leader": false,
>> "index":
>> {
>> "numDocs": 487123,
>> "maxDocs": 711973,
>> *"deletedDocs": 224850,*
>> "size": "331.96 MB",
>> "lastModified": "2016-12-08T11:10:05.969Z",
>> "current": false,
>> "version": 17933,
>> "segmentCount": 17
>> }
>> }
>> ,
>> while the second replica shows this:
>>
>> {
>> "name": "core_node3",
>> "core": "sial-catalog-material_shard2_replica1",
>> "baseUrl": "http://ae1b-ecom-msc02:8983/solr;,
>> "nodeName": "ae1b-ecom-msc02:8983_solr",
>> "state": "active",
>> "leader": true,
>> "index":
>> {
>> "numDocs": 487063,
>> "maxDocs": 487064,
>> "deletedDocs": 1,
>> "size": "224.83 MB",
>> "lastModified": "2016-12-08T11:10:02.625Z",
>> "current": false,
>> "version": 8208,
>> "segmentCount": 19
>> }
>> }
>> ],
>> I wrote a routine that uses the Collections API Info call and then for
>> each replica calls the Core API to get the information on the index
>>
>> shouldn't the two replicas have the same number of deletions?
>>
>> On Thu, Dec 15, 2016 at 12:36 PM, Webster Homer 
>> wrote:
>>
>>> I am trying to find the reported inconsistencies now.
>>>
>>> The timestamp I have was created by our ETL process, which may not be in
>>> exactly the same order as the indexing occurred
>>>
>> When I tried to sort the results by _docid_ desc, Solr threw a 500 error:
>>
>> { "responseHeader":{ "zkConnected":true, "status":500, "QTime":7,
>>   "params":{ "q":"*:*", "indent":"on",
>>     "fl":"record_spec,s_id,pid,search_concat_pno, search_pno, search_user_term, search_lform, search_eform, search_acronym, search_synonyms, root_name, search_s_pri_name, search_p_pri_name, search_keywords, lookahead_search_terms, sortkey, search_rtecs, search_chem_comp, cas_number, search_component_cas, search_beilstein, search_color_idx, search_ecnumber, search_femanumber, search_isbn, search_mdl_number, search_descriptions, page_title, search_xref_comparable_pno, search_xref_comparable_sku, search_xref_equivalent_pno, search_xref_exact_pno create_date search_xref_exact_sku, score",
>>     "sort":"_docid_ desc", "rows":"20", "wt":"json", "_":"1481821047026"}},
>>   "error":{ "msg":"Index: 1, Size: 0",
>>     "trace":"java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
>>         at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>         at java.util.ArrayList.get(ArrayList.java:429)
>>         at org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardFieldSortedHitQueue.java:146)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:167)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:159)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:91)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:33)
>>         at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)
>>         at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1098)
>>         at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)
>>         at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)
>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
>>         at org.apache.solr.core.SolrCore.
>>> 

Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Erick Erickson
Right, so if I'm doing the math right you have 2,400 replicas per JVM?
I'm not clear whether each node has a single JVM or not.

Anyway. 2048 is indeed much too high. If nothing else, dropping it to,
say, 64 would show whether this was the real root of your problem or not.
Even if it slowed startup unacceptably, it would show you that this was,
indeed, the problem.
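
(If memory serves, that is the solr.xml knob: <int name="coreLoadThreads">64</int>
under the top-level <solr> element.)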

Is this a multi-tenant situation? I'm trying to understand why you need
so many cores. Having 1,200 collections each with 12 shards seems like
massive over-sharding. How many docs exist in each core? I'm
wondering if you've backed yourself into a corner by unnecessary sharding.
If you could, say, reduce your shards per collection to 2 (or even one?) you
might get out of this bind cheaply.

I regularly see 50M docs on a single shard give very good performance
FWIW.

Best,
Erick

On Thu, Dec 15, 2016 at 11:55 AM, Yago Riveiro  wrote:
> Yes, I changed the value of coreLoadThreads.
>
> With the default value a node takes like 40 minutes to be available with all 
> replicas up.
>
> Right now I have ~1.2K collections with 12 shards each and 2 replicas,
> spread across 12 nodes. Indeed, the value I configured is maybe too high
> (2048), but I can start nodes in 10 minutes.
>
> I need to revise the value to something more conservative, maybe.
>
> --
>
> /Yago Riveiro
>
> On 15 Dec 2016, 16:43 +, Erick Erickson , wrote:
>> Hmmm, have you changed coreLoadThreads? We had a problem with this a
>> while back with loading lots and lots of cores, see:
>> https://issues.apache.org/jira/browse/SOLR-7280
>>
>> But that was fixed in 6.2, so unless you changed the number of threads
>> used to load cores it shouldn't be a problem on 6.3...
>>
>> The symptom was also that replicas would never change to "active";
>> they'd be stuck in recovery or down.
>>
>> Best,
>> Erick
>>
>> On Thu, Dec 15, 2016 at 3:07 AM, Yago Riveiro  wrote:
>> > Hi,
>> >
>> > I'm getting this error in my log
>> >
>> > 12/15/2016, 9:28:18 AM ERROR true ExecutorUtil Uncaught exception
>> > java.lang.StackOverflowError thrown by thread:
>> > coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr
>> > x:collection1_shard3_replica2 s:shard3 c:collection1-visitors r:core_node5
>> > java.lang.Exception: Submitter stack trace
>> > at
>> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:204)
>> > at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:204)
>> > at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:505)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> > at
>> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>> > at
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> > at
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> > at java.lang.Thread.run(Thread.java:745)
>> >
>> >
>> >
>> > -
>> > Best regards
>> > --
>> > View this message in context: 
>> > http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.


Re: Checking Optimal Values for BM25

2016-12-15 Thread Sascha Szott

Hi Furkan,

in order to change the BM25 parameter values k1 and b, the following XML
snippet needs to be added to your schema.xml configuration file:

<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.3</float>
  <float name="b">0.7</float>
</similarity>

It is even possible to specify the SimilarityFactory on individual index 
fields. See [1] for more details.
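
(One caveat worth checking: per-field <similarity> elements only take effect
when the schema's global similarity is solr.SchemaSimilarityFactory.)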


Best
Sascha

[1] https://wiki.apache.org/solr/SchemaXml#Similarity


Am 15.12.2016 um 14:58 schrieb Furkan KAMACI:

Hi,

Solr's default similarity is BM25 now. Its parameters are defined as

k1=1.2, b=0.75

by default. However, is there a way to check the effect of using different
coefficients in the BM25 calculation, in order to find the optimal values?

Kind Regards,
Furkan KAMACI



Re: field length within BM25 score calculation in Solr 6.3

2016-12-15 Thread Sascha Szott

Hi,

bumping my question after 10 days. Any clarification is appreciated.

Best
Sascha



Hi folks,

my Solr index consists of one document with a single-valued field "title" of
type "text_general". The title field was indexed with the content: 1 2 3 4 5
6 7 8 9. The field type text_general uses a StandardTokenizer, which should
result in 9 tokens. The corresponding length of the field title in the given
document is 9.

The field type is defined as follows:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


I’ve checked that none of the nine tokens (1, 2, …, 9) is a stop word.

As expected, the query title:1 returns the given document. The BM25 score of 
the document for the given query is 0.272.

But why does Solr 6.3 state that the length of the field title is 10.24?

0.27233246 = weight(title_alt:1 in 0) [SchemaSimilarity], result of:
   0.27233246 = score(doc=0,freq=1.0 = termFreq=1.0), product of:
 0.2876821 = idf(docFreq=1, docCount=1)
 0.94664377 = tfNorm, computed from:
   1.0 = termFreq=1.0
   1.2 = parameter k1
   0.75 = parameter b
   9.0 = avgFieldLength
   10.24 = fieldLength

In contrast, the value of avgFieldLength is correct.

The same observation can be made if the index consists of two simple documents:

doc1: title = 1 2 3 4
doc2: title = 1 2 3 4 5 6 7 8

The BM25 score calculation of doc2 is explained as:

0.14143422 = weight(title_alt:1 in 1) [SchemaSimilarity], result of:
   0.14143422 = score(doc=1,freq=1.0 = termFreq=1.0), product of:
 0.18232156 = idf(docFreq=2, docCount=2)
 0.7757405 = tfNorm, computed from:
   1.0 = termFreq=1.0
   1.2 = parameter k1
   0.75 = parameter b
   6.0 = avgFieldLength
   10.24 = fieldLength

The value of fieldLength does not match 8.

Is there some "magic" applied to the value of the field length that goes
beyond the standard BM25 scoring formula?

If so, what is the idea behind this modification? If not, is this a Lucene /
Solr bug?
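
For what it's worth, the 10.24 is almost certainly not BM25 itself but
Lucene's lossy norm encoding: at index time 1/sqrt(fieldLength) is squeezed
into a single byte per document, so the decoded length is quantized. A
sketch against lucene-core 6.x that reproduces the nine-token case:

import org.apache.lucene.util.SmallFloat;

public class NormLength {
  public static void main(String[] args) {
    for (int len : new int[] {4, 9}) {
      // Index time: BM25 stores 1/sqrt(fieldLength) as a single byte.
      byte norm = SmallFloat.floatToByte315((float) (1.0 / Math.sqrt(len)));
      // Search time: the byte is decoded and inverted back into a length.
      float f = SmallFloat.byte315ToFloat(norm);
      System.out.printf("%d tokens -> fieldLength %.2f%n", len, 1f / (f * f));
      // prints: 4 tokens -> fieldLength 4.00 / 9 tokens -> fieldLength 10.24
    }
  }
}

avgFieldLength, by contrast, is computed from exact index statistics
(sumTotalTermFreq / docCount), which is why it is not distorted the same way.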

Best regards,
Sascha







--
Sascha Szott :: KOBV/ZIB :: +49 30 84185-457


Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Yago Riveiro
Yes, I changed the value of coreLoadThreads.

With the default value a node takes like 40 minutes to be available with all 
replicas up.

Right now I have ~1.2K collections with 12 shards each and 2 replicas, spread
across 12 nodes. Indeed, the value I configured is maybe too high (2048), but
I can start nodes in 10 minutes.

I need to revise the value to something more conservative, maybe.

--

/Yago Riveiro

On 15 Dec 2016, 16:43 +, Erick Erickson , wrote:
> Hmmm, have you changed coreLoadThreads? We had a problem with this a
> while back with loading lots and lots of cores, see:
> https://issues.apache.org/jira/browse/SOLR-7280
>
> But that was fixed in 6.2, so unless you changed the number of threads
> used to load cores it shouldn't be a problem on 6.3...
>
> The symptom was also that replicas would never change to "active";
> they'd be stuck in recovery or down.
>
> Best,
> Erick
>
> On Thu, Dec 15, 2016 at 3:07 AM, Yago Riveiro  wrote:
> > Hi,
> >
> > I'm getting this error in my log
> >
> > 12/15/2016, 9:28:18 AM ERROR true ExecutorUtil Uncaught exception
> > java.lang.StackOverflowError thrown by thread:
> > coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr
> > x:collection1_shard3_replica2 s:shard3 c:collection1-visitors r:core_node5
> > java.lang.Exception: Submitter stack trace
> > at
> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:204)
> > at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:204)
> > at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:505)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at
> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> > -
> > Best regards
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849.html
> > Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
Something I hadn't known until now: the source CDCR collection has 2 shards
with 1 replica; our target cloud has 2 shards with 2 replicas.
Both source and target have indexes that are not current.

Also we have set all of our collections to ignore external commits

On Thu, Dec 15, 2016 at 1:31 PM, Webster Homer 
wrote:

> Looking through our replicas I noticed that in one of our shards (each
> shard has 2 replicas)
> 1 replica shows:
> "replicas": [
>
> {
> "name": "core_node1",
> "core": "sial-catalog-material_shard2_replica2",
> "baseUrl": "http://ae1b-ecom-msc04:8983/solr;,
> "nodeName": "ae1b-ecom-msc04:8983_solr",
> "state": "active",
> "leader": false,
> "index":
> {
> "numDocs": 487123,
> "maxDocs": 711973,
> *"deletedDocs": 224850,*
> "size": "331.96 MB",
> "lastModified": "2016-12-08T11:10:05.969Z",
> "current": false,
> "version": 17933,
> "segmentCount": 17
> }
> }
> ,
> while the second replica shows this:
>
> {
> "name": "core_node3",
> "core": "sial-catalog-material_shard2_replica1",
> "baseUrl": "http://ae1b-ecom-msc02:8983/solr;,
> "nodeName": "ae1b-ecom-msc02:8983_solr",
> "state": "active",
> "leader": true,
> "index":
> {
> "numDocs": 487063,
> "maxDocs": 487064,
> "deletedDocs": 1,
> "size": "224.83 MB",
> "lastModified": "2016-12-08T11:10:02.625Z",
> "current": false,
> "version": 8208,
> "segmentCount": 19
> }
> }
> ],
> I wrote a routine that uses the Collections API Info call and then for
> each replica calls the Core API to get the information on the index
>
> shouldn't the two replicas have the same number of deletions?
>
> On Thu, Dec 15, 2016 at 12:36 PM, Webster Homer 
> wrote:
>
>> I am trying to find the reported inconsistencies now.
>>
>> The timestamp I have was created by our ETL process, which may not be in
>> exactly the same order as the indexing occurred
>>
>> When I tried to sort the results by _docid_ desc, Solr threw a 500 error:
>>
>> { "responseHeader":{ "zkConnected":true, "status":500, "QTime":7,
>>   "params":{ "q":"*:*", "indent":"on",
>>     "fl":"record_spec,s_id,pid,search_concat_pno, search_pno, search_user_term, search_lform, search_eform, search_acronym, search_synonyms, root_name, search_s_pri_name, search_p_pri_name, search_keywords, lookahead_search_terms, sortkey, search_rtecs, search_chem_comp, cas_number, search_component_cas, search_beilstein, search_color_idx, search_ecnumber, search_femanumber, search_isbn, search_mdl_number, search_descriptions, page_title, search_xref_comparable_pno, search_xref_comparable_sku, search_xref_equivalent_pno, search_xref_exact_pno create_date search_xref_exact_sku, score",
>>     "sort":"_docid_ desc", "rows":"20", "wt":"json", "_":"1481821047026"}},
>>   "error":{ "msg":"Index: 1, Size: 0",
>>     "trace":"java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
>>         at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>         at java.util.ArrayList.get(ArrayList.java:429)
>>         at org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardFieldSortedHitQueue.java:146)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:167)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:159)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:91)
>>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:33)
>>         at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)
>>         at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1098)
>>         at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)
>>         at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)
>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
>>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
>>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
>>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>> 

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
Looking through our replicas I noticed that in one of our shards (each
shard has 2 replicas)
1 replica shows:
"replicas": [

{
"name": "core_node1",
"core": "sial-catalog-material_shard2_replica2",
"baseUrl": "http://ae1b-ecom-msc04:8983/solr;,
"nodeName": "ae1b-ecom-msc04:8983_solr",
"state": "active",
"leader": false,
"index":
{
"numDocs": 487123,
"maxDocs": 711973,
*"deletedDocs": 224850,*
"size": "331.96 MB",
"lastModified": "2016-12-08T11:10:05.969Z",
"current": false,
"version": 17933,
"segmentCount": 17
}
}
,
while the second replica shows this:

{
"name": "core_node3",
"core": "sial-catalog-material_shard2_replica1",
"baseUrl": "http://ae1b-ecom-msc02:8983/solr;,
"nodeName": "ae1b-ecom-msc02:8983_solr",
"state": "active",
"leader": true,
"index":
{
"numDocs": 487063,
"maxDocs": 487064,
"deletedDocs": 1,
"size": "224.83 MB",
"lastModified": "2016-12-08T11:10:02.625Z",
"current": false,
"version": 8208,
"segmentCount": 19
}
}
],
I wrote a routine that uses the Collections API Info call and then for each
replica calls the Core API to get the information on the index

shouldn't the two replicas have the same number of deletions?

On Thu, Dec 15, 2016 at 12:36 PM, Webster Homer 
wrote:

> I am trying to find the reported inconsistencies now.
>
> The timestamp I have was created by our ETL process, which may not be in
> exactly the same order as the indexing occurred
>
> When I tried to sort the results by _docid_ desc, Solr threw a 500 error:
>
> { "responseHeader":{ "zkConnected":true, "status":500, "QTime":7,
>   "params":{ "q":"*:*", "indent":"on",
>     "fl":"record_spec,s_id,pid,search_concat_pno, search_pno, search_user_term, search_lform, search_eform, search_acronym, search_synonyms, root_name, search_s_pri_name, search_p_pri_name, search_keywords, lookahead_search_terms, sortkey, search_rtecs, search_chem_comp, cas_number, search_component_cas, search_beilstein, search_color_idx, search_ecnumber, search_femanumber, search_isbn, search_mdl_number, search_descriptions, page_title, search_xref_comparable_pno, search_xref_comparable_sku, search_xref_equivalent_pno, search_xref_exact_pno create_date search_xref_exact_sku, score",
>     "sort":"_docid_ desc", "rows":"20", "wt":"json", "_":"1481821047026"}},
>   "error":{ "msg":"Index: 1, Size: 0",
>     "trace":"java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
>         at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>         at java.util.ArrayList.get(ArrayList.java:429)
>         at org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardFieldSortedHitQueue.java:146)
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:167)
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:159)
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:91)
>         at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:33)
>         at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)
>         at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1098)
>         at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)
>         at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)

RE: DocTransformer not always working

2016-12-15 Thread Chris Hostetter

: Well, i can work with this really fine knowing this, but does it make 
: sense? I did assume (or was wrong in doing so) that fl=minhash:[binstr] 
: should mean get that field and pass it through the transformer. At least 
: i just now fell for it; maybe others shouldn't :)

that's what it *can* mean, but it's not -- fundamentally -- what it means.

foo:[bar x=y ...] means run the "bar" transformer and request that it 
uses the name "foo" as an output key in the resulting documents.

when "bar" is executing it knows what name it was asked to use, so it can 
use that information for other purposes (like in your case: you can use 
that as a stored field name to do some processing on) but there's no 
reason "foo" has to be a real field name.

many transformers don't treat the "name" as special in any way, and in general a 
transformer should behave sanely if there is no name specified (ie: 
"fl=[bar]" should be totally valid)

the key reason why it's not really a good idea to *force* the "name" used 
in the response to match a "real" stored field is because it prevents you 
from using multiple transformers on the same field, or from returning the 
same field unmodified.

Another/better way to have designed your transformer would have 
been to specify the field to apply the binstr logic to as a 
local param, ie...

  fl=minhash,b2_minhash:[binstr f=minhash base=2],b16_minhash:[binstr f=minhash 
base=16]


...see what i mean?
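
A rough sketch of that design (the class name and local-param names are made 
up here, and the exact transform(...) signature assumed below is the Solr 6.x 
one -- check the DocTransformer base class in your version):

import java.io.IOException;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class BinStrTransformerFactory extends TransformerFactory {
  @Override
  public DocTransformer create(final String display, SolrParams params, SolrQueryRequest req) {
    // f=... names the stored field to read; the output key ("display") stays independent
    final String field = params.get("f", display);
    return new DocTransformer() {
      @Override
      public String getName() { return display; }

      // Ask the response writer to fetch the stored field even when the
      // user's fl does not request it explicitly.
      @Override
      public String[] getExtraRequestFields() { return new String[] { field }; }

      @Override
      public void transform(SolrDocument doc, int docid, float score) throws IOException {
        Object v = doc.getFieldValue(field);
        if (v instanceof Number) {
          doc.setField(display, Long.toBinaryString(((Number) v).longValue()));
        }
      }
    };
  }
}

With that in place, fl=minhash,b2_minhash:[binstr f=minhash] can return the 
field both raw and transformed, which the name==field approach can't do.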




: 
: Anyway, thanks again today,
: Markus
: 
: -Original message-
: > From:Chris Hostetter 
: > Sent: Wednesday 14th December 2016 23:14
: > To: solr-user 
: > Subject: Re: DocTransformer not always working
: > 
: > 
: > Fairly certain you aren't overridding getExtraRequestFields, so when your 
: > DocTransformer is evaluated it can'd find the field you want it to 
: > transform.
: > 
: > By default, the ResponseWriters don't provide any fields that aren't 
: > explicitly requested by the user, or specified as "extra" by the 
: > DocTransformer.
: > 
: > IIUC you want the stored value of the "minhash" field to be available to 
: > you, but the response writer code doesn't know that -- it just knows you 
: > want "minhash" to be the output respons key for the "[binstr]" 
: > transformer.
: > 
: > 
: > Take a look at RawValueTransformerFactory as an example to borrow from.
: > 
: > 
: > 
: > 
: > : Date: Wed, 14 Dec 2016 21:55:26 +
: > : From: Markus Jelsma 
: > : Reply-To: solr-user@lucene.apache.org
: > : To: solr-user 
: > : Subject: DocTransformer not always working
: > : 
: > : Hello - I just spotted an oddity with all two custom DocTransformers we 
sometimes use on Solr 6.3.0. This particular transformer in the example just 
transforms a long (or int) into a sequence of bits. I just use it as an 
convenience to compare minhashes with my eyeballs. First example is very 
straightforward, fl=minhash:[binstr], show only the minhash field, but as a bit 
sequence.
: > : 
: > : 
solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id%20asc&q=*:*&fl=minhash:[binstr]
: > : {
: > :   "response":{"numFound":96933,"start":0,"docs":[
: > :   {}]
: > :   }}
: > : 
: > : The document is empty! This also happens with another transformer. The 
next example i also request the lang field:
: > : 
: > : solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id 
asc&q=*:*&fl=lang,minhash:[binstr]
: > : {
: > :   "response":{"numFound":96933,"start":0,"docs":[
: > :   {
: > : "lang":"nl"}]
: > :   }}
: > : 
: > : Ok, at least i now get the lang field, but the transformed minhash is 
nowhere to be seen. In the next example i request all fields and the 
transformed minhash:
: > : 
: > : 
/solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id%20asc&q=*:*&fl=*,minhash:[binstr]
: > : {
: > :   "response":{"numFound":96933,"start":0,"docs":[
: > :   {
: > : 
"minhash":"11101101001010001101001010111101100100110010",
: > : ...other fields here
: > : "_version_":1553728923368423424}]
: > :   }}
: > : 
: > : So it seems that right now, i can only use a transformer properly if i 
request all fields. I believe it used to work with all three examples just as 
you would expect. But since i haven't used transformers for a while, i don't 
know at which version it stopped working like that (if it ever did of course :)
: > : 
: > : Did i mess something up or did a bug creep on me?
: > : 
: > : Thanks,
: > : Markus
: > : 
: > 
: > -Hoss
: > http://www.lucidworks.com/
: > 
: 

-Hoss
http://www.lucidworks.com/


Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
I am trying to find the reported inconsistencies now.

The timestamp I have was created by our ETL process, which may not be in
exactly the same order as the indexing occurred

When I tried to sort the results by _docid_ desc, Solr threw a 500 error:
{ "responseHeader":{ "zkConnected":true, "status":500, "QTime":7, "params":{
"q":"*:*", "indent":"on", "fl":"record_spec,s_id,pid,search_concat_pno,
search_pno, search_user_term, search_lform, search_eform, search_acronym,
search_synonyms, root_name, search_s_pri_name, search_p_pri_name,
search_keywords, lookahead_search_terms, sortkey, search_rtecs,
search_chem_comp, cas_number, search_component_cas, search_beilstein,
search_color_idx, search_ecnumber, search_femanumber, search_isbn,
search_mdl_number, search_descriptions, page_title,
search_xref_comparable_pno, search_xref_comparable_sku,
search_xref_equivalent_pno, search_xref_exact_pno create_date
search_xref_exact_sku, score", "sort":"_docid_ desc", "rows":"20", "wt":
"json", "_":"1481821047026"}}, "error":{ "msg":"Index: 1, Size: 0",
"trace":"java.lang.IndexOutOfBoundsException:
Index: 1, Size: 0\n\tat
java.util.ArrayList.rangeCheck(ArrayList.java:653)\n\tat
java.util.ArrayList.get(ArrayList.java:429)\n\tat
org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardFieldSortedHitQueue.java:146)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:167)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:159)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:91)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:33)\n\tat
org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1098)\n\tat
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)\n\tat
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
java.lang.Thread.run(Thread.java:745)\n", "code":500}}

On Wed, Dec 14, 2016 at 7:41 PM, Erick Erickson 
wrote:

> Let's back up a bit. You say "This seems to cause 

Re: Exception while creating a HttpSolrClient

2016-12-15 Thread Shawn Heisey
On 12/15/2016 10:32 AM, tesm...@gmail.com wrote:
> I am getting the following exception while creating a Solr client. Any help
> is appreciated
>
> =This is code snipper to create a SolrClient===
>
> public void populate (String args) throws IOException, SolrServerException
>  {
> String urlString =  "http://localhost:8983/solr";
>SolrClient  server = new HttpSolrClient.Builder(urlString).build();
> ..
> ..
> ===

> Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current
> frame, stack[0]) is not assignable to
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)

What's likely happening here is that you've got a problem with the
httpcomponents jars (httpclient, httpcore, httpmime) -- these are
dependencies of SolrJ.  Either you've got an incorrect version of one or
more of these jars, or you've got more than one version of them on your
classpath.

In general, a 4.5.x version of httpclient should work with most recent
versions of SolrJ.  Running with httpcomponents older than 4.3.0 could
be problematic.
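
If it helps to narrow it down, the JVM can tell you which jar each class was 
actually loaded from -- this is plain JDK API, nothing Solr-specific:

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.CloseableHttpClient;

public class ClasspathCheck {
  public static void main(String[] args) {
    // Prints the jar each class came from; two different jars here (or an
    // old httpclient version in the path) would explain the VerifyError.
    System.out.println(HttpClient.class.getProtectionDomain()
        .getCodeSource().getLocation());
    System.out.println(CloseableHttpClient.class.getProtectionDomain()
        .getCodeSource().getLocation());
  }
}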

Thanks,
Shawn



Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread Shawn Heisey
On 12/14/2016 7:36 AM, GW wrote:
> I understand accessing solr directly. I'm doing REST calls to a single
> machine.
>
> If I have a cluster of five servers and say three Apache servers, I can
> round robin the REST calls to all five in the cluster?
>
> I guess I'm going to find out. :-)  If so I might be better off just
> running Apache on all my solr instances.

If you're running SolrCloud (which uses zookeeper) then sending multiple
query requests to any node will load balance the requests across all
replicas for the collection.  This is an inherent feature of SolrCloud. 
Indexing requests will be forwarded to the correct place.

The node you're sending to is a potential single point of failure, which
you can eliminate by putting a load balancer in front of Solr that
connects to at least two of the nodes.  As I just mentioned, SolrCloud
will do further load balancing to all nodes which are capable of serving
the requests.

I use haproxy for a load balancer in front of Solr.  I'm not running in
Cloud mode, but a load balancer would also work for Cloud, and is
required for high availability when your client only connects to one
server and isn't cloud aware.

http://www.haproxy.org/

Solr includes a cloud-aware Java client that talks to zookeeper and
always knows the state of the cloud.  This eliminates the requirement
for a load balancer, but using that client would require that you write
your website in Java.
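
For reference, a minimal sketch of using that client (this assumes the SolrJ 
6.x builder API; the zkHost string and collection name are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQuery {
  public static void main(String[] args) throws Exception {
    // The client watches cluster state in ZooKeeper and routes each request
    // to a live replica itself -- no external load balancer needed.
    try (CloudSolrClient client = new CloudSolrClient.Builder()
        .withZkHost("zk1:2181,zk2:2181,zk3:2181")
        .build()) {
      client.setDefaultCollection("mycollection");
      QueryResponse rsp = client.query(new SolrQuery("*:*"));
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}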

The PHP clients are third-party software, and as far as I know, are not
cloud-aware.

https://wiki.apache.org/solr/IntegratingSolr#PHP

Some advantages of using a Solr client over creating HTTP requests
yourself:  The code is easier to write, and to read.  You generally do
not need to worry about making sure that your requests are properly
escaped for URLs, XML, JSON, etc.  The response to the requests is
usually translated into data structures appropriate to the language --
your program probably doesn't need to know how to parse XML or JSON.

Thanks,
Shawn



Exception while creating a HttpSolrClient

2016-12-15 Thread tesm...@gmail.com
Hi,

I am getting the following exception while creating a Solr client. Any help
is appreciated

=This is code snipper to create a SolrClient===

public void populate (String args) throws IOException, SolrServerException
 {
  String urlString =  "http://localhost:8983/solr";
   SolrClient  server = new HttpSolrClient.Builder(urlString).build();
..
..
===




Exception in thread "main" java.lang.VerifyError: Bad return type
Exception Details:
  Location:

org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient;
@57: areturn
  Reason:
Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current
frame, stack[0]) is not assignable to
'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
  Current Frame:
bci: @57
flags: { }
locals: { 'org/apache/solr/common/params/SolrParams',
'org/apache/solr/common/params/ModifiableSolrParams',
'org/apache/http/impl/client/SystemDefaultHttpClient' }
stack: { 'org/apache/http/impl/client/SystemDefaultHttpClient' }
  Bytecode:
0x000: bb00 0359 2ab7 0004 4cb2 0005 b900 0601
0x010: 0099 001e b200 05bb 0007 59b7 0008 1209
0x020: b600 0a2b b600 0bb6 000c b900 0d02 00b8
0x030: 000e 4d2c 2bb8 000f 2cb0
  Stackmap Table:
append_frame(@47,Object[#143])

at
org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:209)
at
org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:874)
at PDFParseExtract.populate(PDFParseExtract.java:60)
at PDFParseExtract.main(PDFParseExtract.java:53)


Re: Stemming with SOLR

2016-12-15 Thread Susheel Kumar
We did an extensive comparison in the past of Snowball, KStem and Hunspell,
and there are cases where one of them works better than the others or
vice-versa. You may utilise all three of them by having 3 different fields
(fieldTypes) and, during query, search in all of them.

For the cases where none of them works (e.g. wolves -> wolf), use
StemmerOverrideFilterFactory.
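
If it's useful, a small self-contained way to eyeball the differences is to 
run the same tokens through two of those stemmers with plain Lucene classes 
(the admin UI's analysis screen shows the same thing; the sample words here 
are arbitrary):

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.KStemFilter;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StemCompare {

  static Tokenizer tokens(String text) {
    Tokenizer t = new WhitespaceTokenizer();
    t.setReader(new StringReader(text));
    return t;
  }

  static void dump(String label, TokenStream ts) throws Exception {
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    System.out.print(label + ": ");
    while (ts.incrementToken()) {
      System.out.print(term + " ");   // CharTermAttribute.toString() is the term text
    }
    ts.end();
    ts.close();
    System.out.println();
  }

  public static void main(String[] args) throws Exception {
    String text = "wolves studies running";
    dump("snowball", new SnowballFilter(tokens(text), "English"));
    dump("kstem", new KStemFilter(tokens(text)));
  }
}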

HTH.

Thanks,
Susheel

On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan 
wrote:

> Hi,
>
> KStemFilter returns legitimate English words, please use it.
>
> Ahmet
>
>
>
> On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <
> watt...@gmail.com> wrote:
> Hello devs,
>
> I'm trying to develop this indexing and querying flow where it converts the
> words to its original form (lemmatization). I was doing bit of research
> lately but the information on the internet is very limited. I tried using
> hunspellfactory but it doesn't convert the word to it's original form,
> instead it gives suggestions for some words (hunspell works for some
> english words correctly but for some it gives multiple suggestions or no
> suggestions, i used the en_us.dic provided by openoffice)
>
> I know this is a generic problem in searching, so is there anyone who can
> point me to correct direction or some information :)
>
> Best regards,
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>


Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
> 
> Interesting, I don't recall a bug like that being fixed.
> Anyway, glad it works for you now!
> -Yonik


Then it’s probably because it’s Christmas time! :-)

Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Erick Erickson
Hmmm, have you changed coreLoadThreads? We had a problem with this a
while back with loading lots and lots of cores, see:
https://issues.apache.org/jira/browse/SOLR-7280

But that was fixed in 6.2, so unless you changed the number of threads
used to load cores it shouldn't be a problem on 6.3...

The symptom was also that replicas would never change to "active",
they'd be stuck in recovery or down.

Best,
Erick

On Thu, Dec 15, 2016 at 3:07 AM, Yago Riveiro  wrote:
> Hi,
>
> I'm getting this error in my log
>
> 12/15/2016, 9:28:18 AM  ERROR true  ExecutorUtil  Uncaught exception
> java.lang.StackOverflowError thrown by thread:
> coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr
> x:collection1_shard3_replica2 s:shard3 c:collection1-visitors r:core_node5
> java.lang.Exception: Submitter stack trace
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:204)
> at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:204)
> at 
> org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:505)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search only for single value of Solr multivalue field

2016-12-15 Thread Erick Erickson
Phrase queries and slop and positionIncrementGap ;)

The fieldType has a positionIncrementGap. This is the token delta
between the end token of one entry and the beginning of the next.

so the first entry: IFREMER, Ctr Brest, DRO Geosci Marines, F-29280
Plouzane, France
IFREMER would have a position of 1 and France would have a position of 9 or so.
If the positionIncrementGap was 100 then this entry:
Univ Lisbon, Ctr Geofis, P-1269102 Lisbon, Portugal.
Univ would have a position of 110.

Now if I search "IFREMER France"~99 it'd match the first one
but searching "IFREMER Lisbon"~99 it would not match since the
positions are > 99 apart.

So you configure the positionIncrementGap to be greater than the
longest number of tokens you ever expect to have in a single entry.
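
As a concrete (hypothetical) illustration with SolrJ -- the core URL is a 
placeholder, the field name is from the original post, and the fieldType is 
assumed to declare positionIncrementGap="100":

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SlopExample {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      // Slop 99 can never bridge the gap of 100 between two values:
      SolrQuery within = new SolrQuery("idx_affilliation:\"IFREMER France\"~99"); // matches
      SolrQuery across = new SolrQuery("idx_affilliation:\"IFREMER Lisbon\"~99"); // no match
      System.out.println(solr.query(within).getResults().getNumFound());
      System.out.println(solr.query(across).getResults().getNumFound());
    }
  }
}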

HTH
Erick

On Thu, Dec 15, 2016 at 3:44 AM, Dorian Hoxha  wrote:
> You should be able to filter "(word1 in field OR word2 in field) AND
> NOT(word1 in field AND word2 in field)". Translate that into the right
> syntax.
> I don't know if lucene is smart enough to execute the filter only once (it
> should be i guess).
> Makes sense ?
>
> On Thu, Dec 15, 2016 at 12:12 PM, Leo BRUVRY-LAGADEC  wrote:
>
>> Hi,
>>
>> I have a multivalued field in my schema called "idx_affilliation".
>>
>> IFREMER, Ctr Brest, DRO Geosci Marines,
>> F-29280 Plouzane, France.
>> Univ Lisbon, Ctr Geofis, P-1269102 Lisbon,
>> Portugal.
>> Univ Bretagne Occidentale, Inst Univ
>> Europeen Mer, Lab Domaines Ocean, F-29280 Plouzane, France.
>> Total Explorat Prod Geosci Projets Nouveaux
>> Exper, F-92078 Paris, France.
>>
>> I want to be able to do a query like: idx_affilliation:(IFREMER Portugal)
>> and not have this document returned. In other words, I do not want queries
>> to span individual values for the field.
>>
>> 
>> ---
>>
>> Here are some further examples using the document above of how I want this
>> to work:
>>
>> idx_affilliation:(IFREMER France) --> Returns it.
>> idx_affilliation:(IFREMER Plouzane) --> Returns it.
>> idx_affilliation:("Univ Bretagne Occidentale") --> Returns it.
>> idx_affilliation:("Univ Lisbon" Portugal) --> Returns it.
>> idx_affilliation:(IFREMER Portugal) --> DOES NOT RETURN IT.
>>
>> Does someone known if it's possible to do this ?
>>
>> Best regards,
>> Leo.
>>


Re: Stemming with SOLR

2016-12-15 Thread Ahmet Arslan
Hi,

KStemFilter returns legitimate English words, please use it.

Ahmet



On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya 
 wrote:
Hello devs,

I'm trying to develop this indexing and querying flow where it converts the
words to its original form (lemmatization). I was doing bit of research
lately but the information on the internet is very limited. I tried using
hunspellfactory but it doesn't convert the word to it's original form,
instead it gives suggestions for some words (hunspell works for some
english words correctly but for some it gives multiple suggestions or no
suggestions, i used the en_us.dic provided by openoffice)

I know this is a generic problem in searching, so is there anyone who can
point me to correct direction or some information :)

Best regards,
Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com


Re: Stemming with SOLR

2016-12-15 Thread Erick Erickson
What about things like PorterStemFilterFactory,
EnglishMinimalStemFilterFactory and the like?

Best,
Erick

On Thu, Dec 15, 2016 at 7:16 AM, Lasitha Wattaladeniya
 wrote:
> Hello devs,
>
> I'm trying to develop this indexing and querying flow where it converts the
> words to its original form (lemmatization). I was doing bit of research
> lately but the information on the internet is very limited. I tried using
> hunspellfactory but it doesn't convert the word to it's original form,
> instead it gives suggestions for some words (hunspell works for some
> english words correctly but for some it gives multiple suggestions or no
> suggestions, i used the en_us.dic provided by openoffice)
>
> I know this is a generic problem in searching, so is there anyone who can
> point me to correct direction or some information :)
>
> Best regards,
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com


Re: File system choices?

2016-12-15 Thread Walter Underwood
About ten years ago, I accidentally put indexes on an NFS volume. Solr ran 
about 100X slower, so I haven’t tried it since.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 15, 2016, at 8:17 AM, Michael Kuhlmann  wrote:
> 
> Yes, and we're doing such things at my company. However we most often do
> things you shouldn't do; this is one of these.
> 
> Solr needs to load data quite fast, otherwise you'll be having a
> performance killer. It's often recommended to use an SSD instead of a
> normal hard disk; a network share would be quite contrary to it.
> 
> It might make sense when you update very seldom, and all your index fits
> into memory.
> 
> -Michael
> 
> 
> Am 15.12.2016 um 16:37 schrieb Michael Joyner (NewsRx):
>> Hello all,
>> 
>> Can the Solr indexes be safely stored and used via mounted NFS shares?
>> 
>> -Mike
>> 
> 



Re: File system choices?

2016-12-15 Thread Erick Erickson
NFS isn't the first choice. That said, numbers of organizations _do_
use NFS for their Lucene indexes. See the
recommendations here:
https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/store/NativeFSLockFactory.html

What it really amounts to is that you may find yourself in situations
where you have to manually clean up lock files if Solr exits
abnormally, so don't 'kill -9'. And follow the advice above for
configuring your lock factory in solrconfig.xml

Best,
Erick

On Thu, Dec 15, 2016 at 7:37 AM, Michael Joyner (NewsRx)
 wrote:
> Hello all,
>
> Can the Solr indexes be safely stored and used via mounted NFS shares?
>
> -Mike
>


Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread GW
Thanks Tom,

It looks like there is a PHP extension on Git. Seems like a phpized C lib
to create a Zend module to work with ZK. No mention of solr but I'm
guessing I can poll the ensemble for pretty much anything ZK.

Thanks for the direction! A ZK aware app is the way I need to go. I'll give
it a go in the next few days.

Best,

GW





On 15 December 2016 at 09:52, Tom Evans  wrote:

> On Thu, Dec 15, 2016 at 12:37 PM, GW  wrote:
> > While my client is all PHP it does not use a solr client. I wanted to
> stay
> > with he latest Solt Cloud and the PHP clients all seemed to have some
> kind
> > of issue being unaware of newer Solr Cloud versions. The client makes
> pure
> > REST calls with Curl. It is stateful through local storage. There is no
> > persistent connection. There are no cookies and PHP work is not sticky so
> > it is designed for round robin on both the internal network.
> >
> > I'm thinking we have a different idea of persistent. To me something like
> > MySQL can be persistent, ie a fifo queue for requests. The stack can be
> > always on/connected on something like a heap storage.
> >
> > I never thought about the impact of a solr node crashing with PHP on top.
> > Many thanks!
> >
> > Was thinking of running a conga line (Ricci & Luci projects) and shutting
> > down and replacing failed nodes. Never done this with Solr. I don't see
> any
> > reasons why it would not work.
> >
> > ** When you say an array of connections per host. It would still require
> an
> > internal DNS because hosts files don't round robin. perhaps this is
> handled
> > in the Python client??
>
>
> The best Solr clients will take the URIs of the Zookeeper servers;
> they do not make queries via Zookeeper, but will read the current
> cluster status from zookeeper in order to determine which solr node to
> actually connect to, taking in to account what nodes are alive, and
> the state of particular shards.
>
> SolrJ (Java) will do this, as will pysolr (python), I'm not aware of a
> PHP client that is ZK aware.
>
> If you don't have a ZK aware client, there are several options:
>
> 1) Make your favourite client ZK aware, like in [1]
> 2) Use round robin DNS to distribute requests amongst the cluster.
> 3) Use a hardware or software load balancer in front of the cluster.
> 4) Use shared state to store the names of active nodes*
>
> All apart from 1) have significant downsides:
>
> 2) Has no concept of a node being down. Down nodes should not cause
> query failures, the requests should go elsewhere in the cluster.
> Requires updating DNS to add or remove nodes.
> 3) Can detect "down" nodes. Has no idea about the state of the
> cluster/shards (usually).
> 4) Basically duplicates what ZooKeeper does, but less effectively -
> doesn't know cluster state, down nodes, nodes that are up but with
> unhealthy replicas...
>
> >
> > You have given me some good clarification. I think lol. I know I can spin
> > out WWW servers based on load. I'm not sure how shit will fly spinning up
> > additional solr nodes. I'm not sure what happens if you spin up an empty
> > solr node and what will happen with replication, shards and load cost of
> > spinning an instance. I'm facing some experimentation me thinks. This
> will
> > be a manual process at first, for sure
> >
> > I guess I could put the solr connect requests in my clients into a try
> > loop, looking for successful connections by name before any action.
>
> In SolrCloud mode, you can spin up/shut down nodes as you like.
> Depending on how you have configured your collections, new replicas
> may be automatically created on the new node, or the node will simply
> become part of the cluster but empty, ready for you to assign new
> replicas to it using the Collections API.
>
> You can also use what are called "snitches" to define rules for how
> you want replicas/shards allocated amongst the nodes, eg to avoid
> placing all the replicas for a shard in the same rack.
>
> Cheers
>
> Tom
>
> [1] https://github.com/django-haystack/pysolr/commit/
> 366f14d75d2de33884334ff7d00f6b19e04e8bbf
>


Re: File system choices?

2016-12-15 Thread Michael Kuhlmann
Yes, and we're doing such things at my company. However we most often do
things you shouldn't do; this is one of these.

Solr needs to load data quite fast, otherwise you'll be having a
performance killer. It's often recommended to use an SSD instead of a
normal hard disk; a network share would be quite contrary to it.

It might make sense when you update very seldom, and all your index fits
into memory.

-Michael


Am 15.12.2016 um 16:37 schrieb Michael Joyner (NewsRx):
> Hello all,
>
> Can the Solr indexes be safely stored and used via mounted NFS shares?
>
> -Mike
>



Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Yonik Seeley
Interesting, I don't recall a bug like that being fixed.
Anyway, glad it works for you now!
-Yonik


On Thu, Dec 15, 2016 at 11:01 AM, Chantal Ackermann
 wrote:
> Hi Yonik,
>
> after upgrading to Solr 6.3.0, the nested function works as expected! (Both 
> with and without docValues.)
>
> "facets":{
> "count":3179500,
> "all_pop":1.5901646171168616E8,
> "shop_cat":{
>   "buckets":[{
>   "val":"Kontaktlinsen > Torische Linsen",
>   "count":75168,
>   "cat_sum":3752665.0497611803},
>
>
> Thanks,
> Chantal
>
>
>> Am 15.12.2016 um 16:00 schrieb Chantal Ackermann 
>> :
>>
>> Hi Yonik,
>>
>> are you certain that nesting a function works as documented on 
>> http://yonik.com/solr-subfacets/?
>>
>> top_authors:{
>>type: terms,
>>field: author,
>>limit: 7,
>>sort: "revenue desc",
>>facet:{
>>  revenue: "sum(sales)"
>>}
>>  }
>>
>>
>> I’m getting the feeling that the function is never really executed because, 
>> for my index, calling sum() with a non-number field (e.g. a multi-valued 
>> string field) throws an error when *not nested* but does *not throw an 
>> error* when nested:
>>
>>json.facet={all_pop: "sum(gtin)“}
>>
>>"error":{
>>"trace":“java.lang.UnsupportedOperationException
>>   at 
>> org.apache.lucene.queries.function.FunctionValues.doubleVal(FunctionValues.java:47)
>>
>> And the following does not throw an error but definitely should if the 
>> function would be executed:
>>
>>json.facet={all_pop:"sum(popularity)",shop_cat: {type:terms, 
>> field:shop_cat, facet: {cat_pop:"sum(gtin)"}}}
>>
>> returns:
>>
>> "facets":{
>>"count":2815500,
>>"all_pop":1.4065865823321116E8,
>>"shop_cat":{
>>  "buckets":[{
>>  "val":"Kontaktlinsen > Torische Linsen",
>>  "count":75168,
>>  "cat_pop":0.0},
>>{
>>  "val":"Damen-Mode/Inspirationen",
>>  "count":47053,
>>  "cat_pop":0.0},
>>
>> For completeness: here is the field directive for „gtin“ with 
>> „text_noleadzero“ based on „solr.TextField“:
>>
>> <field name="gtin" type="text_noleadzero" ... required="false" multiValued="true"/>
>>
>>
>> Is this a bug or is the documentation a glimpse of the future? I will try 
>> version 6.3.0, now. I was still on 6.1.0 for the above tests.
>> (I have also tried with the „avg“ function, just to make sure, but same 
>> there.)
>>
>> Cheers,
>> Chantal
>


Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
Hi Yonik,

after upgrading to Solr 6.3.0, the nested function works as expected! (Both 
with and without docValues.)

"facets":{
"count":3179500,
"all_pop":1.5901646171168616E8,
"shop_cat":{
  "buckets":[{
  "val":"Kontaktlinsen > Torische Linsen",
  "count":75168,
  "cat_sum":3752665.0497611803},


Thanks,
Chantal


> Am 15.12.2016 um 16:00 schrieb Chantal Ackermann :
> 
> Hi Yonik,
> 
> are you certain that nesting a function works as documented on 
> http://yonik.com/solr-subfacets/?
> 
> top_authors:{ 
>type: terms,
>field: author,
>limit: 7,
>sort: "revenue desc",
>facet:{
>  revenue: "sum(sales)"
>}
>  }
> 
> 
> I’m getting the feeling that the function is never really executed because, 
> for my index, calling sum() with a non-number field (e.g. a multi-valued 
> string field) throws an error when *not nested* but does *not throw an error* 
> when nested:
> 
>json.facet={all_pop: "sum(gtin)“}
> 
>"error":{
>"trace":“java.lang.UnsupportedOperationException
>   at 
> org.apache.lucene.queries.function.FunctionValues.doubleVal(FunctionValues.java:47)
> 
> And the following does not throw an error but definitely should if the 
> function would be executed:
> 
>json.facet={all_pop:"sum(popularity)",shop_cat: {type:terms, 
> field:shop_cat, facet: {cat_pop:"sum(gtin)"}}}
> 
> returns:
> 
> "facets":{
>"count":2815500,
>"all_pop":1.4065865823321116E8,
>"shop_cat":{
>  "buckets":[{
>  "val":"Kontaktlinsen > Torische Linsen",
>  "count":75168,
>  "cat_pop":0.0},
>{
>  "val":"Damen-Mode/Inspirationen",
>  "count":47053,
>  "cat_pop":0.0},
> 
> For completeness: here is the field directive for „gtin“ with 
> „text_noleadzero“ based on „solr.TextField“:
> 
> <field name="gtin" type="text_noleadzero" ... required="false" multiValued="true"/>
> 
> 
> Is this a bug or is the documentation a glimpse of the future? I will try 
> version 6.3.0, now. I was still on 6.1.0 for the above tests.
> (I have also tried with the „avg“ function, just to make sure, but same 
> there.)
> 
> Cheers,
> Chantal



File system choices?

2016-12-15 Thread Michael Joyner (NewsRx)

Hello all,

Can the Solr indexes be safely stored and used via mounted NFS shares?

-Mike



Stemming with SOLR

2016-12-15 Thread Lasitha Wattaladeniya
Hello devs,

I'm trying to develop this indexing and querying flow where it converts the
words to its original form (lemmatization). I was doing bit of research
lately but the information on the internet is very limited. I tried using
hunspellfactory but it doesn't convert the word to it's original form,
instead it gives suggestions for some words (hunspell works for some
english words correctly but for some it gives multiple suggestions or no
suggestions, i used the en_us.dic provided by openoffice)

I know this is a generic problem in searching, so is there anyone who can
point me to correct direction or some information :)

Best regards,
Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com


Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
Hi Yonik,

are you certain that nesting a function works as documented on 
http://yonik.com/solr-subfacets/?

top_authors:{ 
type: terms,
field: author,
limit: 7,
sort: "revenue desc",
facet:{
  revenue: "sum(sales)"
}
  }


I’m getting the feeling that the function is never really executed because, for 
my index, calling sum() with a non-number field (e.g. a multi-valued string 
field) throws an error when *not nested* but does *not throw an error* when 
nested:

json.facet={all_pop: "sum(gtin)“}

"error":{
"trace":“java.lang.UnsupportedOperationException
at 
org.apache.lucene.queries.function.FunctionValues.doubleVal(FunctionValues.java:47)

And the following does not throw an error but definitely should if the function 
would be executed:

json.facet={all_pop:"sum(popularity)",shop_cat: {type:terms, 
field:shop_cat, facet: {cat_pop:"sum(gtin)"}}}

returns:

"facets":{
"count":2815500,
"all_pop":1.4065865823321116E8,
"shop_cat":{
  "buckets":[{
  "val":"Kontaktlinsen > Torische Linsen",
  "count":75168,
  "cat_pop":0.0},
{
  "val":"Damen-Mode/Inspirationen",
  "count":47053,
  "cat_pop":0.0},

For completeness: here is the field directive for „gtin“ with „text_noleadzero“ 
based on „solr.TextField“:

 required="false" multiValued="false" docValues="true“/>
> 
> I have also re-indexed (removed data/ and indexed from scratch). The 
> popularity field is populated with random values (as I don’t have the real 
> values from production) meaning that all documents have values > 0.
> 
> Here the statistics output:
> 
> "stats":{
>"stats_fields":{
>  "popularity":{
>"min":7.952374289743602E-5,
>"max":99.3896484375,
>"count":1687500,
>"missing":0,
>"sum":8.436878611434968E7,
>"sumOfSquares":5.626142812197906E9,
>"mean":49.9963176973924,
>"stddev":28.885623872869992},
> 
> And this is the relevant facet output from calling
> 
> /solr/<collection>/query?
> json.facet={
> num_pop:{query: "popularity:[* TO *]"},
> all_pop: "sum(popularity)",
> shop_cat: {type:terms, field:shop_cat, facet: {cat_pop: 
> "sum(popularity)"}}}&q=*:*&rows=1&fl=popularity&wt=json
> 
> "facets":{
>"count":1687500,
>"all_pop":1.5893775613258794E8,
>"num_pop":{
>  "count":1687500},
>"shop_cat":{
>  "buckets":[{
>  "val":"Kontaktlinsen > Torische Linsen",
>  "count":75168,
>  "cat_pop":0.0},
>{
>  "val":"Neu",
>  "count":31772,
>  "cat_pop":0.0},
>{
>  "val":"Gesundheit & Schönheit > Gesundheitspflege",
>  "count":20281,
>  "cat_pop":0.0},
> [… more facets omitted]
> 
> 
> The /query handler is an edismax configuration, though I don’t think this 
> matters as long as the results include documents with popularity > 0 which is 
> the case as seen in the facet output (and sum() works in general for all of 
> the documents just not for the buckets as seen in „all_pop").
> 
> I will try to explicitly turn off the docValues and add stored=„true“ just to 
> try out more. If someone has any other suggestions that I should try - I 
> would be glad to hear them. If it is not possible to retrieve the sum in this 
> way I would have to fetch each sum separately which would be a huge 
> performance impact.
> 
> Thanks!
> Chantal
> 
> 
> 
> 
> 
>> Am 15.12.2016 um 10:16 schrieb CA :
>> 
>>> num_pop:{query:"popularity:[* TO *]"}
> 



Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread Tom Evans
On Thu, Dec 15, 2016 at 12:37 PM, GW  wrote:
> While my client is all PHP it does not use a solr client. I wanted to stay
> with the latest Solr Cloud and the PHP clients all seemed to have some kind
> of issue being unaware of newer Solr Cloud versions. The client makes pure
> REST calls with Curl. It is stateful through local storage. There is no
> persistent connection. There are no cookies and PHP work is not sticky so
> it is designed for round robin on both the internal network.
>
> I'm thinking we have a different idea of persistent. To me something like
> MySQL can be persistent, ie a fifo queue for requests. The stack can be
> always on/connected on something like a heap storage.
>
> I never thought about the impact of a solr node crashing with PHP on top.
> Many thanks!
>
> Was thinking of running a conga line (Ricci & Luci projects) and shutting
> down and replacing failed nodes. Never done this with Solr. I don't see any
> reasons why it would not work.
>
> ** When you say an array of connections per host. It would still require an
> internal DNS because hosts files don't round robin. perhaps this is handled
> in the Python client??


The best Solr clients will take the URIs of the Zookeeper servers;
they do not make queries via Zookeeper, but will read the current
cluster status from zookeeper in order to determine which solr node to
actually connect to, taking in to account what nodes are alive, and
the state of particular shards.

SolrJ (Java) will do this, as will pysolr (python), I'm not aware of a
PHP client that is ZK aware.

If you don't have a ZK aware client, there are several options:

1) Make your favourite client ZK aware, like in [1]
2) Use round robin DNS to distribute requests amongst the cluster.
3) Use a hardware or software load balancer in front of the cluster.
4) Use shared state to store the names of active nodes*

All apart from 1) have significant downsides:

2) Has no concept of a node being down. Down nodes should not cause
query failures, the requests should go elsewhere in the cluster.
Requires updating DNS to add or remove nodes.
3) Can detect "down" nodes. Has no idea about the state of the
cluster/shards (usually).
4) Basically duplicates what ZooKeeper does, but less effectively -
doesn't know cluster state, down nodes, nodes that are up but with
unhealthy replicas...

>
> You have given me some good clarification. I think lol. I know I can spin
> out WWW servers based on load. I'm not sure how shit will fly spinning up
> additional solr nodes. I'm not sure what happens if you spin up an empty
> solr node and what will happen with replication, shards and load cost of
> spinning an instance. I'm facing some experimentation me thinks. This will
> be a manual process at first, for sure
>
> I guess I could put the solr connect requests in my clients into a try
> loop, looking for successful connections by name before any action.

In SolrCloud mode, you can spin up/shut down nodes as you like.
Depending on how you have configured your collections, new replicas
may be automatically created on the new node, or the node will simply
become part of the cluster but empty, ready for you to assign new
replicas to it using the Collections API.

You can also use what are called "snitches" to define rules for how
you want replicas/shards allocated amongst the nodes, eg to avoid
placing all the replicas for a shard in the same rack.

Cheers

Tom

[1] 
https://github.com/django-haystack/pysolr/commit/366f14d75d2de33884334ff7d00f6b19e04e8bbf


Re: Searching for a term which isn't a part of an expression

2016-12-15 Thread Dean Gurvitz
I think queries would usually not contain more than one phrase per query,
but there isn't a fixed list.

Anyways, your solution is very very good for us. We could write a
QueryParser or a SearchComponent that edits the Lucene Query object in the
ResponseBuilder to include the relevant SpanNotQuery. Thanks a lot!!!
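
For the record, a bare-bones sketch of that idea as a query parser plugin 
(the class name and local-param names are invented; error handling and 
analysis of the input are omitted):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class NotInPhraseQParserPlugin extends QParserPlugin {
  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        String field  = localParams.get("f");       // e.g. f=text
        String term   = localParams.get("term");    // e.g. term=world
        String phrase = localParams.get("phrase");  // e.g. phrase=hello world

        SpanQuery include = new SpanTermQuery(new Term(field, term));
        String[] words = phrase.split("\\s+");
        SpanQuery[] clauses = new SpanQuery[words.length];
        for (int i = 0; i < words.length; i++) {
          clauses[i] = new SpanTermQuery(new Term(field, words[i]));
        }
        // zero slop, in order: the exact phrase whose occurrences don't count
        SpanQuery exclude = new SpanNearQuery(clauses, 0, true);
        return new SpanNotQuery(include, exclude);
      }
    };
  }
}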

On Thu, Dec 15, 2016 at 4:01 PM, Ahmet Arslan  wrote:

> Hi,
>
>
> Span query family would be a pure query-time solution, SpanNotQuery in
> particular.
>
>
> SpanQuery include = new SpanTermQuery(new Term(FIELD, "world"));
>
>
> SpanQuery exclude = new SpanNearQuery(new SpanQuery[] {
> new SpanTermQuery(new Term(FIELD, "hello")),
> new SpanTermQuery(new Term(FIELD, "world"))},
> 0,
> true);
>
> SpanQuery finalQuery = new SpanNotQuery(include, exclude);
> This finalQuery is supposed to retrieve documents that have the term "world"
> but not as a part of "hello world".
>
>
>
> Is your list of phrases query dependent? If yes how many phrases per-query?
>
> Or you have a global list of phrases?
>
> Ahmet
>
> On Thursday, December 15, 2016 10:32 AM, Dean Gurvitz 
> wrote:
>
>
>
> Hi,
> The list of phrases wil be relatively dynamic, so changing the indexing
> process isn't a very good solution for us.
>
> We also considered using a PostFilter or adding a SearchComponent to filter
> out the "bad" results, but obviously a true query-time support would be a
> lot better.
>
>
>
> On Wed, Dec 14, 2016 at 10:52 PM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > Do you have a common list of phrases that you want to prohibit partial
> > match?
> > You can index those phrases in a special way, for example,
> >
> > This is a new world hello_world hot_dog tap_water etc.
> >
> > ahmet
> >
> >
> > On Wednesday, December 14, 2016 9:20 PM, deansg 
> wrote:
> > We would like to enable queries for a specific term that doesn't appear
> as
> > a
> > part of a given expression. Negating the expression will not help, as we
> > still want to return items that contain the term independently, even if
> > they
> > contain full expression as well.
> > For example, we would like to search for items that have the term "world"
> > but not as a part of "hello world". If the text is: "This is a new world.
> > Hello world", we would still want to return the item, as "world" appears
> > independently as well as a part of "Hello world". However, we will not
> want
> > to return items that only have the expression "hello world" in them.
> > Does Solr support these types of queries? We thought about using regex,
> but
> > since the text is tokenized I don't think that will be possible.
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Searching-for-a-term-which-isn-t-a-part-of-an-
> > expression-tp4309746.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread Chantal Ackermann
Hi Yonik,


here is an update on what I’ve tried so far, unfortunately without any more 
luck.

The field directive is (should have included this when asking the question):

<field name="popularity" ... required="false" multiValued="false" docValues="true"/>

I have also re-indexed (removed data/ and indexed from scratch). The 
popularity field is populated with random values (as I don’t have the real 
values from production) meaning that all documents have values > 0.

Here the statistics output:

"stats":{
   "stats_fields":{
     "popularity":{
       "min":7.952374289743602E-5,
       "max":99.3896484375,
       "count":1687500,
       "missing":0,
       "sum":8.436878611434968E7,
       "sumOfSquares":5.626142812197906E9,
       "mean":49.9963176973924,
       "stddev":28.885623872869992},

And this is the relevant facet output from calling

/solr/<collection>/query?
json.facet={
num_pop:{query: "popularity:[* TO *]"},
all_pop: "sum(popularity)",
shop_cat: {type:terms, field:shop_cat, facet: {cat_pop: 
"sum(popularity)"}}}&q=*:*&rows=1&fl=popularity&wt=json

"facets":{
"count":1687500,
"all_pop":1.5893775613258794E8,
"num_pop":{
  "count":1687500},
"shop_cat":{
  "buckets":[{
  "val":"Kontaktlinsen > Torische Linsen",
  "count":75168,
  "cat_pop":0.0},
{
  "val":"Neu",
  "count":31772,
  "cat_pop":0.0},
{
  "val":"Gesundheit & Schönheit > Gesundheitspflege",
  "count":20281,
  "cat_pop":0.0},
[… more facets omitted]


The /query handler is an edismax configuration, though I don’t think this 
matters as long as the results include documents with popularity > 0 which is 
the case as seen in the facet output (and sum() works in general for all of the 
documents just not for the buckets as seen in „all_pop").

I will try to explicitly turn off the docValues and add stored=„true“ just to 
try out more. If someone has any other suggestions that I should try - I would 
be glad to hear them. If it is not possible to retrieve the sum in this way I 
would have to fetch each sum separately which would be a huge performance 
impact.

Thanks!
Chantal





> Am 15.12.2016 um 10:16 schrieb CA :
> 
>> num_pop:{query:"popularity:[* TO *]"}



Re: Searching for a term which isn't a part of an expression

2016-12-15 Thread Ahmet Arslan
Hi,


Span query family would be a pure query-time solution, SpanNotQuery in 
particular.


SpanQuery include = new SpanTermQuery(new Term(FIELD, "world"));


SpanQuery exclude = new SpanNearQuery(new SpanQuery[] {
new SpanTermQuery(new Term(FIELD, "hello")),
new SpanTermQuery(new Term(FIELD, "world"))},
0,
true);

SpanQuery finalQuery = new SpanNotQuery(include, exclude);
This finalQuery is supposed to retrieve documents that have the term "world" but 
not as a part of "hello world".



Is your list of phrases query dependent? If yes how many phrases per-query?

Or you have a global list of phrases?

Ahmet

On Thursday, December 15, 2016 10:32 AM, Dean Gurvitz  wrote:



Hi,
The list of phrases wil be relatively dynamic, so changing the indexing
process isn't a very good solution for us.

We also considered using a PostFilter or adding a SearchComponent to filter
out the "bad" results, but obviously a true query-time support would be a
lot better.



On Wed, Dec 14, 2016 at 10:52 PM, Ahmet Arslan 
wrote:

> Hi,
>
> Do you have a common list of phrases that you want to prohibit partial
> match?
> You can index those phrases in a special way, for example,
>
> This is a new world hello_world hot_dog tap_water etc.
>
> ahmet
>
>
> On Wednesday, December 14, 2016 9:20 PM, deansg  wrote:
> We would like to enable queries for a specific term that doesn't appear as
> a
> part of a given expression. Negating the expression will not help, as we
> still want to return items that contain the term independently, even if
> they
> contain full expression as well.
> For example, we would like to search for items that have the term "world"
> but not as a part of "hello world". If the text is: "This is a new world.
> Hello world", we would still want to return the item, as "world" appears
> independently as well as a part of "Hello world". However, we will not want
> to return items that only have the expression "hello world" in them.
> Does Solr support these types of queries? We thought about using regex, but
> since the text is tokenized I don't think that will be possible.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Searching-for-a-term-which-isn-t-a-part-of-an-
> expression-tp4309746.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Checking Optimal Values for BM25

2016-12-15 Thread Furkan KAMACI
Hi,

Solr's default similarity is BM25 now. Its parameters are defined as

k1=1.2, b=0.75

by default. However, is there any way to check the effect of using
different coefficients in the BM25 calculation, to find the optimal values?
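
For reference, the standard BM25 scoring formula those two parameters plug
into is:

score(D,Q) = \sum_{q_i \in Q} IDF(q_i) \cdot
             \frac{f(q_i,D) \cdot (k_1 + 1)}{f(q_i,D) + k_1 \cdot (1 - b + b \cdot |D| / avgdl)}

where f(q_i,D) is the term frequency, |D| the field length and avgdl the
average field length. k1 controls how quickly term frequency saturates and b
controls how strongly length normalization applies, so the usual approach is
a parameter sweep (e.g. a grid over k1 and b) evaluated against a set of
queries with relevance judgments. In the schema, solr.BM25SimilarityFactory
accepts k1 and b as parameters, so different values can be configured per
fieldType and compared.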

Kind Regards,
Furkan KAMACI


Re: Has anyone used linode.com to run Solr | Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread GW
While my client is all PHP it does not use a solr client. I wanted to stay
with the latest Solr Cloud and the PHP clients all seemed to have some kind
of issue being unaware of newer Solr Cloud versions. The client makes pure
REST calls with Curl. It is stateful through local storage. There is no
persistent connection. There are no cookies and PHP work is not sticky so
it is designed for round robin on both the internal network.

I'm thinking we have a different idea of persistent. To me something like
MySQL can be persistent, ie a fifo queue for requests. The stack can be
always on/connected on something like a heap storage.

I never thought about the impact of a solr node crashing with PHP on top.
Many thanks!

Was thinking of running a conga line (Ricci & Luci projects) and shutting
down and replacing failed nodes. Never done this with Solr. I don't see any
reasons why it would not work.

** When you say an array of connections per host. It would still require an
internal DNS because hosts files don't round robin. perhaps this is handled
in the Python client??

You have given me some good clarification. I think lol. I know I can spin
out WWW servers based on load. I'm not sure how shit will fly spinning up
additional solr nodes. I'm not sure what happens if you spin up an empty
solr node and what will happen with replication, shards and load cost of
spinning an instance. I'm facing some experimentation me thinks. This will
be a manual process at first, for sure

I guess I could put the solr connect requests in my clients into a try
loop, looking for successful connections by name before any action.

Many thanks,

GW




On 15 December 2016 at 04:46, Dorian Hoxha  wrote:

> See replies inline:
>
> On Wed, Dec 14, 2016 at 3:36 PM, GW  wrote:
>
> > Thanks,
> >
> > I understand accessing solr directly. I'm doing REST calls to a single
> > machine.
> >
> > If I have a cluster of five servers and say three Apache servers, I can
> > round robin the REST calls to all five in the cluster?
> >
> I don't know about php, but it would be better to have "persistent
> connections" or something to the solr servers. In python for example this
> is done automatically. It would be better if each php-server has a
> different order of an array of [list of solr ips]. This way each box will
> contact a ~different solr instance, and will have better chance of not
> creating too may new connections (since the connection cache is
> per-url/ip).
>
> >
> > I guess I'm going to find out. :-)  If so I might be better off just
> > running Apache on all my solr instances.
> >
> I've done that before (though with es, but it's ~same). And just contacting
> the localhost solr. The problem with that, is that if the solr on the
> current host fails, your php won't work. So best in this scenario is to
> have an array of hosts, but the first being the local solr.
>
> >
> >
> >
> >
> >
> > On 14 December 2016 at 07:08, Dorian Hoxha 
> wrote:
> >
> > > See replies inline:
> > >
> > > On Wed, Dec 14, 2016 at 11:16 AM, GW  wrote:
> > >
> > > > Hello folks,
> > > >
> > > > I'm about to set up a Web service I created with PHP/Apache <--> Solr
> > > Cloud
> > > >
> > > > I'm hoping to index a bazillion documents.
> > > >
> > > ok , how many inserts/second ?
> > >
> > > >
> > > > I'm thinking about using Linode.com because the pricing looks great.
> > Any
> > > > opinions??
> > > >
> > > Pricing is 'ok'. For bazillion documents, I would skip vps and go
> > straight
> > > dedicated. Check out ovh.com / online.net etc etc
> > >
> > > >
> > > > I envision using an Apache/PHP round robin in front of a solr cloud
> > > >
> > > > My thoughts are that I send my requests to the Solr instances on the
> > > > Zookeeper Ensemble. Am I missing something?
> > > >
> > > You contact with solr directly, don't have to connect to zookeeper for
> > > loadbalancing.
> > >
> > > >
> > > > What can I say.. I'm software oriented and a little hardware
> > challenged.
> > > >
> > > > Thanks in advance,
> > > >
> > > > GW
> > > >
> > >
> >
>


Re: Search only for single value of Solr multivalue field

2016-12-15 Thread Dorian Hoxha
You should be able to filter "(word1 in field OR word2 in field) AND
NOT(word1 in field AND word2 in field)". Translate that into the right
syntax.
I don't know if lucene is smart enough to execute the filter only once (it
should be i guess).
Makes sense ?

On Thu, Dec 15, 2016 at 12:12 PM, Leo BRUVRY-LAGADEC  wrote:

> Hi,
>
> I have a multivalued field in my schema called "idx_affilliation".
>
> IFREMER, Ctr Brest, DRO Geosci Marines,
> F-29280 Plouzane, France.
> Univ Lisbon, Ctr Geofis, P-1269102 Lisbon,
> Portugal.
> Univ Bretagne Occidentale, Inst Univ
> Europeen Mer, Lab Domaines Ocean, F-29280 Plouzane, France.
> Total Explorat Prod Geosci Projets Nouveaux
> Exper, F-92078 Paris, France.
>
> I want to be able to do a query like: idx_affilliation:(IFREMER Portugal)
> and not have this document returned. In other words, I do not want queries
> to span individual values for the field.
>
> 
> ---
>
> Here are some further examples using the document above of how I want this
> to work:
>
> idx_affilliation:(IFREMER France) --> Returns it.
> idx_affilliation:(IFREMER Plouzane) --> Returns it.
> idx_affilliation:("Univ Bretagne Occidentale") --> Returns it.
> idx_affilliation:("Univ Lisbon" Portugal) --> Returns it.
> idx_affilliation:(IFREMER Portugal) --> DOES NOT RETURN IT.
>
> Does someone known if it's possible to do this ?
>
> Best regards,
> Leo.
>


Search only for single value of Solr multivalue field

2016-12-15 Thread Leo BRUVRY-LAGADEC

Hi,

I have a multivalued field in my schema called "idx_affilliation".

IFREMER, Ctr Brest, DRO Geosci Marines, 
F-29280 Plouzane, France.
Univ Lisbon, Ctr Geofis, P-1269102 
Lisbon, Portugal.
Univ Bretagne Occidentale, Inst Univ 
Europeen Mer, Lab Domaines Ocean, F-29280 Plouzane, France.
Total Explorat Prod Geosci Projets 
Nouveaux Exper, F-92078 Paris, France.


I want to be able to do a query like: idx_affilliation:(IFREMER 
Portugal) and not have this document returned. In other words, I do not 
want queries to span individual values for the field.


---

Here are some further examples using the document above of how I want 
this to work:


idx_affilliation:(IFREMER France) --> Returns it.
idx_affilliation:(IFREMER Plouzane) --> Returns it.
idx_affilliation:("Univ Bretagne Occidentale") --> Returns it.
idx_affilliation:("Univ Lisbon" Portugal) --> Returns it.
idx_affilliation:(IFREMER Portugal) --> DOES NOT RETURN IT.

Does someone known if it's possible to do this ?

Best regards,
Leo.


Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Yago Riveiro
Hi,

I'm getting this error in my log

12/15/2016, 9:28:18 AM  ERROR true  ExecutorUtil  Uncaught exception
java.lang.StackOverflowError thrown by thread:
coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr
x:collection1_shard3_replica2 s:shard3 c:collection1-visitors r:core_node5
java.lang.Exception: Submitter stack trace
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:204)
at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:204)
at 
org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:505)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting Error - Session expired for /collections/sprod/state.json

2016-12-15 Thread Piyush Kunal
This is happening when heavy indexing like 100/second is going on.

On Thu, Dec 15, 2016 at 4:33 PM, Piyush Kunal 
wrote:

> - We have solr6.1.0 cluster running on production with 1 shard and 5
> replicas.
> - Zookeeper quorum on 3 nodes.
> - Using a chroot in zookeeper to segregate the configs from other
> collections.
> - Using solrj5.1.0 as our client to query solr.
>
> [...]
>


Getting Error - Session expired for /collections/sprod/state.json

2016-12-15 Thread Piyush Kunal
- We have a Solr 6.1.0 cluster running in production with 1 shard and 5
replicas.
- ZooKeeper quorum on 3 nodes.
- Using a chroot in ZooKeeper to segregate the configs from other
collections.
- Using SolrJ 5.1.0 as our client to query Solr.



Usually things work fine, but on and off we see this exception come up:
=
org.apache.solr.common.SolrException: Could not load collection from
ZK:sprod
at
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:815)
at
org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateReader.java:477)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1174)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:807)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:782)
--
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/sprod/state.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at
org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
at
org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at
org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
--
org.apache.solr.common.SolrException: Could not load collection from
ZK:sprod
at
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:815)
at
org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateReader.java:477)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1174)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:807)
at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:782)
--
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/sprod/state.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at
org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
at
org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at
org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
=





This is our zoo.cfg:
==
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889
maxClientCnxns=300
maxSessionTimeout=9
===





This is our solr.xml on server side
===

<solr>

  <solrcloud>

    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>

    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>

    <int name="zkClientTimeout">${zkClientTimeout:3}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:60}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:6}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>

  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:60}</int>
    <int name="connTimeout">${connTimeout:6}</int>
  </shardHandlerFactory>

</solr>

===




Any help appreciated.

Regards,
Piyush


HBase table indexing in Solr using morphline conf

2016-12-15 Thread Gurdeep Singh
Hi All,

I am trying to index an HBase table into Solr using the HBase indexer and
a morphline conf file.

The issue I'm facing is that one of the columns in the HBase table is a
count field (with integer values). All the other, string-typed HBase
columns are getting indexed in Solr as expected; only this count field is
not getting indexed.

Below is how I configured this column in the morphline file:

--
{
   inputColumn : "a:count"   # "a" is one of the column families in the HBase table
   outputField : "count"
   type : "int"
   source : value
}
--

In Solr's schema.xml I also kept count as int.
I tried changing the type in the morphline file to long/double as well, but no luck.

However, when I set this column as "string" in the morphline file and in
Solr's schema.xml, I can see the column in Solr, but it shows the data
with a type-mismatch error:
"ERROR SCHEMA-INDEX-MISMATCH", stringValue=123

Please advise how to index integer-type data from an HBase table into Solr
using a morphline.
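
One possible workaround, sketched under the assumption that the HBase cell
stores the number as a UTF-8 string such as "123" rather than as the
4-byte output of Bytes.toBytes(int) (which is what type "int" expects), is
to extract the cell as a string and convert it before it reaches Solr. The
command and Record API names below come from the Kite Morphlines stdlib,
so double-check them against the version bundled with the hbase-indexer:

--
{
  extractHBaseCells {
    mappings : [
      {
        inputColumn : "a:count"
        outputField : "count"
        type : string
        source : value
      }
    ]
  }
}
{
  java {
    imports : "import java.util.*;"
    code : """
      // convert the string-encoded count (e.g. "123") into an Integer
      List counts = record.get("count");
      if (!counts.isEmpty()) {
        Integer n = Integer.valueOf(counts.get(0).toString());
        record.removeAll("count");
        record.put("count", n);
      }
      return child.process(record);
      """
  }
}
--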

Thanks in advance



Best Regards,
Gurdeep

gurdeepgan...@gmail.com


Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-15 Thread Dorian Hoxha
See replies inline:

On Wed, Dec 14, 2016 at 3:36 PM, GW  wrote:

> Thanks,
>
> I understand accessing solr directly. I'm doing REST calls to a single
> machine.
>
> If I have a cluster of five servers and say three Apache servers, I can
> round robin the REST calls to all five in the cluster?
>
I don't know about PHP, but it would be better to have "persistent
connections" or something similar to the Solr servers. In Python, for
example, this is done automatically. It would also help if each PHP server
used a different ordering of the array of Solr IPs. That way each box will
contact a ~different Solr instance and has a better chance of not creating
too many new connections (since the connection cache is per-URL/IP).
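
For comparison, a minimal SolrJ sketch of that idea (host URLs are
placeholders, and this assumes a SolrJ version where this constructor is
still available; a PHP client would keep a shuffled host array and reuse
keep-alive connections in the same spirit):

---
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;

public class LbQuery {
    public static void main(String[] args) throws Exception {
        // Round-robins requests over the listed nodes with persistent
        // connections, skipping nodes that are down.
        try (LBHttpSolrClient lb = new LBHttpSolrClient(
                "http://solr1:8983/solr/collection1",
                "http://solr2:8983/solr/collection1",
                "http://solr3:8983/solr/collection1")) {
            System.out.println(
                lb.query(new SolrQuery("*:*")).getResults().getNumFound());
        }
    }
}
---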

>
> I guess I'm going to find out. :-)  If so I might be better off just
> running Apache on all my solr instances.
>
I've done that before (though with ES, but it's ~the same), just
contacting the localhost Solr. The problem with that is that if the Solr
on the current host fails, your PHP won't work. So the best option in this
scenario is to have an array of hosts, with the first being the local
Solr.

>
>
>
>
>
> On 14 December 2016 at 07:08, Dorian Hoxha  wrote:
>
> > See replies inline:
> >
> > On Wed, Dec 14, 2016 at 11:16 AM, GW  wrote:
> >
> > > Hello folks,
> > >
> > > I'm about to set up a Web service I created with PHP/Apache <--> Solr
> > Cloud
> > >
> > > I'm hoping to index a bazillion documents.
> > >
> > OK, how many inserts/second?
> >
> > >
> > > I'm thinking about using Linode.com because the pricing looks great.
> > > Any opinions??
> > >
> > Pricing is 'ok'. For a bazillion documents, I would skip VPS and go
> > straight to dedicated. Check out ovh.com / online.net, etc.
> >
> > >
> > > I envision using an Apache/PHP round robin in front of a solr cloud
> > >
> > > My thoughts are that I send my requests to the Solr instances on the
> > > Zookeeper Ensemble. Am I missing something?
> > >
> > You contact Solr directly; you don't have to connect to ZooKeeper for
> > load balancing.
> >
> > >
> > > What can I say.. I'm software oriented and a little hardware
> challenged.
> > >
> > > Thanks in advance,
> > >
> > > GW
> > >
> >
>


Re: Nested JSON Facets (Subfacets)

2016-12-15 Thread CA
Hi Yonik,

thank you for your quick reply.

(I just sent my original e-mail a second time; I did not confirm the
subscription, so I thought it might not have been sent the first time. I'm
sorry.)

We are using Solr 6.1.0. Sorry, I should have mentioned that.

The low number is because of the test data; it's not how it would look in
production. That's also why the 0 values never surprised me in the
beginning. But now that I have tweaked the data, I can see that it's not
returning the values as it should. And in production there are values > 0
as expected, but the sum() returns 0 nevertheless; that's how we became
aware that something is wrong.

In production the data is re-indexed constantly. We might, though, have
changed the field type from int to float. I'm not sure whether we really
re-indexed from scratch after that in production, but I think in my local
env I did re-create the index. I will check this out.

I’ll also play around with the range query, thanks for the tip!

Cheers,
Chantal



> That should work... what version of Solr are you using?  Did you 
> change the type of the popularity field w/o completely reindexing? 
> 
> You can try to verify the number of documents in each bucket that have 
> the popularity field by adding another sub-facet next to cat_pop: 
> num_pop:{query:"popularity:[* TO *]"} 
> 
> > A quick check with this json.facet parameter:
> >
> > json.facet: {cat_pop:"sum(popularity)"}
> >
> > returns:
> >
> > "facets": {
> >   "count":2508,
> >   "cat_pop":21.0},
> 
> That looks like a pretty low sum for all those documents... perhaps
> most of them are missing "popularity" (or have a 0 popularity).
> To test one of the buckets at the top-level this way, you could add 
> fq=shop_cat:"Men > Clothing > Jumpers & Cardigans" 
> and see if you get anything. 
> 
> -Yonik 
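
Putting the two pieces together, a combined request would look roughly
like this (same field names as above; num_pop counts how many documents in
each bucket actually have a popularity value):

---
json.facet={shop_cat:{type:terms, field:shop_cat, facet:{
    cat_pop:"sum(popularity)",
    num_pop:{query:"popularity:[* TO *]"}
}}}
---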



Nested JSON Facets (Subfacets)

2016-12-15 Thread CA
Hi all,

this is about using a function in nested facets, specifically the "sum()"
function inside a "terms" facet using the json.facet API.

My json.facet parameter looks like this:

   json.facet={shop_cat: {type:terms, field:shop_cat, facet: {cat_pop:"sum(popularity)"}}}

A snippet of the result:

   "facets“: {
   "count":2508,
   "shop_cat“: {
   "buckets“: [{
   "val“: "Men > Clothing > Jumpers & Cardigans",
   "count":252,
   "cat_pop“:0.0
}, {
  "val":"Men > Clothing > Jackets & Coats",
  "count":157,
  "cat_pop“:0.0
}, // and more

This looks fine all over, but it turns out that "cat_pop", the result of
"sum(popularity)", is always 0.0 even when the documents for this facet
value have popularities > 0.

A quick check with this json.facet parameter:

   json.facet: {cat_pop:"sum(popularity)"}

returns:

   "facets“: {
   "count":2508,
   "cat_pop":21.0},

To me, it seems it works fine at the base level but not when nested.
Still, Yonik's documentation and the Jira issues indicate that it is
possible to use functions in nested facets, so I might just be using the
wrong structure? I have a hard time finding any other examples on the
internet, and I had no luck changing the structure around.
Could someone shed some light on this for me? It would also help to know
if it is simply not possible to sum the values up this way.

Thanks a lot!
Chantal