Re: identifying source of queries

2017-08-08 Thread Shalin Shekhar Mangar
There is no in-built way but if you are willing to create a custom query component then it should be easy to mandate that every query must have a tag parameter by throwing an exception otherwise. Any query param you pass to a distributed query request should be propagated to all query nodes as
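The "reject untagged queries" rule described above can be sketched on the client side as follows. This is only an illustration of the validation logic in Python, not the custom Solr query component the reply has in mind; the parameter name `tag`, the helper name, and the example URL are assumptions.

```python
from urllib.parse import urlencode


class MissingTagError(ValueError):
    """Raised when a query does not identify its source."""


def tagged_select_url(base_url, collection, params):
    # Hypothetical helper: refuse to build a query URL unless the caller
    # supplies a non-empty "tag" parameter naming the source application.
    if not params.get("tag"):
        raise MissingTagError("every query must carry a 'tag' parameter")
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = tagged_select_url(
    "http://localhost:8983/solr",          # assumed Solr base URL
    "products",                            # assumed collection name
    {"q": "title:solr", "tag": "reporting-team"},
)
```

A server-side component would apply the same check to the incoming request parameters and throw an exception instead of building a URL.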

identifying source of queries

2017-08-08 Thread suresh pendap
Hi, We have found that application teams often fire ad-hoc queries; some of these are very expensive queries and can bring the Solr cluster down. Sometimes they just build custom scripts which do some offline analytics by firing expensive queries, the Solr cluster was originally not sized for

Re: missing documents after restart

2017-08-08 Thread John Blythe
Thanks Erick. I don't think all of those ifs are in place. Must be something in our nightly process that is conflicting. Will dive in tomorrow to figure out and report back. On Tue, Aug 8, 2017 at 1:27 PM Erick Erickson wrote: > First, are you absolutely sure you're

RE: Storing data in Solr

2017-08-08 Thread Phil Scadden
When I am putting PDF documents and rows from a table into the same index, I create a "dataSource" field to identify the source and I don't copy database fields - only index them - apart from the unique key which is stored as "document". On search, you process the output before passing it to the user.

JSON Logs in SOLR 5.x

2017-08-08 Thread John Bickerstaff
I'm running Solr 5.x and have the need to push logs into AWS's kinesis firehose. As I understand it, I need the logs to be in JSON format. This page: https://cwiki.apache.org/confluence/display/solr/Configuring+Logging tells me that Solr is using Log4J version 1.2. I've played with Log4J

RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
I agree! I also share the thought experiment, these changes seem justifiable indeed! It is just that I am interested in the evidence. Again, if you can, webster.homer, please share significant figures. They should be interesting! Regards, M. -Original message- > From:Walter Underwood

Re: Solr 6 and IDF

2017-08-08 Thread Walter Underwood
There are good use cases for disabling idf and even tf for labels and categories. Searching resumes, maybe you care that “microsoft word” is less selective than “r programming”, but maybe you want all the ones that match three skills followed by the ones that match two skills, regardless of

Re: SolrCloud - leader updates not updating followers

2017-08-08 Thread Erick Erickson
How did you set up your cluster? I hope you just used the collections create API. If you tried to do this by adding individual cores via the core admin APIs, I'd advise you to delete them all and use the collections API. In fact you should use the collections API for everything; the "core admin

RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
Do you measure MRR or sales conversion right now? It would be interesting to see the graph change after your modification, or not of course. Please let us know! -Original message- > From:Webster Homer > Sent: Tuesday 8th August 2017 23:04 > To:

Re: Solr 6 and IDF

2017-08-08 Thread Webster Homer
I think just disabling idf is what we want. For product searching we really don't want to raise a rarer match. What we see analyzing results is that some good hits are suppressed, have lower scores, due to idf. This is so we can test this. We think it will help, but we'll see. On Tue, Aug 8,

RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
Yes, extend the default Similarity, return 1.0f for idf and probably the idfExplain methods, and configure it in your schema, global or per-field. If you think this is a good idea, why not also return 1.0f for tf? And while you're at it, also omitNorms on all fields entirely? I am curious if

Re: Solr 6 and IDF

2017-08-08 Thread Webster Homer
It appears that all I need to do is create a class that extends BM25Similarity, and have the new class return 1 as the idf. Is that correct? On Tue, Aug 8, 2017 at 3:15 PM, Webster Homer wrote: > I do want to use BM25, just disable IDF > > On Tue, Aug 8, 2017 at 2:58 PM,
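To make the effect of "return 1 as the idf" concrete, here is a small worked example in Python. It is a simplified sketch of the BM25 per-term score (idf multiplied by a saturating term-frequency factor), assuming the usual defaults k1 = 1.2 and b = 0.75 and ignoring Lucene's lossy norm encoding; the document counts are made up for illustration.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # BM25 idf: log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    # Length-normalised term-frequency saturation, scaled by idf.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq * (k1 + 1) / (freq + norm)


DOC_COUNT = 1000  # hypothetical index size

# Same term frequency and document length; only the term rarity differs.
rare = bm25_term_score(1, 10, 10, bm25_idf(5, DOC_COUNT))
common = bm25_term_score(1, 10, 10, bm25_idf(500, DOC_COUNT))

# With idf pinned to 1, rarity no longer influences the score.
rare_flat = bm25_term_score(1, 10, 10, 1.0)
common_flat = bm25_term_score(1, 10, 10, 1.0)

print(rare, common, rare_flat, common_flat)
```

With the standard idf the rare term outscores the common one; with idf fixed at 1 both matches score the same, which is the behaviour being asked for in the thread.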

Re: Solr 6 and IDF

2017-08-08 Thread Webster Homer
I do want to use BM25, just disable IDF On Tue, Aug 8, 2017 at 2:58 PM, Peter Lancaster < peter.lancas...@findmypast.com> wrote: > Hi Webster, > > If you're not worried about using BM25 searcher then you should just be > able to continue as you were before by providing your own similarity class

RE: Solr 6 and IDF

2017-08-08 Thread Peter Lancaster
Hi Webster, If you're not worried about using BM25 searcher then you should just be able to continue as you were before by providing your own similarity class that extends ClassicSimilarity and then override the idf method to always return 1, then reference that in your schema e.g. As far

RE: SolrCloud - leader updates not updating followers

2017-08-08 Thread Peter Lancaster
Hi Erik, Thanks for your quick reply. It's given me a few things to research and work on tomorrow. In the meantime, in case it triggers any other thoughts, just to say that our AutoCommit settings are 180 30 1 When I

Solr 6 and IDF

2017-08-08 Thread Webster Homer
Our most common use for Solr is searching for products, not text search. My company is in the process of migrating away from an Endeca search engine. To keep the business happy, the goal is to make sure that search results from the different engines are fairly similar. One area that we have found

Re: MongoDb vs Solr

2017-08-08 Thread Lars Karlsson
Yet another reason to stick to Solr: https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0 On Mon, 7 Aug 2017 at 15:40, Charlie Hull wrote: > On 05/08/2017 12:28, GW wrote: > > For The Guardian, Solr is the new database | Lucidworks > > < >

Re: How many collections in a solrcloud are too many, how to determine this?

2017-08-08 Thread Webster Homer
Yes we do see replicas go into recovery. Most of our clouds are hosted in the google cloud. So flaky networks are probably not an issue, though firewalls to the clouds can be On Tue, Aug 8, 2017 at 2:14 PM, Erick Erickson wrote: > So in total you have 56 replicas,

Re: How many collections in a solrcloud are too many, how to determine this?

2017-08-08 Thread Erick Erickson
So in total you have 56 replicas, correct? This shouldn't be a problem, we've seen many more replicas than that. Many many many. Do you ever see any replicas go into recovery? One common problem is that GC exceeds the timeouts for, say, Zookeeper to contact nodes and they'll cycle through

How many collections in a solrcloud are too many, how to determine this?

2017-08-08 Thread Webster Homer
We have SolrCloud environments that have 4 Solr nodes and a 3 node Zookeeper ensemble. All of the collections are configured to have 2 shards with 2 replicas. In this environment we have 14 different collections. Some of these collections are hardly touched, others have a fairly heavy search and

Re: missing documents after restart

2017-08-08 Thread Erick Erickson
First, are you absolutely sure you're committing before shutting down? Hard commit in this case, openSearcher shouldn't matter. SolrCloud? And if not SolrCloud, how are you shutting Solr down? "Kill -9" is evil. If you have transaction logs enabled then you shouldn't be losing docs, any

Re: SolrCloud - leader updates not updating followers

2017-08-08 Thread Erick Erickson
This _better_ be a problem with your configuration or all my assumptions are false ;) What are your autocommit settings? The documents should be forwarded to each replica from the leader during ingestion. However, they are not visible on the follower until a hard commit(openSearcher=true) or soft

Re: Get results in multiple orders (multiple boosts)

2017-08-08 Thread Rick Leir
Luca, What is the algorithm for the custom sort order? -- Rick On August 7, 2017 6:38:49 AM EDT, Luca Dall'Osto wrote: >Hello Rick, >thanks for your answer. >Yes, I compose solr query from frontend request, but I'm not able to >sort by a custom order, only by

SolrCloud - leader updates not updating followers

2017-08-08 Thread lancasp22
Hi, I've recently created a solr cloud on solr 5.5.2 with a separate zookeeper cluster. I write to the cloud by posting to update/json and the documents appear fine in the leader. The problem I have is that new documents added to the cloud aren't then being automatically applied to the followers

Highlighting all hits in a search with the Collapse/Expand filter in place

2017-08-08 Thread Peter Matthew Eichman
Hello all, We have a Solr index that contains documents representing OCR text blocks that each have a reference to the page they appear on. The pages are also Solr documents in our index. We have successfully used the Collapse/Expand query parser to group all of the text blocks that appear on one

Re: solrnet + stemming issue

2017-08-08 Thread Erick Erickson
Two things to try. First, if you changed the analysis chain, did you reindex? Second, what does the admin/analysis page show? Ok, there's a third, what does it show when you add &debug=query as the parsed query? Best, Erick On Aug 8, 2017 12:04 AM, "KG S" wrote: Hi, I am

Re: IndexReaders cannot exceed 2 Billion

2017-08-08 Thread Mike Drob
> I have no idea whether you can successfully recover anything from that > index now that it has broken the hard limit. Theoretically, I think it's possible with some very surgical edits. However, I've tried to do this in the past and abandoned it. The code to split the index needs to be able to

Re: SynonymFilterFactory needs bounce for every change

2017-08-08 Thread Erick Erickson
If this is SolrCloud, you should not use the core admin API, use the collections API. FWIW. Erick On Aug 8, 2017 8:20 AM, "Abhijit Pawar" wrote: > Thanks Eric! > Yes that helped. I am reloading my core >> :8983/solr/admin/cores? > action=RELOAD&core=mycore > Now the
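For the SolrCloud case mentioned above, the Collections API equivalent of the core reload is `action=RELOAD` with a `name` parameter. A minimal sketch of building that request URL in Python; the host, port, and collection name are assumptions for illustration.

```python
from urllib.parse import urlencode


def collection_reload_url(base_url, collection):
    # Collections API reload: /admin/collections?action=RELOAD&name=<collection>
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url}/admin/collections?{query}"


reload_url = collection_reload_url("http://localhost:8983/solr", "mycollection")
print(reload_url)
```

Requesting this URL asks Solr to reload the collection's cores across the cluster, so an edited synonyms file already uploaded to ZooKeeper should be picked up without restarting nodes.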

Re: IndexReaders cannot exceed 2 Billion

2017-08-08 Thread Shawn Heisey
On 8/7/2017 9:41 AM, Wael Kader wrote: > I faced an issue that is making me go crazy. > I am running SOLR saving data on HDFS and I have a single node setup with > an index that has been running fine until today. > I know that 2 billion documents is too much on a single node but it has > been

Re: SynonymFilterFactory needs bounce for every change

2017-08-08 Thread Abhijit Pawar
Thanks Eric! Yes that helped. I am reloading my core >> :8983/solr/admin/cores? action=RELOAD&core=mycore Now the synonyms are showing up w/o bounce. Thanks Shawn! I will try this option of managed synonyms to update the list... Regards, Abhijit On Mon, Aug 7, 2017 at 6:56 PM, Erick

full import of a single entity using cron job

2017-08-08 Thread bhargava ravali koganti
Hi, I have two entities that both need a full-import, and both entities are related to the same table. I set up a cron job where in the HTTP request I specified the required entity; however, both the entities are being executed. What is the issue? How to let only one entity run? Thanks,
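A single-entity full import is normally requested with `command=full-import` plus an `entity` parameter. The sketch below builds such a URL and a shell-safe curl command in Python; the core URL, the `/dataimport` handler path, and the entity name are assumptions. One possible cause of the behaviour described above (not confirmed in the thread) is an unquoted `&` in the cron command line, which the shell treats as a command separator so the `entity` parameter never reaches Solr.

```python
import shlex
from urllib.parse import urlencode


def full_import_url(core_url, entity):
    # Restrict the import to one entity via the "entity" request parameter.
    params = {"command": "full-import", "entity": entity}
    return f"{core_url}/dataimport?{urlencode(params)}"


def cron_command(url):
    # Quote the URL so the shell does not interpret "&" or "?".
    return "curl -s " + shlex.quote(url)


import_url = full_import_url("http://localhost:8983/solr/mycore", "products")
command = cron_command(import_url)
print(command)
```

The printed command keeps the whole URL inside single quotes, so both parameters are sent in one request.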

Re: Issue with delta import

2017-08-08 Thread bhargava ravali koganti
Tried it; it had no impact. -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-tp4347680p4349577.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr in memory testing

2017-08-08 Thread Xie, Sean
There is MiniSolrCloudCluster that you can use for testing. This is from solr-test-framework: https://github.com/apache/lucene-solr/tree/master/solr/test-framework. On 8/8/17, 7:54 AM, "Thaer Sammar" wrote: Hi, We are using solr 6.6, and we are looking for

missing documents after restart

2017-08-08 Thread John Blythe
hi all. i have a core that contains about 22 million documents. when the solr server is restarted it drops to 200-400k. the dashboard says that it's both optimized and current. are there config issues i need to address in solr or the server? not really sure where to begin in hunting this down.

Re: JSON facet SUM precision and accuracy is incorrect

2017-08-08 Thread Yonik Seeley
This is due to function queries currently lacking type information (this problem will occur anywhere function queries are used and is not unique to JSON Facet). Function queries were originally only used in lucene scoring (which only uses float). The inner sum(amount1_d,amount2_d) uses
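The precision loss described above can be reproduced outside Solr by forcing intermediate values through 32-bit floats. This is only an illustration in Python using `struct`; the two amounts are made-up values standing in for `amount1_d` and `amount2_d`.

```python
import struct


def to_float32(x):
    # Round a Python float (64-bit) to the nearest 32-bit float.
    return struct.unpack("f", struct.pack("f", x))[0]


amount1 = 123456789.12  # hypothetical amount1_d value
amount2 = 0.34          # hypothetical amount2_d value

exact = amount1 + amount2  # double-precision sum
as_float = to_float32(to_float32(amount1) + to_float32(amount2))

print(exact, as_float)
```

A 32-bit float carries only about 7 significant decimal digits, so at this magnitude the float result is off by more than 2 units, which is the kind of discrepancy that matters for audit totals.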

solr in memory testing

2017-08-08 Thread Thaer Sammar
Hi, We are using Solr 6.6, and we are looking for guidance, documentation, or a Java example on how to create a Solr core in memory for the purpose of testing using SolrJ. We found https://wiki.searchtechnologies.com/index.php/Unit_Testing_with_Embedded_Solr but this works for Solr v.4 and earlier

JSON facet SUM precision and accuracy is incorrect

2017-08-08 Thread Patrick Chan
I would appreciate it if anyone could help raise an issue for the JSON facet sum error that my staff member Edwin raised earlier; we have not gotten any response from the Solr community and developers. Our production operation urgently needs this accuracy to proceed, as it impacts audit issues. Best regards,

Re: Unable to integrate OpenNLP with Solr

2017-08-08 Thread sweta
Hi All, We were able to fix the issue by manually making changes to the files which didn't come through, reading the patch line by line -- View this message in context: http://lucene.472066.n3.nabble.com/Unable-to-integrate-OpenNLP-with-Solr-tp4345601p4349568.html Sent from the Solr -

Re: Managed Synonyms query

2017-08-08 Thread sweta
Hi, Sorry for the delayed response. We are using Solr 5.3 and index-side synonyms -- View this message in context: http://lucene.472066.n3.nabble.com/Managed-Synonyms-query-tp4338946p4349567.html Sent from the Solr - User mailing list archive at Nabble.com.

IndexReaders cannot exceed 2 Billion

2017-08-08 Thread Wael Kader
> > Hello, > > I am facing an issue on my live environment and I couldn’t find a solution > yet. > I am running SOLR saving data on HDFS and I have a single node setup with an > index that has been running fine until today. > I know that 2 billion documents is too much on a single node but it

Re: DataImportHandler: full import of a single entity

2017-08-08 Thread bhargava ravali koganti
It is not working if a cron job is given. It is executing the other entities as well. Is there any solution? -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-full-import-of-a-single-entity-tp2258037p4349551.html Sent from the Solr - User mailing list archive

solrnet + stemming issue

2017-08-08 Thread KG S
Hi, I have been using Solr 4.2.1 for many years. I am using the test_search field for free text search and its definition is as below in schema.xml. Recently I introduced a new issue in free text search in my site. When I search for "saks" it gives me results for those documents which have the word "sak" but