Re: Sort with subquery

2017-11-27 Thread Erick Erickson
No. You're missing the point that [subquery] is called when assembling the return packet, which consists of only the top N docs from your query against the static collection. It is _not_ run as part of the search, which it would have to be to do what you want. To sort the complete result set, you have to

Re: Sort with subquery

2017-11-27 Thread Jinyi Lu
Thank you for the reply! In terms of sorting, I am wondering whether it is possible to sort the docs from my static collection based on the corresponding counts in the dynamic collection, since we have combined them together in the result. Something like: sort=max(status.cnt) asc Or is it possible to add

RE: Solr Spellcheck

2017-11-27 Thread GVK Prasad
Hi Alessandro, My search and request handler configuration is included below. This is the config included with version 6.3.0: text_general default term solr.DirectSolrSpellChecker internal 0.5 2 1 5

IndexMergeTool to adhere to TieredMergePolicyFactory settings

2017-11-27 Thread Zheng Lin Edwin Yeo
Hi, I'm currently using Solr 6.5.1. I found that IndexMergeTool.java contains this line, which sets maxNumSegments to 1: writer.forceMerge(1); Does this mean that there will always be only 1 segment after the merge? From what I see, that seems to be the

Re: Sort with subquery

2017-11-27 Thread Erick Erickson
I'm not quite sure what "sort the results" means here. The [subquery] bit just adds a field to the output of the top N. So what you'd be doing here is just getting the top 10 (if rows=10) from your static collection, then adding the counts to them from the "dynamic" collection. So the sort here you're
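
The difference between decorating the top N and sorting the complete result set can be illustrated with a toy model in plain Python. This is only a sketch and not Solr code: the documents, the counts, and both function names are made up for illustration.

```python
# Toy model of the behaviour described in this thread; this is NOT Solr code.
# All data and function names here are hypothetical.

static_docs = [
    {"id": "a", "score": 3.0},
    {"id": "b", "score": 2.0},
    {"id": "c", "score": 1.0},
]
# Counts that live in the separate "dynamic" collection, keyed by id.
dynamic_counts = {"a": 5, "b": 7, "c": 1}


def search_then_decorate(docs, counts, rows):
    """Sort and cut to the top `rows` first, then attach counts (like [subquery])."""
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:rows]
    return [dict(d, cnt=counts[d["id"]]) for d in top]


def join_then_sort(docs, counts, rows):
    """What the question asks for: counts are known before the whole set is sorted."""
    joined = [dict(d, cnt=counts[d["id"]]) for d in docs]
    return sorted(joined, key=lambda d: d["cnt"])[:rows]


decorated = [d["id"] for d in search_then_decorate(static_docs, dynamic_counts, rows=2)]
joined = [d["id"] for d in join_then_sort(static_docs, dynamic_counts, rows=2)]
print(decorated)  # ['a', 'b']
print(joined)     # ['c', 'a']
```

In the first function the count is attached only after the cut to the top rows, so document "c" can never be returned, however low its count is. Sorting by count requires the counts to be available to the search itself.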

Sort with subquery

2017-11-27 Thread Jinyi Lu
Hi all, I have a question about how to sort results based on the fields in the subquery. It’s exactly the same as this question posted on Stack Overflow, https://stackoverflow.com/questions/47127478/solr-how-to-sort-based-on-subquery which has no answer yet. Basically, I have two collections: 1.

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-27 Thread Joe Obernberger
Just to add onto this.  Right now the cluster has recovered, and life is good.  My concerns with a cluster restart are lock files and network timeouts on startup.  The first can be addressed by stopping indexing, waiting until things flush out, and then halting all the nodes.  No lock files.

Inverted Index positions vs Term Vector positions

2017-11-27 Thread alessandro.benedetti
Hi all, it may sound like a silly question, but is there any reason that the term positions in the inverted index use 1-based numbering while the Term Vector positions use 0-based numbering [1]? This may affect different areas in Solr and cause problems which are quite tricky to spot.
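
A minimal Python sketch of the kind of off-by-one problem this could cause. The position lists are invented, the helper name is hypothetical, and the 1-based/0-based split is taken from the message as stated rather than verified here.

```python
# Hypothetical illustration; the position lists below are invented, and the
# 1-based/0-based split is the one described in the message, not verified here.

def to_zero_based(positions, base):
    """Normalise positions recorded with the given base (0 or 1) to 0-based."""
    return [p - base for p in positions]


inverted_index_positions = [1, 4, 7]  # assumed 1-based
term_vector_positions = [0, 3, 6]     # assumed 0-based

# Comparing the raw lists reports a mismatch for the same term occurrences.
naive_match = inverted_index_positions == term_vector_positions

# Normalising both sides to one base before comparing removes the mismatch.
normalised_match = (
    to_zero_based(inverted_index_positions, base=1)
    == to_zero_based(term_vector_positions, base=0)
)
print(naive_match, normalised_match)  # False True
```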

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-11-27 Thread Joe Obernberger
Thank you Erick.  Right now, we have our autoCommit time set to 180 (30 minutes), and our autoSoftCommit set to 12.  The thought was that with HDFS we want less frequent, but larger operations, since HDFS has such a large block size.  Is that incorrect thinking? As to why we are using

Re: Solr7 org.apache.lucene.index.IndexUpgrader

2017-11-27 Thread Shawn Heisey
On 11/27/2017 2:58 AM, Leo Prince wrote: > Actually I have two major cores. One I have primary document store as MySQL > and I can populate and re-index data from MySQL. However the other core > with 40mil, is keeping as primary store (with stored=true), I get the fact > that it's not a good

Re: Solr Spellcheck

2017-11-27 Thread alessandro.benedetti
Do you mean you are over-spellchecking? Correcting even words that are not misspelled? Can you give us the request handler configuration, spellcheck configuration and the schema? Cheers - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. -

RE: Spellchecker Results

2017-11-27 Thread Sadiki Latty
This is perfect, thanks Emir.

Re: Solr7 org.apache.lucene.index.IndexUpgrader

2017-11-27 Thread Rick Leir
Leo, your low-priority data could be accumulated in a Couchbase DB or just in JSONL. Then it would be easy to re-index. Cheers -- Rick

Re: Embedded SOLR - Best practice?

2017-11-27 Thread alessandro.benedetti
When you say "caching 100.000 docs", what do you mean? Being able to quickly find information in a corpus which increases in size (100.000 docs) every day? I second Erick, I think this is a fairly normal Solr use case. If you really care about fast searches, you will get a fairly acceptable

Re: TimeZone issue

2017-11-27 Thread alessandro.benedetti
Hi, it is on my to-do list with low priority. There is a Jira issue already [1], feel free to contribute! [1] https://issues.apache.org/jira/browse/SOLR-8952 - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent

Re: Merging of index in Solr

2017-11-27 Thread Zheng Lin Edwin Yeo
Hi, I found that IndexMergeTool.java contains this line, which sets maxNumSegments to 1: writer.forceMerge(1); Does this mean that there will always be only 1 segment after the merge? Is there any way we can allow the merge to produce multiple segments,

Re: Solr7 org.apache.lucene.index.IndexUpgrader

2017-11-27 Thread Leo Prince
Hi Daniel, Thanks for the help. Actually I have two major cores. For one, the primary document store is MySQL, and I can populate and re-index data from MySQL. However, the other core, with 40mil, is being kept as the primary store (with stored=true). I get the fact that it's not a good practice, however

Re: Strip out punctuation at the end of token

2017-11-27 Thread Emir Arnautović
Hi Sergio, Is this the only case that needs “special” handling? If you are only after matching phone numbers then you need to think about both false negatives and false positives. E.g. if you go with only WDFF you will end up with a ‘008’ token. That means that you will also return this doc for

Re: Solr7 org.apache.lucene.index.IndexUpgrader

2017-11-27 Thread Daniel Collins
Leo, the general rule of thumb here is that the Solr index should *not* be your main document store. It is the index to your document store, but if it needs to be re-indexed, you should use your document store as the place to index from. Your index will not have the full source data (unless ALL

Re: Spellchecker Results

2017-11-27 Thread Emir Arnautović
Hi Sid, I don’t think that such a feature has been added to Solr, but there is Sematext’s component that does what you need: https://github.com/sematext/solr-researcher/tree/master/dym HTH, Emir