Re: Solr 5.2.1 deadlock on commit

2015-12-13 Thread Emir Arnautović
Hi Ali, I don't think it is a deadlock; as Mikhail said, Solr is saturated and you should try to reduce the load - make it work first, then increase the load to find its limits. It would also help to monitor your Solr and identify bottlenecks. One such solution is Sematext's SPM:

Re: SOLR and string comparison functions

2017-09-18 Thread Emir Arnautović
Hi Darius, This seems to me like a misuse/misunderstanding of Solr. As you probably noticed, the Solr score is not normalised - you cannot compare the scores of two queries and tell whether one result matches its query better than the other. There are some techniques to achieve something close, but that is not that

Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-19 Thread Emir Arnautović
Hi Shamik, Can you tell us a bit more about how you use Solr before it goes OOM? Do you observe heavy indexing, or does it happen during higher query load? Does memory increase slowly or jump suddenly? Do you have any monitoring tool to see if you can correlate some metric with the memory increase?

Re: Solr nodes crashing (OOM) after 6.6 upgrade

2017-09-22 Thread Emir Arnautović
It does not have to be query load - it can be one heavy query that causes memory consumption (heavy faceting, deep paging,…) and after that GC jumps in. Maybe you could start with the logs and see if there are queries that have a large QTime, Emir > On 22 Sep 2017, at 12:00, shamik

Re: Is there a way to delete multiple documents using wildcard?

2017-09-21 Thread Emir Arnautović
Hi, Delete by query should work - posting to /update *:* should delete all doc. HTH, Emir > On 21 Sep 2017, at 05:25, balmydrizzle wrote: > > Doesn't work, either. wildcard query can't be used in delete. At least for > old Solr 3.x > > > > -- > Sent from:

Re: mm is not working if you have same term multiple times in query

2017-09-22 Thread Emir Arnautović
cally ,will definitely try to use the function query. > > Thanks, > Aman Deep Singh > > On 22-Sep-2017 6:25 PM, "Emir Arnautović" <emir.arnauto...@sematext.com> > wrote: > >> Hi Aman, >> You have wrong expectations: Edismax does respect mm, it’s j

Re: mm is not working if you have same term multiple times in query

2017-09-22 Thread Emir Arnautović
Hi Aman, You have wrong expectations: Edismax does respect mm, it’s just that it is met. If you take a look at parsed query, it’ll be something like: +(((name:lock) (name:lock))~2) And from dismax perspective it found both terms. It will not start searching for the next term after first is found
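
A minimal sketch of why mm is met here, assuming a simplified model in which every optional clause is checked against the document independently. The class and method names are hypothetical and this is not Lucene's actual scoring code; it only illustrates the clause counting described above.

```java
import java.util.List;
import java.util.Set;

final class MinShouldMatchSketch {
    // Counts matching optional clauses. Each clause is evaluated on its own,
    // so a term repeated in the query is counted once per clause.
    static int matchingClauses(List<String> queryTerms, Set<String> docTerms) {
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    // True when at least mm optional clauses match the document.
    static boolean satisfiesMm(List<String> queryTerms, Set<String> docTerms, int mm) {
        return matchingClauses(queryTerms, docTerms) >= mm;
    }

    public static void main(String[] args) {
        // The query "lock lock" against a document that contains "lock" only once.
        Set<String> doc = Set.of("lock");
        System.out.println(satisfiesMm(List.of("lock", "lock"), doc, 2));
    }
}
```

In this model both clauses match the single indexed term, so mm=2 is satisfied even though the document contains the word only once.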

Re: Rescoring from 0 - full

2017-09-21 Thread Emir Arnautović
Hi Dariusz, You could use fq for filtering (you can disable caching to avoid polluting the filter cache) and q=*:*. That way you’ll get score=1 for all docs and can rerank. The issue with this approach is that you rerank the top N, and without a score they wouldn’t be ordered, so it is a no-go. What you could do

Re: Seeing very low ingestion performance for a single non-cloud Solr core

2017-09-21 Thread Emir Arnautović
Hi, What are your commit configs? Maybe you are committing too frequently. Thanks, Emir > On 21 Sep 2017, at 06:19, saiks wrote: > > Hi, > > Environment: > - Solr is running in non-cloud mode on 6.4.2, Sun Java8, Linux > 4.4.0-31-generic x86_64 > - Ingesting into a

Re: Question regarding Upgrading to SolrCloud

2017-10-05 Thread Emir Arnautović
Hi Sharma, Please see inline answers. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 5 Oct 2017, at 09:00, Gopesh Sharma wrote: > > Hello Guys, > > As of now

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Emir Arnautović
t the majority of QPs skipping > analysis for wildcards. > > So I'm still confused as to why complexphrase does things differently. > > Thanks, > /Bjarke > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>: > >> Hi Bjarke, >> I

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Emir Arnautović
Hi Bjarke, It is not multiterm that causes the query parser to skip the analysis chain, but the wildcard. The majority of query parsers do not analyse the query string if it contains wildcards. HTH Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
index. >> >> Amrit Sarkar >> Search Engineer >> Lucidworks, Inc. >> 415-589-9269 >> www.lucidworks.com >> Twitter http://twitter.com/lucidworks >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 >> >> On Wed, Oct 4, 2017 at 6:02 P

Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Emir Arnautović
y > commit(String collection, boolean waitFlush, boolean waitSearcher, boolean > softCommit) > > Or is expungeDeletes true/false a special combination of the boolean > parameters? > > Regards, Bernd > > > Am 04.10.2017 um 13:27 schrieb Emir Arnautović: >> H

Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Emir Arnautović
Hi Bernd, When it comes to updating, it does not exist because indexed documents are not updatable - you can add a new document with the same id and the old one will be flagged as deleted. No need to delete explicitly. When it comes to expungeDeletes - that is a flag that can be set when committing.

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Hi Markus, You can set reclaimDeletesWeight in merge settings to some higher value than default (I think it is 2) to favor segments with deleted docs when merging. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training -

Re: Solr 7 default Response now JSON instead of XML causing issues

2017-10-03 Thread Emir Arnautović
t "visible" handlers? > > Thanks > Roland > > -Original Message- > From: Emir Arnautović [mailto:emir.arnauto...@sematext.com] > Sent: Monday, October 2, 2017 8:07 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr 7 default Response now JSON instead of XML cau

Re: Need help with Slow Query Logging

2017-10-10 Thread Emir Arnautović
Hi Atita, You should definitely go with log4j configuration, as anything else would be redoing what log4j can do. You already have slowQueryThresholdMillis to make slow queries log at WARN, and you can configure log4j to put such logs (class + level) into a separate file. This seems like

Re: Need help with Slow Query Logging

2017-10-10 Thread Emir Arnautović
mponents. > Please find my log4j at : *https://pastebin.com/uTLAiBE5 > <https://pastebin.com/uTLAiBE5>* > > Any help on this will surely be appreciated. > > Thanks again. > > Atita > > > On Tue, Oct 10, 2017 at 1:39 PM, Emir Arnautović < > emir.arnauto...@

Re: Issue while using Document Routing in SolrCloud 6.1

2017-10-10 Thread Emir Arnautović
Hi Ketan, Is it possible that you are indexing only one tenant and that is causing single shard to become hotspot? Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 10 Oct 2017, at 12:47, Ketan

Re: Concern on solr commit

2017-10-13 Thread Emir Arnautović
to deliver from solr > irrespective of network overhead, so any thoughts whether commit frequency > affects solr latency..? > > Thanks, > Leo Prince > > On Fri, Oct 13, 2017 at 2:46 PM, Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > >> Hi Leo,

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-16 Thread Emir Arnautović
Hi Vincenzo, Unless you have really specific ranking requirements, I would not suggest starting with your own proprietary similarity implementation. In most cases edismax will be good enough to cover your requirements. It is not an easy task to tune edismax since it has a lot of knobs that you can

Re: Unbalanced CPU no SolrCloud

2017-10-16 Thread Emir Arnautović
Hi Mahmoud, Do you use routing? Are your servers equally balanced - do you end up having approximately the same number of documents hosted on both servers (counted all shards)? Do you have anything else running on those servers? How do you initialise your SolrJ client? Are documents of similar

Re: Unbalanced CPU no SolrCloud

2017-10-16 Thread Emir Arnautović
>> >> And the documents are approximately the same size. >> >> I Used 10 threads with 10 SolrClients to send data to solr and every >> thread send a batch of 1000 documents every time. >> >> Thanks, >> Mahmoud >> >> >> >> On

Re: Unbalanced CPU no SolrCloud

2017-10-16 Thread Emir Arnautović
lk indexing process. > As you see the write operations on the loaded server are 3x the normal > server despite Disk writes not 3x times. > > Mahmoud > > > On Mon, Oct 16, 2017 at 12:32 PM, Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > >>

Re: Error adding replica after a delete replica

2017-10-06 Thread Emir Arnautović
Hi, How did you delete replica? Did you see any errors in logs after deleting? How did/does it look from ZK perspective after deleting that replica? Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/

Re: Unbalanced CPU no SolrCloud

2017-10-16 Thread Emir Arnautović
(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(Queued

Re: Howto verify that update is "in-place"

2017-10-17 Thread Emir Arnautović
Hi James, I did not try, but checking maxDoc and numDocs might tell you whether the update was in-place or atomic - an atomic update reindexes the existing doc, so the old doc will be deleted. An in-place update should just update the doc values of the existing doc, so the number of deleted docs should not change. HTH, Emir
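
A small sketch of that check, assuming you read maxDoc and numDocs from the core statistics before and after the update. The class and method names are hypothetical; the only relation used is deleted docs = maxDoc - numDocs. A segment merge between the two readings would change the counts, so treat the result as a hint rather than proof.

```java
final class InPlaceUpdateCheck {
    // Deleted documents still present in the index segments.
    static long deletedDocs(long maxDoc, long numDocs) {
        return maxDoc - numDocs;
    }

    // An in-place update should leave the deleted-doc count unchanged,
    // while an atomic update reindexes the doc and flags the old one as deleted.
    static boolean looksInPlace(long maxDocBefore, long numDocsBefore,
                                long maxDocAfter, long numDocsAfter) {
        return deletedDocs(maxDocBefore, numDocsBefore)
                == deletedDocs(maxDocAfter, numDocsAfter);
    }

    public static void main(String[] args) {
        // Hypothetical readings: same counts before and after the update.
        System.out.println(looksInPlace(1000, 990, 1000, 990));
        // Hypothetical readings: maxDoc grew by one while numDocs stayed the same.
        System.out.println(looksInPlace(1000, 990, 1001, 990));
    }
}
```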

Re: Unbalanced CPU no SolrCloud

2017-10-17 Thread Emir Arnautović
RewriteHandler.java:335) >>> at >>> org.eclipse.jetty.server.handler.HandlerWrapper.handle( >> HandlerWrapper.java:134) >>> at org.eclipse.jetty.server.Server.handle(Server.java:534) >>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) >>> at >>> org.eclipse.jetty.se

Re: Issue while using Document Routing in SolrCloud 6.1

2017-10-11 Thread Emir Arnautović
Hi Ketan, Each shard is a separate index, and if you are indexing 100 docs/sec without routing with two shards, you are indexing 50 docs/sec per shard. If you have routing, and all documents are from a single tenant, a single shard has to be able to process 100 docs/sec. If you have two nodes it means that you
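
The arithmetic above can be sketched as follows, assuming documents are spread evenly across shards when no routing is used. The class and method names are hypothetical.

```java
final class ShardLoadSketch {
    // Indexing rate the busiest shard has to sustain.
    // Without routing the load is assumed to be split evenly across shards;
    // with routing and a single tenant, one shard receives everything.
    static double busiestShardRate(double totalDocsPerSec, int shards, boolean singleTenantRouting) {
        if (shards <= 0) {
            throw new IllegalArgumentException("shards must be positive");
        }
        return singleTenantRouting ? totalDocsPerSec : totalDocsPerSec / shards;
    }

    public static void main(String[] args) {
        System.out.println(busiestShardRate(100, 2, false)); // even split across two shards
        System.out.println(busiestShardRate(100, 2, true));  // one shard takes the full load
    }
}
```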

Re: Need help with Slow Query Logging

2017-10-12 Thread Emir Arnautović
> On Tue, Oct 10, 2017 at 5:58 PM, Atita Arora <atitaar...@gmail.com> wrote: > >> Sure thanks Emir, >> Let me give them a quick try and I'll update you. >> >> Thanks, >> Atita >> >> On Tue, Oct 10, 2017 at 5:28 PM, Emir Arnautović < >>

Re: Concern on solr commit

2017-10-13 Thread Emir Arnautović
Hi Leo, It is considered a bad practice to commit from your application. You should let Solr handle commits. There is a great article about soft and hard commits: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Re: Replication on startup takes a long time

2017-09-25 Thread Emir Arnautović
Hi Erick, I don’t think that there are any bugs with searcher reopening - this is a scenario with a new slave: “But when I add a *new* slave pointing to the master…” So it is expected to have zero results until replication finishes. Regards, Emir > On 23 Sep 2017, at 19:21, Erick Erickson

Re: How to resolve Overlapping on DeckSearchers=2

2017-09-25 Thread Emir Arnautović
Hi Rubi, As you probably know, in order to have changes visible, you have to reopen searcher. Opening searcher includes warming up searcher. What happens to you is that new commit happens while previous commit did not result in a new searcher. What you can do: commit less frequently - it is

Re: PatternCaptureGroupTokenFilter

2017-09-27 Thread Emir Arnautović
bute doc > changes, if you have the bandwidth please go ahead and create a patch > for the docs > > Best, > Erick > > On Wed, Sep 27, 2017 at 1:53 AM, Emir Arnautović > <emir.arnauto...@sematext.com> wrote: >> Hi all, >> Is there some reason why PatternCap

Re: Filter Factory question

2017-09-29 Thread Emir Arnautović
>> There is a need for a special filter since the input has to be >>> normalized. >>>> That is the main requirement, splitting into pieces is optional. As far >>> as >>>> I know there is nothing in solr that knows about molecular formulas. >>>

Re: solr cloud without hard commit?

2017-09-29 Thread Emir Arnautović
Hi Wei, Hard commits are about data durability. A hard commit rolls over the transaction log and creates a new index segment. If configured with openSearcher=false, hard commits do not affect query performance much (other than taking some resources) since they do not invalidate caches. If you have transaction

Re: tipping point for using solrcloud—or not?

2017-10-02 Thread Emir Arnautović
Hi John, Your data volume does not require SolrCloud, especially if you isolate core that is related to your business from other cores. You mentioned that the second largest is logs core used for analytics - not sure what sort of logs, but if write intensive logging, you might want to isolate

Re: Solr 7 default Response now JSON instead of XML causing issues

2017-10-02 Thread Emir Arnautović
Hi Roland, I guess you can use defaults in solr config to set wt to XML. Something like: <lst name="defaults"><str name="wt">xml</str></lst> You can also use useParams="xml_out" and, in your params.json, define a param set xml_out with wt: "xml". HTH, Emir > On 2 Oct 2017, at 13:58, Roland Villemoes

Re: How to Index JSON field Solr 5.3.2

2017-10-02 Thread Emir Arnautović
Hi Sharma, I guess you are looking for nested documents: https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments

Re: Default value from another field?

2017-10-03 Thread Emir Arnautović
Hi Jimi, I don’t think that you can do it using schema, but you could do it using custom update request processor chain. I quickly scanned to see if there is such processor and could not find one. The closest one is

Re: DocValues, Long and SolrJ

2017-09-26 Thread Emir Arnautović
Hi Phil, Are you saying that you get this error when you create a fresh core/collection? These sorts of errors are usually related to the schema being changed after some documents were indexed. Thanks, Emir > On 25 Sep 2017, at 23:42, Phil Scadden wrote: > > I ran into a

Re: Solr 5.5.2 - Custom Function Query update

2017-09-26 Thread Emir Arnautović
Hi Florian, I am guessing that you are running Solr in SolrCloud Mode. Please see https://lucene.apache.org/solr/guide/6_6/adding-custom-plugins-in-solrcloud-mode.html and let us know if this helps.

Re: SOLR terminology

2017-09-28 Thread Emir Arnautović
Hi, Let’s start from the top and introduce also Shards, Primaries and Replicas: SolrCluster is a cluster of Solr Nodes. Nodes are part of the same cluster if reading configuration from the same “folder” of the same Zookeeper ensemble (ensemble = cluster in ZK terminology). Node is the instance

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun, This is not the most simple query either - a dozen of phrase queries on several fields + the same query as bq. Can you provide debugQuery info. I did not look much into debug times and what includes what, but one thing that is strange to me is that QTime is 4s while query in debug is

Re: DocValues, Long and SolrJ

2017-09-27 Thread Emir Arnautović
t; started the operation. > > -Original Message- > From: Emir Arnautović [mailto:emir.arnauto...@sematext.com] > Sent: Tuesday, 26 September 2017 8:49 p.m. > To: solr-user@lucene.apache.org > Subject: Re: DocValues, Long and SolrJ > > Hi Phil, > Are you saying

Re: Filter Factory question

2017-09-27 Thread Emir Arnautović
Hi Homer, There is no need for special filter, there is one that is for some reason not part of documentation (will ask why so follow that thread if decided to go this way): You can use something like: This will capture all atom counts as a separate tokens. HTH, Emir > On 26 Sep 2017, at

PatternCaptureGroupTokenFilter

2017-09-27 Thread Emir Arnautović
Hi all, Is there some reason why PatternCaptureGroupTokenFilter is not documented even though it is included in the code base? Thanks, Emir

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun, It is hard to measure something without affecting it, but we could use debug results and combine with QTime without debug: If we ignore merging results, it seems that majority of time is spent for retrieving docs (~500ms). You should consider reducing number of rows if you want better

Re: when transaction logs are closing?

2017-10-09 Thread Emir Arnautović
Hi Bernd, I did not look at the code, but I would guess never. Solr tends to keep file handle for each file that it uses and it keeps last N transaction logs. Transaction log file is flushed and new one is created when you issue hard commit - with or without open searcher. At that moment it

Re: AW: Howto verify that update is "in-place"

2017-10-18 Thread Emir Arnautović
; > > > As for each update you are doing via atomic operation contains the > > "id" / "uniqueKey". Comparing the "_version_" field value for one of > > them would be fine for a batch. Rest, Emir has list them out. > > > > Amrit Sarkar > &g

Re: Schemaless detecting multivalued fields

2017-10-19 Thread Emir Arnautović
Hi John, You should be able to do that with custom update request processor chain and https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html

Re: Solr nodes going into recovery mode and eventually failing

2017-10-19 Thread Emir Arnautović
Hi Shamik, I am pleased to see you find SPM useful! I think that your problems might be related to caches exhausting your memory. You mentioned that your index is 70GB, but how many documents does it have? Remember that filter cache entries can take up to 1 bit/doc. With 4096 filter cache size it means that
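
A rough worst-case estimate of that cost, assuming every filter cache entry is held as a full bitset of 1 bit per document. Solr can store small result sets more compactly, so the real figure may be lower. The document count in the example is hypothetical, and so are the class and method names.

```java
final class FilterCacheEstimate {
    // Worst case: each cached filter is a bitset with one bit per document in the index.
    static long worstCaseBytes(long maxDoc, int cacheEntries) {
        long bytesPerEntry = (maxDoc + 7) / 8; // round up to whole bytes
        return Math.multiplyExact(bytesPerEntry, (long) cacheEntries);
    }

    public static void main(String[] args) {
        // Hypothetical index with 100 million documents and a filter cache of 4096 entries.
        long bytes = worstCaseBytes(100_000_000L, 4096);
        System.out.printf("%.1f GiB%n", bytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```

With these assumed numbers each entry is 12.5 MB, so a full cache could need about 51.2 GB, which is why the cache size has to be judged against the document count rather than the index size on disk.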

Re: Solr nodes going into recovery mode and eventually failing

2017-10-23 Thread Emir Arnautović
Hi Shamik, I agree that your filter cache is not the reason for the OOMs. Can you confirm that your fieldCache and fieldValueCache sizes are not consuming too much memory? The next on the list would be some heavy faceting with pivots, but you mentioned that all fields are low cardinality. Do you

Re: Solr nodes going into recovery mode and eventually failing

2017-10-23 Thread Emir Arnautović
You mentioned that you are on v. 6.6, but in case someone else uses this, just to add that maxRamMB was added to FastLRUCache in version 6.4. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 23 Oct

Re: Facets based on sampling

2017-10-24 Thread Emir Arnautović
Hi John, Did you mean “docValues don’t work for analysed fields” since it works for multivalue string (or other supported types) fields. What you need to do is to convert your analysed field to multivalue string field - that requires changes in indexing flow. HTH, Emir -- Monitoring - Log

Re: Solr nodes going into recovery mode and eventually failing

2017-10-24 Thread Emir Arnautović
Hi Shamik, Please see inline comments/questions. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 24 Oct 2017, at 07:41, shamik wrote: > > Thanks Emir and Zisis. > > I added

Re: Limiting by range of sum across documents

2017-11-14 Thread Emir Arnautović
Hi Chris, I misunderstood your requirement. I am not aware of some facet result filtering feature. What you could do is sort facet results by sum and load page by page but that does not sound like a good solution. Did you try using streaming expressions - I don’t have much experience with this

Re: Reusable tokenstream

2017-11-22 Thread Emir Arnautović
Hi Roxana, I don’t think that it is possible. In some cases (seems like yours is good fit) you could create custom update request processor that would do the shared analysis (you can have it defined in schema) and after analysis use those tokens to create new values for those two fields and

Re: Merging of index in Solr

2017-11-22 Thread Emir Arnautović
Hi Edwin, Quick googling suggests that this is an NTFS issue related to a large number of file fragments, caused by a large number of files in one directory or by huge files. Are you running this merge on a Windows machine? Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &

Re: Reusable tokenstream

2017-11-22 Thread Emir Arnautović
access to the token stream and not reconstruct it? > Thanks, > Roxana > > > On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > >> Hi Roxana, >> I don’t think that it is possible. In some cases (seems like yours is good

Re: JSON-B deserialization of Solr-response with highlightning

2017-11-23 Thread Emir Arnautović
Hi Magnus, Not sure if this is the right group for this question, I have not coded this part for a long time, and I am not sure I fully understood the issue, but can you map highlighting to Map? Also, not sure if you are using this example in your tests, but you are missing

Re: Strip out punctuation at the end of token

2017-11-23 Thread Emir Arnautović
Hi Sergio, You can use PatternCaptureGroupFilterFactory to emit both tokens. This token filter is not documented in recent documentation but it is still there. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training -

Re: Reusable tokenstream

2017-11-22 Thread Emir Arnautović
yway I can add a tokenstream to a SolrInputDocument (is > there any other class exposed by Solr during indexing that I can use for > this purpose?). > Am I correct or still missing something? > Thank you. > > > On Wed, Nov 22, 2017 at 11:33 AM, Emir Arnautović < > emir.arnauto.

Re: Spellchecker Results

2017-11-27 Thread Emir Arnautović
Hi Sid, I don’t think that such feature is added to Solr, but there is Sematext’s component that does what you need: https://github.com/sematext/solr-researcher/tree/master/dym HTH, Emir -- Monitoring - Log Management - Alerting -

Re: Strip out punctuation at the end of token

2017-11-27 Thread Emir Arnautović
Hi Sergio, Is this the only case that needs “special” handling? If you are only after matching phone numbers then you need to think about both false negatives and false positives. E.g. if you go with only WDFF you will end up with ‘008’ token. That means that you will also return this doc for

Re: Huge Query execution time for multiple ORs

2017-11-30 Thread Emir Arnautović
Hi Faraz, It is a bit worse than that - it also needs to calculate score, so for each matching doc of one query part it has to check if it appears in results of other query parts. If you use term query parser, you avoid calculating score - all doc will have score 1. Solr is based on lucene,

Re: Logging in Solrcloud

2017-12-05 Thread Emir Arnautović
Hi Stefan, I am not aware of option to log only client side queries, but I think that you can find workaround with what you currently have. If you take a look at log lines for query that comes from the client and one that is result of querying shards, you will see differences - the most simple

Re: Prevent Document to get partially indexed if document is not available

2017-12-13 Thread Emir Arnautović
Hi, Did you try with _version_=1? “If the content in the _version_ field is equal to '1', then the document must simply exist. In this case, no version matching occurs, but if the document does not exist, the updates will be rejected.” Regards, Emir -- Monitoring - Log Management - Alerting -
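
A minimal sketch of the version check, assuming the optimistic concurrency rules as I recall them from the Solr reference guide. Only the _version_=1 case is quoted above; the other branches are my recollection and should be verified against the guide for your Solr version. The class and method names are hypothetical and this is not Solr's own code.

```java
final class VersionCheckSketch {
    // requestVersion: the _version_ value sent with the update.
    // indexedVersion: the _version_ currently in the index, or null if the document does not exist.
    static boolean updateAccepted(long requestVersion, Long indexedVersion) {
        if (requestVersion > 1) {
            // Assumed rule: the versions must match exactly.
            return indexedVersion != null && indexedVersion.longValue() == requestVersion;
        }
        if (requestVersion == 1) {
            // Quoted rule: the document must simply exist.
            return indexedVersion != null;
        }
        if (requestVersion < 0) {
            // Assumed rule: the document must not exist.
            return indexedVersion == null;
        }
        // Assumed rule: 0 means no version check at all.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(updateAccepted(1, null));   // document is missing
        System.out.println(updateAccepted(1, 12345L)); // document exists
    }
}
```

With _version_=1 an update to a missing document is rejected instead of creating a partial document, which is the behaviour asked for in this thread.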

Re: SOLR nested dataimport issues

2017-12-18 Thread Emir Arnautović
Hi, I did not check it, but it seems to me that it might be related to using the full path in your fields' xpath: you are iterating hashes and you should probably set field paths assuming that is the new root. E.g. for id it would be: > HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly

Re: Is it safe to give users access to /admin/luke ?

2017-12-14 Thread Emir Arnautović
Hi, Depends on what you consider safe: - will user be able to change index - NO. - will user be able to get enough info to more or less restore document content - YES. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training -

Re: OOM spreads to other replica's/HA when OOM

2017-12-18 Thread Emir Arnautović
Hi Susheel, The fact that only the node that received the query went OOM suggests that it is about merging results from all shards and providing the final result. It is expected that repeating the same query on some other node will result in similar behaviour - it just means that Solr does not have enough memory

Re: OOM spreads to other replica's/HA when OOM

2017-12-18 Thread Emir Arnautović
ction. Total > 12 machines with 6 shards and 6 replica's (replicationFactor = 2) > > On Mon, Dec 18, 2017 at 9:22 AM, Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > >> Hi Susheel, >> The fact that only node that received query OOM tells that it is abou

Re: Identify Reference Leak in Custom Code related to Solr

2017-12-18 Thread Emir Arnautović
2 > Medium: https://medium.com/@sarkaramrit2 > > On Mon, Dec 18, 2017 at 5:13 PM, Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > >> Hi Amrit, >> I’ll check with my colleague that worked on this. In the meantime, can you >> provide more inf

Re: Identify Reference Leak in Custom Code related to Solr

2017-12-18 Thread Emir Arnautović
Hi Amrit, I’ll check with my colleague that worked on this. In the meantime, can you provide more info about setup: Solr version, M-S or cloud and steps that we can do to reproduce it. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting

Re: OOM spreads to other replica's/HA when OOM

2017-12-19 Thread Emir Arnautović
Hi Susheel, If a single query can cause a node to fail, and if a retry causes replicas to be affected (still to be confirmed), then preventing retry logic on the Solr side can only partially solve the issue - retry logic can exist on the client side, and it will result in the replicas’ OOM. Again, not sure if

Re: How to restrict the fields solr returns?

2017-12-19 Thread Emir Arnautović
Hi, You could write custom search component that can be included in your request handler. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 19 Dec 2017, at 09:22, Solrmails

Re: Span queries

2017-12-19 Thread Emir Arnautović
Hi Sreenivas, If you go to the extreme and accept that you want to return results even if the slop is large, you could utilize edismax: use mm to define how many terms must match, and pf and ps and pf2/3 and ps2/3 to boost results that match the slop requirements. Maybe you can see if extending edismax might be

Re: How to restrict the fields solr returns?

2017-12-19 Thread Emir Arnautović
In addition to that, you can use invariants to disallow overriding it. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 19 Dec 2017, at 14:23, Diego Ceccarelli wrote: >

Re: Limit search queries only to pull replicas

2017-12-14 Thread Emir Arnautović
Hi Stanislav, I don’t think that there is a built in feature to do this, but that sounds like nice feature of Solrj - maybe you should check if available. You can implement it outside of Solrj - check cluster state to see which shards are available and send queries only to pull replicas. HTH,

Re: Keep indexed records

2017-12-20 Thread Emir Arnautović
Hi Shashi, IMO it would be best if you put that logic on your controller where you start import. If you are doing that through admin console, the only solution I am aware of is to write your custom component. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &

Re: Multipath hierarchical faceting

2017-11-17 Thread Emir Arnautović
Hi, In order to use this feature, you would have to patch Solr and build it yourself. But note that this ticket is old and the last patch version is from 2014, so you would either have to patch the referenced version or adjust the patch to work with the version that you are targeting. HTH, Emir --

Re: Multiple collections for a write-alias

2017-11-10 Thread Emir Arnautović
This approach could work only if it is append only index. In case you have updates/deletes, you have to process in order, otherwise you will get incorrect results. I am thinking that is one of the reasons why it might not be supported since not too useful. Emir -- Monitoring - Log Management -

Re: Limiting by range of sum across documents

2017-11-13 Thread Emir Arnautović
Hi Chris, You mention it returns all manufacturers? Even after you apply filters (I don’t see a filter in your example)? You can control how many facets are returned with facet.limit, and you can use facet.pivot.mincount to determine how many facets are returned. If you calculate sum on all

Re: minimum should match for only for few fields

2017-11-13 Thread Emir Arnautović
Hi Vincenzo, It is not perfect, but you could achieve something similar using _query_ hook, e.g.: =lucene=_query_:”{defType=edismax qf=‘f1 f2’ mm=‘2’}my query” OR _query_:”{defType=edismax qf=‘f3 f4’ mm=‘1’}my query” HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr

Re: get all tokens from TokenStream in my custom filter

2017-11-20 Thread Emir Arnautović
Hi Kumar > Emir , i need all tokens of query in incrementToken() function not only > current token That was just an example - the point was that you need to set attributes - you can read all tokens from previous stream, do whatever needed with them and when ready, set attributes and return

Re: Leading wildcard searches very slow

2017-11-20 Thread Emir Arnautović
Hi Sundeep, The simplified explanation is that terms are indexed to be prefix-search friendly (and that is why Amrit suggested that you index terms reversed if you want leading wildcards). If you use a leading wildcard, there is no structure to limit the terms that can be matched and the engine has to
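
A minimal sketch of the reversed-term idea, using a sorted set as a stand-in for the term dictionary. This is an illustration only, with hypothetical class and method names, and not how Lucene stores terms. In Solr itself the usual route is ReversedWildcardFilterFactory in the index-time analysis chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class ReversedTermsSketch {
    // Terms are stored reversed, so a leading-wildcard query becomes a prefix lookup.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // Answers "*suffix" by scanning only the contiguous range that starts with the reversed suffix.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        for (String candidate : reversedTerms.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix)) {
                break; // sorted order: nothing after this point can match
            }
            matches.add(reverse(candidate));
        }
        return matches;
    }

    public static void main(String[] args) {
        ReversedTermsSketch index = new ReversedTermsSketch();
        for (String term : List.of("solr", "lucene", "indexer", "searcher")) {
            index.add(term);
        }
        System.out.println(index.endingWith("er"));
    }
}
```

The lookup touches only the terms that share the reversed suffix as a prefix, instead of walking the whole term dictionary as a plain leading wildcard would.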

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
Hi Edwin, How many host/nodes/shard are those 3.5TB? I am not familiar with merge code, but trying to think what it might include, so don’t take any of following as ground truth. Merging for sure will include segments rewrite, so you better have additional 3.5TB if you are merging it to a

Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain, You did not provide the definition of the field type in use - you use the “nametext” type and pasted the “text_ami” field type. It is possible that you have omitTermFreqAndPositions=“true” on the nametext field type. The default value for text fields should be false. HTH, Emir -- Monitoring - Log

Re: Please help me with solr plugin

2017-11-21 Thread Emir Arnautović
Hi Zara, What sort of plugins are you trying to build? What sort of issues did you run into? Maybe you are not too far from having a running custom plugin. I would recommend you try running some of the existing plugins as your own - just to make sure that you are able to build and configure custom

Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Hi Alain, As explained in prev mail that is doc frequency and each doc is counted once. I am not sure if Luke can provide you information about overall term frequency - sum of term frequency of all docs. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &

Re: Merging of index in Solr

2017-11-21 Thread Emir Arnautović
of space, but the merging is > still going on. > > The index are currently updates free. We have only index the data in 2 > different collections, and we now need to merge them into a single > collection. > > Regards, > Edwin > > On 21 November 2017 at 16:52, Emir

Re: Custom analyzer & frequency

2017-11-21 Thread Emir Arnautović
Lucene base I build outside Solr, I see top terms freq same on 2 > panels. > Do you know a reason for that ? > Does this have an impact on Solr search ? Does bad freq in "top terms" > come from Luke or Solr ? > > > 2017-11-21 12:08 GMT+01:00 Emir Arnautović <emir.arnauto...@

Re: Limiting by range of sum across documents

2017-11-13 Thread Emir Arnautović
Hi Chris, I assumed that you apply some sort of fq=price:[100 TO 200] to focus on the wanted products. Can you share the full JSON faceting request - numFound:0 suggests that something is completely wrong. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch

Re: get all tokens from TokenStream in my custom filter

2017-11-16 Thread Emir Arnautović
Hi, Maybe you are not pasting the entire filter code, but you need to set attributes and return true. Take a look at how it is done in some other token filter. Here is how it is done in UpperCaseFilter: public final boolean incrementToken() throws IOException { if (input.incrementToken()) {

Re: how to ensure that one shard does not get overloaded when we use routing

2017-11-03 Thread Emir Arnautović
Hi Ketan, I’ll just add that with 4 shards you might just as well skip the bits part - all of a tenant's documents will end up on a single shard anyway. Unless you have a lot of projectIds or all have pretty much the same number of documents, and you always search a single projectId, I would reevaluate using

Re: Advice on Stemming in Solr

2017-11-03 Thread Emir Arnautović
g words and applicable flags, and an affix file > that specifies how these flags will control spell checking. > Probably we can control it from those files in HunspellStemFilterFactory? > > Regards, > Edwin > > > On 2 November 2017 at 17:46, Emir Arnautović <emir.arnaut

Re: match in order

2017-11-05 Thread Emir Arnautović
Hi Vincenzo, Since it is about boosting, you might also take a look at edismax and its pf2 and pf3 parameters. It also supports slop (ps2 and ps3). HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 5

Re: SolrClould 6.6 stability challenges

2017-11-05 Thread Emir Arnautović
t;> Lucidworks, Inc. >> 415-589-9269 >> www.lucidworks.com >> Twitter http://twitter.com/lucidworks >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2 >> >> On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović < >> emir.arnauto...@sematext.com> wrote: >&g

Re: Anyone have any comments on current solr monitoring favorites?

2017-11-06 Thread Emir Arnautović
uired to chime in here!  > > > Is there a non-expiring dev version I could experiment with? I think I did > sign up for a trial years ago from a different company... I was actually > wondering about hooking it up to my personal AWS based solr cloud instance. > > > Thanks > > Rob
