Re: Delete By Query issue followed by Delete By Id Issues

2018-07-05 Thread sujatha sankaran
Hi Emir, We are deleting a larger subset of docs with a particular value which we know based on the id and only updating a few of the deleted. Our document is of the form __, we need to delete all that has the same , that are no longer in DB and then update only a few that has been updated in DB.

Re: NgramTokenizerFactory question

2018-07-05 Thread Kudrettin Güleryüz
Thank you for the explanation. To close the loop, I was able to track the problem down to the Lucene Query parser on 5.2.1, which returned +body:"123 234 345 456" for a query string 123456. It turned out that it is possible to get the same behavior by turning on split on white-space and auto

Re: exact Match and Contains

2018-07-05 Thread Erick Erickson
First, attachments are aggressively stripped by the mail server, so none of your images came through. Second, try adding &debug=query to the URL and look at the parsed query returned; that should provide some good hints. Best, Erick On Thu, Jul 5, 2018 at 7:18 AM, Rushikesh Garadade <
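
A minimal sketch of building such a request URL, assuming the parameter meant above is Solr's debug=query. The host, port, and collection name (localhost:8983, mycollection) are hypothetical placeholders, and the query string is the one from the original question in this thread. Only the JDK is used here; no SolrJ calls are shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DebugQueryUrl {
    // Builds a /select URL with debug=query appended so the response
    // includes the parsed query. baseUrl is a hypothetical collection URL.
    static String build(String baseUrl, String q) {
        try {
            String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8.name());
            return baseUrl + "/select?q=" + encoded + "&debug=query";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/mycollection",
                "attachmentType:application/pdf"));
    }
}
```

Comparing the parsed query in the debug section of the two responses (exact value versus wildcard) is usually the quickest way to see how the field's analysis chain treats each form.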

Re: push to the limit without going over

2018-07-05 Thread Erick Erickson
Arturas: " it is becoming incredibly difficult to find working code" Yeah, I sympathize totally. What I usually do is go into the test code of whatever version of Solr I'm using and find examples there. _That_ code _must_ be kept up to date ;). About batching docs. What you gain basically more
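
As an illustration of the batching idea mentioned above, here is a small sketch that splits a list of documents into fixed-size batches. The helper name partition and the batch size of 500 are assumptions made for this example, and plain integers stand in for documents so the code runs with only the JDK. In a SolrJ indexer, each batch would presumably be sent in a single add request rather than one request per document.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    // Splits items into consecutive batches of at most batchSize elements.
    // The last batch may be smaller than batchSize.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            docs.add(i); // stand-in for a real document object
        }
        for (List<Integer> batch : partition(docs, 500)) {
            // This is where a batch would be sent to the server in one request.
            System.out.println("would send batch of " + batch.size());
        }
    }
}
```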

Re: how to use HTMLStripCharFilter in solrJ?

2018-07-05 Thread Ahmet Arslan
Hi Arturas,  Here are some things to try : 1) HTMLStripCharFilter stripper = new HTMLStripCharFilter(strReader.markSupported() ? strReader : new BufferedReader(strReader)) 2) Consider using HTML Strip update processor factory.  3) Create a custom Lucene analyzer using html strip char filter

Re: push to the limit without going over

2018-07-05 Thread Shawn Heisey
On 7/4/2018 3:32 AM, Arturas Mazeika wrote: Details: I am benchmarking solrcloud setup on a single machine (Intel 7 with 8 "cpu cores", an SSD as well as a HDD) using the German Wikipedia collection. I created 4 nodes, 4 shards, rep factor: 2 cluster on the same machine (and managed to push the

Re: Maximum number of SolrCloud collections in limited hardware resource

2018-07-05 Thread Alexandre Rafalovitch
Does it need to be a SolrCloud? If it is just replication, maybe it can just be double indexed from the client. Or old style replication. And then use LotsOfCores autoloading. Regards, Alex On Wed, Jun 27, 2018, 8:46 AM Shawn Heisey, wrote: > On 6/27/2018 5:10 AM, Sharif Shahrair wrote: >

Fwd: exact Match and Contains

2018-07-05 Thread Rushikesh Garadade
Small correction to the mail above: attachmentType in managed-schema is: -- Forwarded message - From: Rushikesh Garadade Date: Thu, Jul 5, 2018 at 7:43 PM Subject: exact Match and Contains To: Hi, I have field attachmentType in my collection whose schema is as follows:

Re: push to the limit without going over

2018-07-05 Thread Arturas Mazeika
Hi Erick et al, Thanks a lot for the response. Your explanation seems very plausible and I'd love to investigate those further. Batching the docs (for me surprisingly) improved the numbers: Buffer size secs MB/s Docs/s N:500 1117 34.4077538 2400.72695 N:100 1073 35.8186962 2499.17241 N:10 1170

exact Match and Contains

2018-07-05 Thread Rushikesh Garadade
Hi, I have a field attachmentType in my collection whose schema is as follows: When I search for attachmentType:application/pdf, i.e. /select?q=attachmentType:application/pdf, I get results [image: image.png] When I search for attachmentType:*application/pdf* i.e. /select?

Re: Maximum number of SolrCloud collections in limited hardware resource

2018-07-05 Thread Erick Erickson
Just set the size parameter in solrconfig.xml to 0. Best, Erick On Wed, Jul 4, 2018 at 10:37 PM, Sharif Shahriar wrote: > Hi Emir, > Thanks a lot for your reply. In your reply you've mentioned- > If you stick with multiple collections, you can turn off caches completely, > monitor latency and

Re: Creating single CloudSolrClient object which can be used throughout the application

2018-07-05 Thread Erick Erickson
Just a try block or a try-with-resources block? I.e. try (CloudSolrClient csc = new CloudSolrClient.) { } try-with-resources is _designed_ to call close when the block is exited, there's no mystery involved at all. Best, Erick On Wed, Jul 4, 2018 at 10:13 PM, Ritesh Kumar wrote: > Hello
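
The try-with-resources behaviour described above can be sketched with a stand-in class. FakeClient is hypothetical and exists only so the example runs with the JDK alone; the same language rule applies to any Closeable or AutoCloseable resource, which is why a client declared in the try header is closed as soon as the block exits.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {
    // Hypothetical stand-in for a closeable client; it records lifecycle events.
    static class FakeClient implements AutoCloseable {
        private final List<String> log;

        FakeClient(List<String> log) {
            this.log = log;
            log.add("open");
        }

        void query() {
            log.add("query");
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (FakeClient client = new FakeClient(log)) {
            client.query();
        } // close() runs here automatically, even if query() had thrown
        log.add("after");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

For a client that is meant to be shared across the whole application, this suggests creating it outside any try-with-resources block and closing it explicitly at shutdown instead.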

Solr Kerberos Authentication

2018-07-05 Thread Greenhorn Techie
Hi, In the Solr documentation, it is mentioned that the blockUnknown property for the Authentication plugin has the default value of false, which means unauthenticated users will also be allowed to use Solr. However, wondering whether this parameter makes sense only for Basic Authentication or does it

Re: Maximum number of SolrCloud collections in limited hardware resource

2018-07-05 Thread Sharif Shahriar
Hi Emir, Thanks a lot for your reply. In your reply you've mentioned- If you stick with multiple collections, you can turn off caches completely, monitor latency and turn on caches for collections when it is reaching some threshold. -How this can be done? Is there any configuration to turn off

Re: AddReplica to shard with lowest node count

2018-07-05 Thread Gus Heck
Ah hmm I guess I didn't realize the autoscaling didn't use the rule based stuff (haven't had opportunity to work with either). If it's deprecated, maybe that suggests we need a highly visible warning box on the ref guide page? On Thu, Jul 5, 2018 at 12:18 AM, Shalin Shekhar Mangar <

Re: how to use HTMLStripCharFilter in solrJ?

2018-07-05 Thread Alexandre Rafalovitch
I am confused. Why do you not just add the CharFilter definition to the field type you need? You seem to be trying to do it completely on the client side? Not sure. Regards, Alex On Thu, Jul 5, 2018, 2:53 AM Arturas Mazeika, wrote: > Hi Solr Folk, > > What would be the easiest way to use

Re: Querying in Solrcloud

2018-07-05 Thread Arturas Mazeika
Hi Erick, wow. This email had such a profound effect and filled so many gaps in my head. I was wondering how master-slave (through replication) and (quorum based or whatever the name is) distribution live under the same hood in Solr. And in such a concise manner! Good job indeed. I wonder

how to use HTMLStripCharFilter in solrJ?

2018-07-05 Thread Arturas Mazeika
Hi Solr Folk, What would be the easiest way to use some of the Solr and Lucene components in SolrJ? I am pretty amazed how much thought and careful engineering went into some individual components to cover the wild real world effectively. And I wonder whether one could re-use some of them in