Questions about using ConcurrentUpdateSolrClient (sample program wanted)
I am using Solr 6.0 with SolrJ 6.0, deployed in SolrCloud mode. My code does not make an explicit commit; autoCommit and softAutoCommit are configured, and I use the ConcurrentUpdateSolrClient class. When we send 100 million documents, we often hit read timeout exceptions and data is lost. I would like to ask a few questions:

1. When ConcurrentUpdateSolrClient.add() returns without throwing an exception, does that mean the data has been successfully sent to Solr? Is the call synchronous, i.e. does it return only after the Solr server has accepted the data and written it to its log?
2. If the answer to question 1 is no, how can we determine that a ConcurrentUpdateSolrClient.add() has failed, so that we can retransmit the failed data?
3. Is there a sample program that uses ConcurrentUpdateSolrClient correctly?

Thank you!
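For context on question 1: ConcurrentUpdateSolrClient.add() only enqueues the document and returns; delivery happens on background threads, and failures surface through the client's overridable handleError callback rather than as exceptions from add(). The pattern can be sketched self-contained in plain Java (no SolrJ dependency; all class and method names here are hypothetical stand-ins):

```java
import java.util.List;
import java.util.concurrent.*;

// Models the ConcurrentUpdateSolrClient pattern: add() only enqueues, a
// background thread does the real send, and failures are reported through
// an overridable error callback -- never back through add() itself.
public class AsyncIndexer implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> failed = new CopyOnWriteArrayList<>();
    private final ExecutorService sender = Executors.newSingleThreadExecutor();

    public AsyncIndexer() {
        sender.submit(() -> {
            try {
                while (true) {
                    String doc = queue.take();          // blocks for next doc
                    if (doc.equals("__EOF__")) return;  // shutdown marker
                    try {
                        send(doc);                      // the real network call
                    } catch (Exception e) {
                        handleError(doc, e);            // add() never sees this
                    }
                }
            } catch (InterruptedException ignored) { }
        });
    }

    // Like ConcurrentUpdateSolrClient.add(): returns as soon as the doc is queued.
    public void add(String doc) { queue.add(doc); }

    // Stand-in for the HTTP update request; fails for demo purposes.
    protected void send(String doc) throws Exception {
        if (doc.contains("bad")) throw new Exception("read timeout");
    }

    // Override point: record failures so the caller can retransmit them.
    protected void handleError(String doc, Exception e) { failed.add(doc); }

    public List<String> failedDocs() { return failed; }

    @Override public void close() throws Exception {
        queue.add("__EOF__");
        sender.shutdown();
        sender.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        AsyncIndexer idx = new AsyncIndexer();
        idx.add("doc-1");
        idx.add("bad-doc-2");  // add() still returns normally here
        idx.add("doc-3");
        idx.close();           // drains the queue
        System.out.println("failed: " + idx.failedDocs()); // prints: failed: [bad-doc-2]
    }
}
```

In SolrJ terms this suggests an answer to question 2: subclass the client and override its error callback, collecting the failed batches into a retry buffer for retransmission.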
XFS or EXT4 on Amazon AWS AMIs
So what are people recommending for Solr on AWS on Amazon AMIs - ext4 or xfs? I saw an article about MongoDB saying performance on Amazon was better on xfs, due to a mutex issue with ext4 files and threaded calls. I have been using ext4 for a long time, but I am moving to r3.* instances, and TRIM/DISCARD support appears to be better supported on XFS.

-- Bill Bell billnb...@gmail.com cell 720-256-8076
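On the TRIM/DISCARD point: a common recommendation is to skip the inline `discard` mount option (continuous discard can add write latency) and run `fstrim` periodically instead. A sketch of what that might look like (device, mount point, and tool path are hypothetical and will vary by AMI):

```
# /etc/fstab -- mount xfs without the inline "discard" option
/dev/xvdb  /var/solr  xfs  defaults,noatime  0 0

# root crontab -- TRIM the filesystem periodically instead
@weekly /sbin/fstrim -v /var/solr
```

The same periodic-fstrim approach works on ext4 as well, so TRIM handling alone may not decide the filesystem choice.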
Re: Is it possible to rewrite part of the solr response?
Thanks. I'll look into that stuff. The counts issue is really not a serious problem for us as far as I know.

On Wed, Dec 21, 2016 at 9:08 PM, Erick Erickson wrote:
> "grab the response" is a bit ambiguous here in Solr terms. Sure, a SearchComponent (you can write a plugin) gets the response, but it only sees the final list being returned to the user, i.e. if you have rows=15 it sees only 15 docs. Not sure that's adequate; in the case above you could easily not be allowed to see any of the top N docs. Plus, doing anything like this would give very skewed things like facets, grouping, etc. Say the facets were calculated over 534 hits but the user was only allowed to see 10 docs... Very confusing.
>
> The most robust solution would be a "post filter", another bit of custom code that you write (plugin). See: http://yonik.com/advanced-filter-caching-in-solr/ A post filter sees _all_ the documents that satisfy the query, and makes an include/exclude decision on each one (just like any other fq clause). So facets, grouping and all the rest "just work". Do be aware that if the ACL calculations are expensive, you need to be prepared for the system administrator doing a *:* query. I usually build in a bailout and stop passing documents after some number and pass back a result about "please narrow down your search". Of course, if your business logic is such that you can calculate them all "fast enough", you're golden.
>
> All that said, if there's any way you can build this into tokens in the doc and use a standard fq clause, it's usually much easier. That may take some creative work at indexing time if it's even possible.
>
> Best,
> Erick
>
> On Wed, Dec 21, 2016 at 5:56 PM, Mike Thomsen wrote:
> > We're trying out some ideas on locking down Solr and would like to know if there is a public API that allows you to grab the response before it is sent and inspect it. What we're trying to do is something for which a filter query is not a good option to really get where we want to be. Basically, it's an integration with some business logic to make a final pass at ensuring that certain business rules are followed in the event a query returns documents a user is not authorized to see.
> >
> > Thanks,
> >
> > Mike
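The post-filter shape Erick describes can be sketched independently of the Solr APIs: a delegating collector that makes a per-document include/exclude decision and bails out past a cap. This is a toy model in plain Java (all names hypothetical; a real implementation would implement Solr's PostFilter and extend DelegatingCollector):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model of a Solr post filter: every doc that matched the main query
// is offered to collect(); only docs passing the ACL check are delegated
// downstream (where facets/grouping are computed), and a cheap bailout
// guards against an expensive ACL being run over a *:* match set.
public class AclCollector {
    private final IntPredicate aclAllows;   // per-doc ACL decision (may be costly)
    private final int bailout;              // max docs to evaluate before giving up
    private final List<Integer> delegated = new ArrayList<>();
    private int evaluated = 0;
    private boolean bailedOut = false;

    public AclCollector(IntPredicate aclAllows, int bailout) {
        this.aclAllows = aclAllows;
        this.bailout = bailout;
    }

    public void collect(int docId) {
        if (bailedOut) return;
        if (++evaluated > bailout) {        // "please narrow down your search"
            bailedOut = true;
            return;
        }
        if (aclAllows.test(docId)) delegated.add(docId);  // pass downstream
    }

    public boolean bailedOut() { return bailedOut; }
    public List<Integer> delegated() { return delegated; }
}
```

Because downstream consumers only ever see the delegated docs, facet and grouping counts "just work", which is the key advantage over post-processing the response.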
Re: Is it possible to rewrite part of the solr response?
Awesome explanation, Erick. I'll be filing this away for future reference.

On Dec 21, 2016 7:08 PM, "Erick Erickson" wrote:
> [...]
Re: Is it possible to rewrite part of the solr response?
"grab the response" is a bit ambiguous here in Solr terms. Sure, a SearchComponent (you can write a plugin) gets the response, but it only sees the final list being returned to the user, i.e. if you have rows=15 it sees only 15 docs. Not sure that's adequate, in the case above you could easily not be allowed to see any of the top N docs. Plus, doing anything like this would give very skewed things like facets, grouping, etc. Say the facets were calculated over 534 hits but the user was only allowed to see 10 docs... Very confusing. The most robust solution would be a "post filter", another bit of custom code that you write (plugin). See: http://yonik.com/advanced-filter-caching-in-solr/ A post filter sees _all_ the documents that satisfy the query, and makes an include/exclude decision on each one (just like any other fq clause). So facets, grouping and all the rest "just work". Do be aware that if the ACL calculations are expensive you need to be prepared for the system administrator doing a *:* query. I usually build in a bailout and stop passing documents after some number and pass back a result about "please narrow down your search". Of course if your business logic is such that you can calculate them all "fast enough", you're golden. All that said, if there's any way you can build this into tokens in the doc and use a standard fq clause it's usually much easier. That may take some creative work at indexing time if it's even possible. Best, Erick On Wed, Dec 21, 2016 at 5:56 PM, Mike Thomsen wrote: > We're trying out some ideas on locking down solr and would like to know if > there is a public API that allows you to grab the response before it is > sent and inspect it. What we're trying to do is something for which a > filter query is not a good option to really get where we want to be. 
> Basically, it's an integration with some business logic to make a final > pass at ensuring that certain business rules are followed in the event a > query returns documents a user is not authorized to see. > > Thanks, > > Mike
Re: Is it possible to rewrite part of the solr response?
It would be custom code, and I have something along those lines, although it throws an error instead of changing the response... Rushing now and can't go into more detail.

On Dec 21, 2016 6:57 PM, "Mike Thomsen" wrote:
> We're trying out some ideas on locking down Solr and would like to know if there is a public API that allows you to grab the response before it is sent and inspect it. What we're trying to do is something for which a filter query is not a good option to really get where we want to be. Basically, it's an integration with some business logic to make a final pass at ensuring that certain business rules are followed in the event a query returns documents a user is not authorized to see.
>
> Thanks,
>
> Mike
Is it possible to rewrite part of the solr response?
We're trying out some ideas on locking down solr and would like to know if there is a public API that allows you to grab the response before it is sent and inspect it. What we're trying to do is something for which a filter query is not a good option to really get where we want to be. Basically, it's an integration with some business logic to make a final pass at ensuring that certain business rules are followed in the event a query returns documents a user is not authorized to see. Thanks, Mike
Re: Solr streaming divide by zero exception coming
There is not nearly enough information here to say anything helpful. Please attach the stack trace, the query used, etc. IOW, whatever you think would help someone else reproduce the problem. What version of Solr are you using?

Best,
Erick

On Wed, Dec 21, 2016 at 11:14 AM, nelias wrote:
> Hello,
>
> I am running a simple search expression on a secured Solr server running in cloud mode. I ran the streaming expression in the admin console and got java.util.concurrent.ExecutionException. I ran the same via SolrJ, stepped into the source code, and it says a divide-by-zero exception occurs while opening the stream.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-streaming-divide-by-zero-exception-coming-tp4310792.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Solr streaming divide by zero exception coming
Hello,

I am running a simple search expression on a secured Solr server running in cloud mode. I ran the streaming expression in the admin console and got java.util.concurrent.ExecutionException. I ran the same via SolrJ, stepped into the source code, and it says a divide-by-zero exception occurs while opening the stream.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-streaming-divide-by-zero-exception-coming-tp4310792.html
Sent from the Solr - User mailing list archive at Nabble.com.
Solr Suggester
Hi All,

I have a field defined like that: When I run a suggester on my_field_1 it returns a response. However, my_field_2 doesn't. I've defined the suggester as: suggester FuzzyLookupFactory DocumentDictionaryFactory

What can be the reason?

Kind Regards,
Furkan KAMACI
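The XML snippets in this message did not survive transit, so for reference, a suggester of the kind described is typically declared along these lines in solrconfig.xml (field and type names here are guesses, not the poster's actual config):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">my_field_1</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```

When one field suggests and another does not, the usual suspects are that the silent field is not stored (DocumentDictionaryFactory builds its dictionary from stored values) or that the suggester was never built against it (suggest.build=true, or buildOnCommit/buildOnStartup).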
Re: Very long young generation stop the world GC pause
Also curious why such a large heap is required... If it's due to field caches being loaded, I'd highly recommend MMapDirectory (if not using it already) and turning on DocValues for all fields you plan to perform sort/facet/analytics on.

steve

On Wed, Dec 21, 2016 at 9:25 AM Pushkar Raste wrote:
> You should probably have as small a swap as possible. I still feel long GCs are either due to swapping or thread contention.
>
> Did you try to remove all other G1GC tuning parameters except for ParallelRefProcEnabled?
>
> On Dec 19, 2016 1:39 AM, "forest_soup" wrote:
> > Sorry for my wrong memory. The swap is 16GB.
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Very-long-young-generation-stop-the-world-GC-pause-tp4308911p4310301.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Very long young generation stop the world GC pause
You should probably have as small a swap as possible. I still feel long GCs are either due to swapping or thread contention.

Did you try to remove all other G1GC tuning parameters except for ParallelRefProcEnabled?

On Dec 19, 2016 1:39 AM, "forest_soup" wrote:
> Sorry for my wrong memory. The swap is 16GB.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Very-long-young-generation-stop-the-world-GC-pause-tp4308911p4310301.html
> Sent from the Solr - User mailing list archive at Nabble.com.
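Concretely, the stripped-down configuration suggested above might look like this in solr.in.sh, using Solr's standard GC_TUNE hook (the heap size is illustrative, not a recommendation):

```shell
# Keep only the one G1 flag under test; add other tuning flags back one
# at a time if pauses reappear. ParallelRefProcEnabled addresses long
# reference-processing phases, a common cause of long young-gen pauses.
SOLR_HEAP="8g"
GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled"
```

On the swap side, besides shrinking swap itself, lowering swappiness (e.g. `sysctl -w vm.swappiness=1`) discourages the kernel from paging heap out, which is another frequent source of multi-second "GC" pauses.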
RE: DocTransformer not always working
Yeah, that makes sense indeed. Thanks!
Markus

-----Original message-----
> From: Chris Hostetter
> Sent: Thursday 15th December 2016 19:44
> To: solr-user@lucene.apache.org
> Subject: RE: DocTransformer not always working
>
> : Well, i can work with this really fine knowing this, but does it make
> : sense? I did assume (or was wrong in doing so) that fl=minhash:[binstr]
> : should mean get that field and pass it through the transformer. At least
> : i just now fell for it; maybe others shouldn't :)
>
> that's what it *can* mean, but it's not -- fundamentally -- what it means.
>
> foo:[bar x=y ...] means run the "bar" transformer and request that it use the name "foo" as an output key in the resulting documents.
>
> when "bar" is executing it knows what name it was asked to use, so it can use that information for other purposes (like in your case: you can use that as a stored field name to do some processing on) but there's no reason "foo" has to be a real field name.
>
> many transformers don't treat the "name" as special in any way, and in general a transformer should behave sanely if there is no name specified (ie: "fl=[bar]" should be totally valid)
>
> the key reason why it's not really a good idea to *force* the "name" used in the response to match a "real" stored field is that it prevents you from using multiple transformers on the same field, or from returning the same field unmodified.
>
> Another/better way for you to have designed your transformer would have been that the field to apply the binstr logic to should be specified as a local param, ie...
>
> fl=minhash,b2_minhash:[binstr f=minhash base=2],b8_minhash:[binstr f=minhash base=16]
>
> ...see what i mean?
>
> : Anyway, thanks again today,
> : Markus
> :
> : -----Original message-----
> : > From: Chris Hostetter
> : > Sent: Wednesday 14th December 2016 23:14
> : > To: solr-user
> : > Subject: Re: DocTransformer not always working
> : >
> : > Fairly certain you aren't overriding getExtraRequestFields, so when your DocTransformer is evaluated it can't find the field you want it to transform.
> : >
> : > By default, the ResponseWriters don't provide any fields that aren't explicitly requested by the user, or specified as "extra" by the DocTransformer.
> : >
> : > IIUC you want the stored value of the "minhash" field to be available to you, but the response writer code doesn't know that -- it just knows you want "minhash" to be the output response key for the "[binstr]" transformer.
> : >
> : > Take a look at RawValueTransformerFactory as an example to borrow from.
> : >
> : > : Date: Wed, 14 Dec 2016 21:55:26 +0000
> : > : From: Markus Jelsma
> : > : Reply-To: solr-user@lucene.apache.org
> : > : To: solr-user
> : > : Subject: DocTransformer not always working
> : > :
> : > : Hello - I just spotted an oddity with the two custom DocTransformers we sometimes use on Solr 6.3.0. This particular transformer in the example just transforms a long (or int) into a sequence of bits. I just use it as a convenience to compare minhashes with my eyeballs. The first example is very straightforward, fl=minhash:[binstr]: show only the minhash field, but as a bit sequence.
> : > :
> : > : solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id%20asc&q=*:*&fl=minhash:[binstr]
> : > : {
> : > :   "response":{"numFound":96933,"start":0,"docs":[
> : > :     {}]
> : > : }}
> : > :
> : > : The document is empty! This also happens with another transformer. In the next example i also request the lang field:
> : > :
> : > : solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id asc&q=*:*&fl=lang,minhash:[binstr]
> : > : {
> : > :   "response":{"numFound":96933,"start":0,"docs":[
> : > :     {
> : > :       "lang":"nl"}]
> : > : }}
> : > :
> : > : Ok, at least i now get the lang field, but the transformed minhash is nowhere to be seen. In the next example i request all fields and the transformed minhash:
> : > :
> : > : /solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id%20asc&q=*:*&fl=*,minhash:[binstr]
> : > : {
> : > :   "response":{"numFound":96933,"start":0,"docs":[
> : > :     {
> : > :       "minhash":"11101101001010001101001010111101100100110010",
> : > :       ...other fields here
> : > :       "_version_":1553728923368423424}]
> : > : }}
> : > :
> : > : So it seems that right now, i can only use a transformer properly if i request all fields. I believe it used to work with all three examples just as you would expect. But since i haven't used transformers for a while, i don't know at which version it stopped working like that (if it ever did, of course :)
> : > :
> : > : Did i mess something up or did a bug creep up on me?
> : > :
> : > : Thanks,
> : > : Markus
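As an aside, the core of a [binstr]-style transform, rendering a stored long as a fixed-width bit string for eyeball comparison, is small. A sketch of just that logic (hypothetical helper, separate from the transformer plumbing, which per the thread must also override getExtraRequestFields so the stored source field is fetched):

```java
// Core of a binstr-style DocTransformer: render a long as a fixed-width
// bit string so two minhash values line up for visual comparison.
public class Binstr {
    static String toBits(long value, int width) {
        String bits = Long.toBinaryString(value);
        // left-pad with zeros so equal-length values align column by column
        StringBuilder sb = new StringBuilder();
        for (int i = bits.length(); i < width; i++) sb.append('0');
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        System.out.println(toBits(5L, 8)); // prints: 00000101
    }
}
```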