Some questions about using ConcurrentUpdateSolrClient

2016-12-21 Thread 苗海泉
I am using Solr 6.0 with SolrJ 6.0, deployed in SolrCloud mode. My code
does not issue explicit commits; autoCommit and softAutoCommit are
configured, and I index through the ConcurrentUpdateSolrClient class.

When we send 100 million documents, a read-timeout exception frequently
occurs and data is lost. I would like to ask a few questions:
1. If ConcurrentUpdateSolrClient.add() does not throw an exception, does
that mean the data has been successfully sent to Solr? That is, is the
call synchronous, returning only after the Solr server has accepted the
data and written it to its log?
2. If the answer to question 1 is no, how can we tell that an add() has
failed, so that we can resend the affected data?
3. Should we be using something other than ConcurrentUpdateSolrClient here?
Thank you!
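For context on question 1: ConcurrentUpdateSolrClient.add() queues the document and returns immediately, so a normal return does not mean Solr accepted the data; failures of queued batches surface later through the client's error callback. The pattern is sketched below in plain, self-contained Java — all names are illustrative, not the SolrJ API:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative model of an asynchronous indexing client: add() only queues,
// so the caller must collect failures from a callback and resend them.
class AsyncIndexerSketch implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<String> failedDocs = new CopyOnWriteArrayList<>();

    // Returns as soon as the doc is queued: a normal return proves nothing
    // about delivery (question 1 in the mail above).
    void add(String doc, boolean simulateTimeout) {
        pool.submit(() -> {
            if (simulateTimeout) {
                failedDocs.add(doc); // in SolrJ this would be the error callback
            }
        });
    }

    // Question 2: failed docs must be gathered from the callback and
    // retransmitted, e.g. via a client that reports errors synchronously.
    List<String> failedDocs() { return failedDocs; }

    @Override
    public void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A common mitigation on the real client is to subclass it and override its error handler to record the failed batch for a synchronous retry.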


XFS or EXT4 on Amazon AWS AMIs

2016-12-21 Thread William Bell
So what are people recommending for SOLR on AWS on Amazon AMI - ext4 or xfs?

I saw an article about MongoDB - saying performance on Amazon was better
due to a mutex issue on ext4 files and threaded calls.

I have been using ext4 for a long time, but I am moving to r3.* instances,
and TRIM/discard support appears to be better supported on XFS.
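For what it's worth, TRIM on an SSD-backed volume can be enabled either with the discard mount option or a periodic fstrim run; an illustrative /etc/fstab line (device and mount point are made up):

```
# Illustrative /etc/fstab entry: XFS data volume with online discard (TRIM)
/dev/xvdb  /var/solr  xfs  noatime,discard  0  2
```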




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Is it possible to rewrite part of the solr response?

2016-12-21 Thread Mike Thomsen
Thanks. I'll look into that stuff. The counts issue is really not a serious
problem for us, as far as I know.

On Wed, Dec 21, 2016 at 9:08 PM, Erick Erickson 
wrote:

> [quoted message trimmed]


Re: Is it possible to rewrite part of the solr response?

2016-12-21 Thread John Bickerstaff
Awesome explanation, Erick. I'll be filing this away for future reference.

On Dec 21, 2016 7:08 PM, "Erick Erickson"  wrote:

[quoted message trimmed]


Re: Is it possible to rewrite part of the solr response?

2016-12-21 Thread Erick Erickson
"grab the response" is a bit ambiguous here in Solr terms. Sure,
a SearchComponent (you can write a plugin) gets the response,
but it only sees the final list being returned to the user, i.e. if you
have rows=15 it sees only 15 docs. Not sure that's adequate,
in the case above you could easily not be allowed to see any of
the top N docs. Plus, doing anything like this would give very
skewed things like facets, grouping, etc. Say the facets were
calculated over 534 hits but the user was only allowed to see 10 docs...
Very confusing.

The most robust solution would be a "post filter", another bit
of custom code that you write (plugin). See:
http://yonik.com/advanced-filter-caching-in-solr/
A post filter sees _all_ the documents that satisfy the query,
and makes an include/exclude decision on each one (just
like any other fq clause). So facets, grouping and all the rest
"just work". Do be aware that if the ACL calculations are expensive
you need to be prepared for the system administrator doing a
*:* query. I usually build in a bailout and stop passing documents
after some number and pass back a result about "please narrow
down your search". Of course if your business logic is such that
you can calculate them all "fast enough", you're golden.

All that said, if there's any way you can build this into tokens in the
doc and use a standard fq clause it's usually much easier. That may
take some creative work at indexing time if it's even possible.

Best,
Erick

On Wed, Dec 21, 2016 at 5:56 PM, Mike Thomsen  wrote:
> [quoted message trimmed]
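The bailout Erick describes can be sketched in plain, self-contained Java (names are illustrative; a real Solr post filter would be a PostFilter plugin wrapping a DelegatingCollector, not this standalone class):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Sketch of a post-filter-style ACL check with a bailout: pass documents
// through the (possibly expensive) check, but stop after a cap so a *:*
// query cannot run the check unbounded.
class AclBailoutFilter {
    private final IntPredicate allowed;  // the expensive ACL decision
    private final int cap;               // stop checking after this many docs
    private boolean bailedOut = false;
    private int checked = 0;

    AclBailoutFilter(IntPredicate allowed, int cap) {
        this.allowed = allowed;
        this.cap = cap;
    }

    // Examine each matching doc id; include it only if the ACL allows it.
    // Past the cap, stop and flag "please narrow down your search".
    List<Integer> collect(int[] docIds) {
        List<Integer> out = new ArrayList<>();
        for (int id : docIds) {
            if (++checked > cap) { bailedOut = true; break; }
            if (allowed.test(id)) out.add(id);
        }
        return out;
    }

    boolean bailedOut() { return bailedOut; }
}
```

Because the decision happens per matching document, facets and grouping computed downstream see only the allowed set, which is the point of doing this as a post filter rather than rewriting the response afterwards.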


Re: Is it possible to rewrite part of the solr response?

2016-12-21 Thread John Bickerstaff
It would be custom code and I have something along those lines, although it
throws an error instead of changing the response...

I'm rushing right now and can't go into more detail.

On Dec 21, 2016 6:57 PM, "Mike Thomsen"  wrote:

> [quoted message trimmed]


Is it possible to rewrite part of the solr response?

2016-12-21 Thread Mike Thomsen
We're trying out some ideas on locking down solr and would like to know if
there is a public API that allows you to grab the response before it is
sent and inspect it. What we're trying to do is something for which a
filter query is not a good option to really get where we want to be.
Basically, it's an integration with some business logic to make a final
pass at ensuring that certain business rules are followed in the event a
query returns documents a user is not authorized to see.

Thanks,

Mike


Re: Solr streaming divide by zero exception coming

2016-12-21 Thread Erick Erickson
There is not nearly enough information here to say
anything helpful. Please attach the stack trace,
the query used etc. IOW, whatever you think would
help someone else reproduce the problem.

What version of Solr are you using?

Best,
Erick

On Wed, Dec 21, 2016 at 11:14 AM, nelias  wrote:
> [quoted message trimmed]


Solr streaming divide by zero exception coming

2016-12-21 Thread nelias
Hello,

I am running a simple search expression on a secured Solr server running in
SolrCloud mode. Running the streaming expression in the admin console gives
a java.util.concurrent.ExecutionException. I ran the same expression via
SolrJ, stepped into the source code, and found that a divide-by-zero
exception is thrown while opening the stream.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-streaming-divide-by-zero-exception-coming-tp4310792.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Suggester

2016-12-21 Thread Furkan KAMACI
Hi All,

I have fields defined like this:





When I run the suggester on my_field_1 it returns a response; however,
my_field_2 does not. I've defined the suggester as:

  suggester
  FuzzyLookupFactory
  DocumentDictionaryFactory

What can be the reason?

Kind Regards,
Furkan KAMACI
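The field definitions and the suggester XML were stripped from the mail, so this is a guess: DocumentDictionaryFactory builds its dictionary from the stored value of the source field, so a field with stored="false" (or with empty stored values) yields no suggestions — a plausible reason my_field_2 returns nothing. A typical definition, with illustrative field and type names:

```xml
<!-- solrconfig.xml: the dictionary field must be stored="true" in the schema -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">my_field_1</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```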


Re: Very long young generation stop the world GC pause

2016-12-21 Thread Steven Bower
Also curious why such a large heap is required... If it's due to field
caches being loaded I'd highly recommend MMapDirectory (if not using
already) and turning on DocValues for all fields you plan to perform
sort/facet/analytics on.

steve

On Wed, Dec 21, 2016 at 9:25 AM Pushkar Raste 
wrote:

> [quoted message trimmed]
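Turning on docValues as Steve suggests is a per-field schema switch; an illustrative schema.xml line (field and type names are made up):

```xml
<!-- schema.xml: docValues keeps sort/facet structures in column form on disk,
     accessed via the OS page cache rather than the Java heap -->
<field name="created_at" type="tlong" indexed="true" stored="true" docValues="true"/>
```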


Re: Very long young generation stop the world GC pause

2016-12-21 Thread Pushkar Raste
You should probably have as small a swap as possible. I still feel long GCs
are either due to swapping or thread contention.

Did you try to remove all other G1GC tuning parameters except for the
ParallelRefProcEnabled?
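Pushkar's two suggestions translate to something like the following JVM and kernel settings (illustrative values for Java 8; verify against your own GC logs):

```
# JVM: G1 with only ParallelRefProcEnabled added, plus GC logging for diagnosis
-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log

# Kernel (sysctl): discourage swapping of the JVM heap
vm.swappiness = 1
```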

On Dec 19, 2016 1:39 AM, "forest_soup"  wrote:

> Sorry for my wrong memory. The swap is 16GB.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Very-long-young-generation-stop-the-world-GC-
> pause-tp4308911p4310301.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: DocTransformer not always working

2016-12-21 Thread Markus Jelsma
Yeah, that makes sense indeed.

Thanks!
Markus
 
-Original message-
> From:Chris Hostetter 
> Sent: Thursday 15th December 2016 19:44
> To: solr-user@lucene.apache.org
> Subject: RE: DocTransformer not always working
> 
> 
> : Well, i can work with this really fine knowing this, but does it make 
> : sense? I did assume (or be wrong in doing so) that fl=minhash:[binstr] 
> : should mean get that field and pass it through the transformer. At least 
> : i just now fell for it, maybe other shouldn't :)
> 
> that's what it *can* mean, but it's not -- fundamentally -- what it means.
> 
> foo:[bar x=y ...] means run the "bar" transformer and request that it 
> uses the name "foo" as an output key in the resulting documents.
> 
> when "bar" is executing it knows what name it was asked to use, so it can 
> use that information for other purposes (like in your case: you can use 
> that as a stored field name to do some processing on) but there's no 
> reason "foo" has to be a real field name.
> 
> many transformers don't treat the "name" as special in any way, and in general a 
> transformer should behave sanely if there is no name specified (ie: 
> "fl=[bar]" should be totally valid)
> 
> the key reason why it's not really a good idea to *force* the "name" used 
> in the response to match a "real" stored field is because it prevents you 
> from using multiple transformers on the same field, or from returning the 
> same field unmodified.
> 
> Another/Better way for you to have designed your transformer would have 
> been that the field to apply the binstr logic too should be specified as a 
> local param, ie...
> 
>   fl=minhash,b2_minhash:[binstr f=minhash base=2],b16_minhash:[binstr 
> f=minhash base=16]
> 
> 
> ...see what i mean?
> 
> 
> 
> 
> : 
> : Anyway, thanks again today,
> : Markus
> : 
> : -Original message-
> : > From:Chris Hostetter 
> : > Sent: Wednesday 14th December 2016 23:14
> : > To: solr-user 
> : > Subject: Re: DocTransformer not always working
> : > 
> : > 
> : > Fairly certain you aren't overriding getExtraRequestFields, so when your 
> : > DocTransformer is evaluated it can't find the field you want it to 
> : > transform.
> : > 
> : > By default, the ResponseWriters don't provide any fields that aren't 
> : > explicitly requested by the user, or specified as "extra" by the 
> : > DocTransformer.
> : > 
> : > IIUC you want the stored value of the "minhash" field to be available to 
> : > you, but the response writer code doesn't know that -- it just knows you 
> : > want "minhash" to be the output respons key for the "[binstr]" 
> : > transformer.
> : > 
> : > 
> : > Take a look at RawValueTransformerFactory as an example to borrow from.
> : > 
> : > 
> : > 
> : > 
> : > : Date: Wed, 14 Dec 2016 21:55:26 +
> : > : From: Markus Jelsma 
> : > : Reply-To: solr-user@lucene.apache.org
> : > : To: solr-user 
> : > : Subject: DocTransformer not always working
> : > : 
> : > : Hello - I just spotted an oddity with both custom DocTransformers we 
> sometimes use on Solr 6.3.0. This particular transformer in the example just 
> transforms a long (or int) into a sequence of bits. I just use it as a 
> convenience to compare minhashes with my eyeballs. First example is very 
> straightforward, fl=minhash:[binstr], show only the minhash field, but as a 
> bit sequence.
> : > : 
> : > : 
> solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id%20asc&q=*:*&fl=minhash:[binstr]
> : > : {
> : > :   "response":{"numFound":96933,"start":0,"docs":[
> : > :   {}]
> : > :   }}
> : > : 
> : > : The document is empty! This also happens with another transformer. The 
> next example i also request the lang field:
> : > : 
> : > : solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id 
> asc&q=*:*&fl=lang,minhash:[binstr]
> : > : {
> : > :   "response":{"numFound":96933,"start":0,"docs":[
> : > :   {
> : > : "lang":"nl"}]
> : > :   }}
> : > : 
> : > : Ok, at least i now get the lang field, but the transformed minhash is 
> nowhere to be seen. In the next example i request all fields and the 
> transformed minhash:
> : > : 
> : > : 
> /solr/search/select?omitHeader=true&wt=json&indent=true&rows=1&sort=id%20asc&q=*:*&fl=*,minhash:[binstr]
> : > : {
> : > :   "response":{"numFound":96933,"start":0,"docs":[
> : > :   {
> : > : 
> "minhash":"11101101001010001101001010111101100100110010",
> : > : ...other fields here
> : > : "_version_":1553728923368423424}]
> : > :   }}
> : > : 
> : > : So it seems that right now, i can only use a transformer properly if i 
> request all fields. I believe it used to work with all three examples just as 
> you would expect. But since i haven't used transformers for a while, i don't 
> know at which version it stopped working like that (if it ever did of course 
> :)
> : > : 
> : > : Did i mess something up or did a bug creep on me?
> : > : 
> : > : Thanks,
> : > : Markus
> : > : 
>
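The [binstr] transform Markus describes (a stored long rendered as a bit string) is, at its core, a stdlib one-liner; the plugin's fix, per Chris, is to also declare the source field via getExtraRequestFields. A self-contained sketch of just the rendering step (names are illustrative, not the actual plugin):

```java
// Core of a "binstr"-style transform: render a stored long as its bit string.
// In a real DocTransformer this would run inside transform(doc, docid), and
// the source field would be declared in getExtraRequestFields() so the
// response writer fetches it even when it is not listed in fl.
class BinStr {
    static String toBits(long value) {
        return Long.toBinaryString(value);
    }
}
```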