Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Mark Robinson
Hi,
When I initially fire a query against my Solr instance using SolrJ I pass
only, say, q=*:*&fq=(myfield:value1).

I have written a custom RequestHandler, which is what I call in my SolrJ
query.
Inside this custom request handler, can I add more query params, like say the
facets etc., so that facets which were not specified when I invoked the Solr
URL using SolrJ are ultimately also received back in my results?

In short, instead of constructing the query dynamically in SolrJ up front,
I want to add the extra query params by deploying a jar in Solr (Java code
that will check certain conditions and dynamically add the query params
after the initial SolrJ query arrives). That is why I thought of a custom
RH, which would let me write a Java class and deploy it in Solr.

Is this possible? Could someone get back to me, please?

Thanks!
Mark.


Re: Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Mark Robinson
Hi,

Ahmet, Jack, Thanks for the pointers.
My requirement is that the facets, sort fields, and their order are not
static.
For example, for one scenario I may need to show only 2 facets
and sort on only one field.
For another scenario I may have to use facet.field for a different set of
fields and sort on yet another set of fields.
Consider that there are some sort of user preferences for each query.

So I think I may not be able to store parameters like facet fields, sort
fields, etc. preconfigured in solrconfig.xml.
Please correct me if I am wrong.

Based on Ahmet's reply I created a CustomSearchComponent with help from the
net.
I created a dummy RH and added this as the searchComponent in
SolrConfig.xml:-


<requestHandler name="/myexample" class="solr.SearchHandler">
  <arr name="components">
    <str>exampleComponent</str>
  </arr>
</requestHandler>

<searchComponent name="exampleComponent"
  class="org.ExampleSearchComponent">
</searchComponent>

...invoked it using:-
http://localhost:8984/solr/myexample?q=*:*
The o/p gave me that one record with ALL FIELDS fully in xml format.
*It did not give only the "id" field, which was what I was trying as a test!*


My code for the custom Search Component is shared below. Before that, I have
these queries:-
1. In my code, instead of hitting the server AGAIN using SolrJ to enforce my
params (just "fl" newly added), is there any way the query can be
executed with my additional fl param?
2. Just adding the additional input params is what I want to achieve; I
don't want to do anything to the response.
   Currently I am doing it:-
   --> builder.rsp.add( "example", doc.getFields());
  Note:- I removed this line, and when I ran the query again NO OUTPUT came.
  So suppose I used it along with any of my existing RHs by adding it as a
searchComponent: I want it to only affect the input querying by adding
additional params, and it should not influence the rendering of the o/p in
any way. How do I add this to one of my existing Request Handlers so that it
influences only the input for querying and NOT the o/p format in any way?
3. Why are all fields being rendered for the one doc I selected to come back
in my "example" variable, when I am actually restricting the fields to
only fl=id?

Any help is greatly appreciated.


My console shows the following, i.e. it looks like the filtering happens,
but what is /select doing here when I am calling /myexample:-

154956 [qtp1856056345-18] INFO  org.apache.solr.core.SolrCore -
[collection1] webapp=/solr path=*/select*
params=*{q=*:*&fl=id&wt=xml&version=2.2}* hits=4 status=0 QTime=16
[stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY,numericType=INT,numericPrecisionStep=16,
org.apache.lucene.document.LazyDocument$LazyField@38585de9,
org.apache.lucene.document.LazyDocument$LazyField@1b3edb96,
org.apache.lucene.document.LazyDocument$LazyField@9692beb,
org.apache.lucene.document.LazyDocument$LazyField@683f4dad,
org.apache.lucene.document.LazyDocument$LazyField@12f2e256,
org.apache.lucene.document.LazyDocument$LazyField@7ffd69f5]
155080 [qtp1856056345-21] INFO  org.apache.solr.core.SolrCore -
[collection1] webapp=/solr path=*/myexample* params={*q=*:**} status=0 QTime=299


rough java class:-

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.SolrIndexSearcher;

public class ExampleSearchComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder builder) throws IOException {
    }

    @Override
    public void process(ResponseBuilder builder) throws IOException {
        SolrParams params = builder.req.getParams();
        String q = params.get(CommonParams.Q);
        // Copy the incoming params and add fl=id
        ModifiableSolrParams params1 = new ModifiableSolrParams(params);
        params1.add("fl", "id");
        System.out.println("q is " + q);

        QueryResponse response = null;

        // Issue a second query over HTTP with the modified params
        HttpSolrServer server = new HttpSolrServer("http://localhost:8984/solr/collection1");
        server.setParser(new XMLResponseParser());

        try {
            response = server.query(params1);
        } catch (Exception e) {
            e.printStackTrace();
        }

        SolrDocumentList results = new SolrDocumentList();
        SolrIndexSearcher searcher = builder.req.getSearcher();
        Document doc = searcher.doc(0);
        System.out.println(doc.getFields());

        builder.rsp.add("example", doc.getFields());
    }

    @Override
    public String getDescription() {
        return "ExampleSearchComponent";
    }

    @Override
    public String getSource() {
        return "";
    }

    //@Override
    public String getSourceId() {
        return "";
    }

    @Override
    public String getVersion() {
        return "1.0";
    }
}





Thanks and Rgds,
Mark.

On Sat, Jan 9, 2016 at 12:38 PM, Jack Krupansky 
wrote:


Re: Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Erik Hatcher
Woah, Mark….  you’re making a search request within a search component.  
Instead, let the built-in “query” component do the work for you.

I think one fix for you is to make your “components” be “first-components” 
instead (allowing the other default search components to come into play).  You 
don’t need to search within your component, just affect parameters, right?
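In solrconfig.xml terms, that wiring would look roughly like this (a sketch, using the handler and component names from earlier in the thread):

```xml
<!-- Run the custom component before the built-in components,
     so it can adjust parameters ahead of the query component. -->
<requestHandler name="/myexample" class="solr.SearchHandler">
  <arr name="first-components">
    <str>exampleComponent</str>
  </arr>
</requestHandler>

<searchComponent name="exampleComponent"
                 class="org.ExampleSearchComponent"/>
```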


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 



> On Jan 9, 2016, at 3:19 PM, Mark Robinson  wrote:

Re: Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Mark Robinson
Thanks, Erik!

Appreciate your valuable suggestions.
Now I am getting the concept of a search-component better!

So my custom class is just this after removing the SolrJ part, as I just
need to modify the query by adding some parameters dynamically before the
query is actually executed by Solr:-

public void process(ResponseBuilder builder) throws IOException {
SolrParams *params *= builder.req.getParams();
String q = params.get(CommonParams.Q);
ModifiableSolrParams *params1* = new ModifiableSolrParams(*params*);
*params1.add*("fl", "id");
//Added this line
*builder.req.setParams(params1);*

System.out.println("q is ### "+q);
}

Note:- Nothing is inside the prepare() method.

In my /select RH I added the following in solrconfig.xml just before the
close of the <requestHandler> tag:-


<arr name="first-components">
  <str>exampleComponent</str>
</arr>

Still it is not restricting the o/p fields to only the fl list.
The console output shows the following:-
q is ### *:*
16140 [qtp1856056345-12] INFO  org.apache.solr.core.SolrCore -
[collection1] webapp=/solr path=/select params={q=*:*} hits=4 status=0 QTime=4

Note:- the  "###" proves that it accessed the custom class. But the
ModifiableSolrParams params1 = new ModifiableSolrParams(params);
params1.add("fl", "id");

did not take effect.

I think I am close to developing my first dynamic query component. Could
someone please tell me where I am going wrong this time?

I appreciate any help. I am very eager to see my first dynamic query
implemented using a customized Search Component!!

Thanks!
Mark.

On Sat, Jan 9, 2016 at 3:38 PM, Erik Hatcher  wrote:


Re: Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Ahmet Arslan
Hi Mark,

Try using the set method instead of the add method:  params1.set("fl", "id");

I also suggest using a static constant for "fl" (CommonParams.FL), as you used CommonParams.Q for "q".
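Why set() rather than add()? ModifiableSolrParams is multi-valued per key: add() appends another value alongside whatever is already there (including any defaults the handler supplied), while set() replaces all existing values. A plain-Java illustration of that distinction (a stand-in map, not the Solr class itself):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AddVsSet {
    // Minimal stand-in for a multi-valued parameter map (illustration only).
    static Map<String, List<String>> params = new LinkedHashMap<>();

    static void add(String k, String v) {   // appends an extra value
        params.computeIfAbsent(k, x -> new ArrayList<>()).add(v);
    }

    static void set(String k, String v) {   // replaces all existing values
        params.put(k, new ArrayList<>(List.of(v)));
    }

    public static void main(String[] args) {
        // Suppose the handler's defaults already supplied fl=*
        add("fl", "*");
        add("fl", "id");                        // both values remain
        System.out.println(params.get("fl"));   // [*, id]
        set("fl", "id");                        // earlier value discarded
        System.out.println(params.get("fl"));   // [id]
    }
}
```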

Congrats on your first search component!

happy searching,


Ahmet
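Putting Erik's first-components advice and Ahmet's set() fix together, the whole component reduces to a sketch like this (written against the Solr 5.x SearchComponent API used earlier in the thread; a sketch, not a tested implementation):

```java
import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ExampleSearchComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder builder) throws IOException {
        // Mutate params here, before the built-in QueryComponent runs
        // (requires this component to be wired in via first-components).
        ModifiableSolrParams params =
                new ModifiableSolrParams(builder.req.getParams());
        params.set(CommonParams.FL, "id");  // set() replaces any existing fl
        builder.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder builder) throws IOException {
        // Nothing to do: the standard components build the response.
    }

    @Override
    public String getDescription() {
        return "ExampleSearchComponent";
    }

    @Override
    public String getSource() {
        return "";
    }
}
```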



On Saturday, January 9, 2016 11:32 PM, Mark Robinson  
wrote:

Re: Specifying a different txn log directory

2016-01-09 Thread Erick Erickson
Please show us exactly what you did, and exactly
what you saw that made you say it "does not seem to work".

Best,
Erick

On Fri, Jan 8, 2016 at 7:47 PM, KNitin  wrote:
> Hi,
>
> How do I specify a different directory for transaction logs? I tried using
> the updatelog entry in solrconfig.xml and reloaded the collection but that
> does not seem to work.
>
> Is there another setting I need to change?
>
> Thanks
> Nitin


Querying only replica's

2016-01-09 Thread Robert Brown

Hi,

(btw, when is 5.5 due?  I see the docs reference it, but not the 
download page)


Anyway, I index and query Solr over HTTP (no SolrJ, etc.). Is it
best/good to get the CLUSTERSTATUS via the Collections API and explicitly
send queries to a replica, to ensure I don't send queries to the leaders
of my collection, to improve performance? Likewise with sending
updates directly to a leader?


My leaders will receive full updates of the entire collection once a
day, so I would assume that if a leader is handling queries too,
performance would take a hit?


Is the CLUSTERSTATUS API the only way to do this, btw, without SolrJ,
etc.? I wasn't sure if ZooKeeper would be able to tell me also.


Do I also need to do anything to ensure the leaders are never sent
queries from the replicas?


Does this all sound sane?

One of my collections is 3 shards, with 2 replicas each (9 nodes
total), 70m docs in total.


Thanks,
Rob



Re: Manage schema.xml via Solrj?

2016-01-09 Thread Bob Lawson
Thank you all so much for your responses.  Very helpful indeed!


> On Jan 8, 2016, at 12:03 PM, Erick Erickson  wrote:
> 
> First, Daniel nailed the XY problem, but this isn't that...
> 
> You're correct that hand-editing the schema file is error-prone.
> The managed schema API is your friend here. There are
> several commercial front-ends that already do this.
> 
> The managed schema API is all just HTTP, so there's nothing
> precluding a Java program from interpreting a form and sending
> off the proper HTTP requests to modify the schema.
> 
> The SolrJ client library has some sugar around this, there's no
> reason you can't use that as it's just a jar (and a dependency on
> a logging jar).
> 
> For SolrCloud it's a little different. You need to make sure your
> changes get to Zookeeper, which the schema API will handle
> for you.
> 
> One thing that's a bit confusing is "managed schema" and
> "schemaless". They both use the same underlying mechanism
> to modify the schema.xml file. With "managed schema" you do
> what you're talking about, have some process where you make
> specific modifications with the schema API to a controlled
> schema file.
> 
> "schemaless" automatically tries to guess what the schema
> _should_ be and uses the managed schema API to implement
> those guesses.
> 
> GW:
> Schema guessing is a great way to get things started, but virtually
> every organization I work with takes explicit control of the schema.
> They do this for three reasons:
> 1> the assumptions in managed schema create indexes that can be
> made much smaller by judicious options on the fields.
> 2> the search cases require careful analysis chains.
> 3> the guesses are wrong. I.e. if the first number encountered in a
> field is, say, 3, the guesser says "Oh, this is an int field". The
> next doc has 3.4... you'll get a parsing error and fail to index the doc.
> 
> 
> Best,
> Erick
> 
>> On Fri, Jan 8, 2016 at 7:38 AM, GW  wrote:
>> Bob,
>> 
>> Not sure why you would want to do this. You can set up Solr to guess the
>> schema. It creates a file called managed-schema for an override. This is
>> the case with 5.3; I came across it by accident setting it up the first time,
>> and I was a little annoyed, but it made for a quick setup. Your programming
>> would still need to recognise the new doc structure and use that new document
>> structure. The only problem is it's a bit generic in the guesswork, and I
>> did not spend much time testing it out, so I am not really versed in
>> operating it. I got myself back to schema.xml ASAP. My thoughts are you are
>> looking at a lot of work for little gain.
>> 
>> Best,
>> 
>> GW
>> 
>> 
>> 
>>> On 7 January 2016 at 21:36, Bob Lawson  wrote:
>>> 
>>> I want to programmatically make changes to schema.xml using java to do
>>> it.  Should I use Solrj to do this or is there a better way?  Can I use
>>> Solrj to make the rest calls that make up the schema API?  Whatever the
>>> answer, can anyone point me to an example showing how to do it?  Thanks!
>>> 
>>> 
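The SolrJ "sugar" around the Schema API that Erick mentions looks roughly like this (a sketch assuming SolrJ 5.3+, where SchemaRequest lives in org.apache.solr.client.solrj.request.schema, and 5.x-style HttpSolrClient construction; the field name and attributes are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class AddFieldExample {
    public static void main(String[] args) throws Exception {
        SolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/collection1");

        // Attributes mirror what you would otherwise hand-edit in schema.xml.
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("name", "price");
        field.put("type", "float");
        field.put("stored", true);
        field.put("indexed", true);

        // Sends the same HTTP request the Schema API accepts directly.
        SchemaResponse.UpdateResponse rsp =
                new SchemaRequest.AddField(field).process(client);
        System.out.println("status: " + rsp.getStatus());
        client.close();
    }
}
```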


Re: Specifying a different txn log directory

2016-01-09 Thread Mark Miller
dataDir and tlog dir cannot be changed with a core reload.

- Mark

On Sat, Jan 9, 2016 at 1:20 PM Erick Erickson 
wrote:

>
-- 
- Mark
about.me/markrmiller


Re: Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Ahmet Arslan
Hi Mark,

Yes, this is possible. Better, you can use a custom SearchComponent for this 
task too.
You retrieve the Solr parameters, wrap them in ModifiableSolrParams, add extra 
parameters, etc., then pass them to the underlying search components.

Ahmet


On Saturday, January 9, 2016 3:59 PM, Mark Robinson  
wrote:


Re: solrcloud -How to delete a doc at a specific shard

2016-01-09 Thread Erick Erickson
I don't really know unless there's _something_ different
about the docs, and you could delete by _query_, something
like id:XXX AND (condition unique to the doc you want to remove).

I'm more concerned about how there got to be duplicate entries in the
first place. There really shouldn't be any with composite id doing the
routing. What led up to this?

Best,
Erick

On Fri, Jan 8, 2016 at 6:33 PM, elvis鱼人  wrote:
> solr version is 5.2.0,
> the problem is different shards having the same ID,
> the document router is compositeId,
> and if I do this:
> ../collection/update?commit=true&stream.body=<delete><id>idhere</id></delete>
> then this id is missing from the whole solrcloud.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solrcloud-How-to-delete-a-doc-at-a-specific-shard-tp4249354p4249601.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamically Adding query parameters in my custom Request Handler class

2016-01-09 Thread Jack Krupansky
Sure, you CAN do this, but why would you want to? I mean, what exactly is
the motivation here? If you truly have custom code to execute, fine, but if
all you are trying to do is set parameters, a custom request handler is
hitting a tack with a sledgehammer. For example, why isn't setting
defaults in solrconfig sufficient for your needs? At least then you can
change parameters with a simple text edit rather than require a Java build
and jar deploy.

Can you share what some of the requirements are for your custom request
handler, including the motivation? I'd hate to see you go off and invest
significant effort in a custom request handler when simpler techniques may
suffice.
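For comparison, the solrconfig.xml defaults Jack refers to would look something like this (a sketch; the field, facet, and sort names are invented for illustration):

```xml
<!-- Parameters applied unless the client supplies its own values;
     changeable with a text edit and a core reload, no Java required. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="fl">id</str>
    <str name="facet">true</str>
    <str name="facet.field">category</str>
    <str name="sort">price asc</str>
  </lst>
</requestHandler>
```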

-- Jack Krupansky

On Sat, Jan 9, 2016 at 12:08 PM, Ahmet Arslan 
wrote:



Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-09 Thread Erick Erickson
For some reason, "slice" is the preferred term in the _code_, while
"shard" is preferred in the docs.

FWIW
Erick

On Fri, Jan 8, 2016 at 3:51 PM, Jeff Wartes  wrote:
>
> Honestly, I have no idea which is "old". The solr source itself uses slice 
> pretty consistently, so I stuck with that when I started the project last 
> year. And logically, a shard being an instance of a slice makes sense to me. 
> But one significant place where they word shard is exposed is the default 
> names of the slices, so it’s a mixed bag.
>
>
> See here:
>   https://github.com/whitepages/solrcloud_manager#terminology
>
>
>
>
>
>
> On 1/8/16, 2:34 PM, "Robert Brown"  wrote:
>
>>Thanks for the pointer Jeff,
>>
>>For SolrCloud it turned out to be...
>>
>>=xxx
>>
>>btw, for your app, isn't "slice" old notation?
>>
>>
>>
>>
>>On 08/01/16 22:05, Jeff Wartes wrote:
>>>
>>> I’m pretty sure you could change the name when you ADDREPLICA using a 
>>> core.name property. I don’t know if you can when you initially create the 
>>> collection though.
>>>
>>> The CLUSTERSTATUS command will tell you the core names: 
>>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18
>>>
>>> That said, this tool might make things easier.
>>> https://github.com/whitepages/solrcloud_manager
>>>
>>>
>>> # shows cluster status, including core names:
>>> java -jar solrcloud_manager-assembly-1.4.0.jar -z zk0.example.com:2181/myapp
>>>
>>>
>>> # deletes a replica by node/collection/shard (figures out the core name 
>>> under the hood)
>>> java -jar solrcloud_manager-assembly-1.4.0.jar deletereplica -z 
>>> zk0.example.com:2181/myapp -c collection1 --node node1.example.com --slice 
>>> shard2
>>>
>>>
>>> I mention this tool every now and then on this list because I like it, but 
>>> I’m the author, so take that with a pretty big grain of salt. Feedback is 
>>> very welcome.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 1/8/16, 1:18 PM, "Robert Brown"  wrote:
>>>
 Hi,

 I'm having trouble identifying a replica to delete...

 I've created a 3-shard cluster, all 3 created on a single host, then
 added a replica for shard2 onto another host, no problem so far.

 Now I want to delete the original shard, but got this error when trying
 a *replica* param value I thought would work...

 shard2/uk available replicas are core_node1,core_node4

 I can't find any mention of core_node1 or core_node4 via the admin UI,
 how would I know/find the name of each one?

 Is it possible to set these names explicitly myself for easier maintenance?

 Many thanks for any guidance,
 Rob

>>


Re: Querying only replica's

2016-01-09 Thread Erick Erickson
bq: is it best/good to get the CLUSTERSTATUS via the collection API
and explicitly send queries to a replica to ensure I don't send
queries to the leaders of my collection

In a word _no_. SolrCloud is vastly different than the old
master/slave. In SolrCloud, each and every node (leader and replicas)
index all the docs and serve queries. The additional burden the leader
has is actually very small. There's absolutely no reason to _not_ use
the leader to serve queries.

As far as sending updates, there would be a _little_ benefit to
sending the updates directly to the leader, but _far_ more benefit in
using SolrJ. If you use SolrJ (and CloudSolrClient), then the
documents are split up on the _client_ and only the docs for a
particular shard are automatically sent to the leader for that shard.
Using SolrJ you can essentially scale indexing linearly with the
number of shards you have. Just using HTTP does not scale linearly.
Your particular app may not care, but in high-throughput situations
this can be significant.

So rather than spend time and effort sending updates directly to a
leader and have the leader then forward the docs to the correct shard,
I recommend investing the time in using SolrJ for updates rather than
sending updates to the leader over HTTP. Or just ignore the problem
and devote your efforts to something that is more valuable.

So in short:
1> just stick a load balancer in front of _all_ your Solr nodes for
queries. And note that there's an internal load balancer already in
Solr that routes things around anyway, although putting a load
balancer in front of your entire cluster makes it so there's not a
single point of failure.
2> Depending on your throughput needs, either
2a> use SolrJ to index
2b> don't worry about it and send updates through the load balancer as
well. There'll be an extra hop if you send updates to a replica, but
if that's significant you should be using SolrJ
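To make 2a concrete, here is a minimal SolrJ indexing sketch (untested here, and it needs the SolrJ jar on the classpath; the ZooKeeper address and collection name are placeholders):

```java
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.ArrayList;
import java.util.List;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient is ZooKeeper-aware: it splits each batch up on the
        // client and sends every document straight to its shard leader.
        CloudSolrClient client =
            new CloudSolrClient("zk1.example.com:2181,zk2.example.com:2181/solr");
        client.setDefaultCollection("collection1");

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            batch.add(doc);
        }
        client.add(batch);  // routed per shard, no extra hop through a leader
        client.commit();
        client.close();
    }
}
```

Because the client watches ZooKeeper, routing stays correct even as leaders move between nodes.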

As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was
just released in early December. There's usually a several month lag
between point releases and there's some agitation to start the 6.0
release process, so it's up in the air.


On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown  wrote:
> Hi,
>
> (btw, when is 5.5 due?  I see the docs reference it, but not the download
> page)
>
> Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it best/good
> to get the CLUSTERSTATUS via the collection API and explicitly send queries
> to a replica to ensure I don't send queries to the leaders of my collection,
> to improve performance?  Likewise with sending updates directly to a
> Leader?
>
> My leaders will receive full updates of the entire collection once a day, so
> I would assume if the leader is handling queries too, performance would be
> hit?
>
> Is the CLUSTERSTATUS API the only way to do this btw without SolrJ, etc.?  I
> wasn't sure if ZooKeeper would be able to tell me also.
>
> Do I also need to do anything to ensure the leaders are never sent queries
> from the replica's?
>
> Does this all sound sane?
>
> One of my collections is 3 shards, with 2 replica's each (9 total nodes),
> 70m docs in total.
>
> Thanks,
> Rob
>


Re: Specifying a different txn log directory

2016-01-09 Thread KNitin
Hi,

Eric:

 I changed updateLog as follows.

 <updateLog>
   <str name="dir">/mnt/nitin_test/</str>
 </updateLog>

I made this change after the collection was created and then updated zk and
reloaded the collection.

Mark: Ok that might be the issue. I will try doing this without the reload.

Thanks,
Nitin

On Sat, Jan 9, 2016 at 2:32 PM, Mark Miller  wrote:

> dataDir and tlog dir cannot be changed with a core reload.
>
> - Mark
>
> On Sat, Jan 9, 2016 at 1:20 PM Erick Erickson 
> wrote:
>
> > Please show us exactly what you did. and exactly
> > what you saw to say that "does not seem to work".
> >
> > Best,
> > Erick
> >
> > On Fri, Jan 8, 2016 at 7:47 PM, KNitin  wrote:
> > > Hi,
> > >
> > > How do I specify a different directory for transaction logs? I tried
> > using
> > > the updatelog entry in solrconfig.xml and reloaded the collection but
> > that
> > > does not seem to work.
> > >
> > > Is there another setting I need to change?
> > >
> > > Thanks
> > > Nitin
> >
> --
> - Mark
> about.me/markrmiller
>


Re: Running Lucene/SOLR on Hadoop

2016-01-09 Thread Steve Davids
You might consider trying to get the de-duplication done at index time:
https://cwiki.apache.org/confluence/display/solr/De-Duplication that way
the map reduce job wouldn't even be necessary.

When it comes to the map reduce job, you would need to be more specific
with *what* you are doing for people to try and help, are you attempting to
query for every record of all 40 million rows - how many mapper tasks? But
right off the bat I see you are using Java's HttpURLConnection, you should
really use SolrJ for querying purposes:
https://cwiki.apache.org/confluence/display/solr/Using+SolrJ you won't need
to deal with xml parsing and it uses Apache's HttpClient with much more
reasonable defaults.
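For reference, the index-time de-duplication from that first link is configured as an update processor chain in solrconfig.xml, roughly like this (the signature field name and the list of hashed fields below are placeholders for your schema):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- false keeps every document and just stamps duplicates with the same
         signature; true makes later duplicates overwrite earlier ones -->
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,address,birthdate,phone,email</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With overwriteDupes=false you can later facet or query on the signature field to surface candidate duplicate groups instead of silently dropping records.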

-Steve

On Thu, Dec 24, 2015 at 11:28 PM, Dino Chopins 
wrote:

> Hi Erick,
>
> Thank you for your response and pointer. What I mean by running Lucene/SOLR
> on Hadoop is to have Lucene/SOLR index available to be queried using
> mapreduce or any best practice recommended.
>
> I need to have this mechanism to do large scale row deduplication. Let me
> elaborate why I need this:
>
>1. I have two data sources with 35 and 40 million records of customer
>profile - the data come from two systems (SAP and MS CRM)
>2. Need to index and compare row by row of the two data sources using
>name, address, birth date, phone and email field. For birth date and
> email
>it will use exact comparison, but for the other fields will use
>probabilistic comparison. Btw, the data has been normalized before they
> are
>being indexed.
>3. Each finding will be categorized under same person, and will be
>deduplicated automatically or under user intervention depending on the
>score.
>
> I usually use it using Lucene index on local filesystem and use term
> vector, but since this will be repeated task and then challenged by
> management to do this on top of Hadoop cluster I need to have a framework
> or best practice to do this.
>
> I understand that to have Lucene index on HDFS is not very appropriate
> since HDFS is designed for large block operation. With that understanding,
> I use SOLR and hope to query it using http call from mapreduce job.  The
> snippet code is below.
>
> url = new URL(SOLR-Query-URL);
>
> HttpURLConnection connection = (HttpURLConnection)
> url.openConnection();
> connection.setRequestMethod("GET");
>
> The latter method turns out to perform very badly. The simple mapreduce job
> that only read the data sources and write to hdfs takes 15 minutes, but
> once I do the http request it takes three hours now and still ongoing.
>
> What went wrong? And what will be solution to my problem?
>
> Thanks,
>
> Dino
>
> On Mon, Dec 14, 2015 at 12:30 AM, Erick Erickson 
> wrote:
>
> > First, what do you mean "run Lucene/Solr on Hadoop"?
> >
> > You can use the HdfsDirectoryFactory to store Solr/Lucene
> > indexes on Hadoop, at that point the actual filesystem
> > that holds the index is transparent to the end user, you just
> > use Solr as you would if it was using indexes on the local
> > file system. See:
> > https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
> >
> > If you want to use Map-Reduce to _build_ indexes, see the
> > MapReduceIndexerTool in the Solr contrib area.
> >
> > Best,
> > Erick
> >
>
>
>
>
> --
> Regards,
>
> Dino
>


Selective Replication from master to slave

2016-01-09 Thread chandan khatri
Dear All,

I've a use case where I need to do selective replication from master to
slave.

Basically I am going with a master/slave approach - the application pushing
data to the master will need to preview the search, and if the search is deemed
useful/appropriate I need the data to be replicated to the slaves.

Please advise.

Thanks!


Re: Solr search and index rate optimization

2016-01-09 Thread Steve Davids
bq. There's no good reason to have 5 with a small cluster and by "small" I
mean < 100s of nodes.

Well, a good reason would be if you want your system to continue to operate
if 2 ZK nodes lose communication with the rest of the cluster or go down
completely. Just to be clear though, the ZK nodes definitely don't need to
be beefy machines compared to your Solr data nodes since they are just
doing light-weight orchestration. But yea, for a 2 data node system one
might be willing to go with a 3 node ensemble to tolerate a single ZK
node dying, just depends on how much cash you are willing to spend and
availability level you are looking for.
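The underlying math is just majority quorum: an ensemble of n nodes needs floor(n/2) + 1 nodes reachable, so it tolerates floor((n - 1) / 2) failures. A quick sketch:

```java
public class QuorumTolerance {
    // A ZooKeeper ensemble stays available while a strict majority of its
    // nodes can still reach each other, so n nodes tolerate (n - 1) / 2
    // failures (integer division).
    public static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 5, 7}) {
            System.out.println(n + "-node ensemble tolerates "
                + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

This is also why even-sized ensembles buy nothing: 4 nodes tolerate the same single failure as 3.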

-Steve


On Fri, Jan 8, 2016 at 12:07 PM, Erick Erickson 
wrote:

> Here's a longer form of Toke's answer:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> BTW, on the surface, having 5 ZK nodes isn't doing you any real good.
> Zookeeper isn't really involved in serving queries or handling
> updates, it's purpose is to have the state of the cluster (nodes up,
> recovering, down, etc) and notify Solr listeners when that state
> changes. There's no good reason to have 5 with a small cluster and by
> "small" I mean < 100s of nodes.
>
> Best,
> Erick
>
> On Fri, Jan 8, 2016 at 2:40 AM, Toke Eskildsen 
> wrote:
> > On Fri, 2016-01-08 at 10:55 +0500, Zap Org wrote:
> >> i wanted to ask that i need to index after every 15 min with hard commit
> >> (real time records) and currently have 5 zookeeper instances and 2 solr
> >> instances in one machine serving 200 users with 32GB RAM. whereas i
> wanted
> >> to serve more than 10,000 users so what should be my machine specs and
> what
> >> should be my architecture for this much serve rate along with index
> rate.
> >
> > It depends on your system and if we were forced to guess, our guess
> > would be very loose.
> >
> >
> > Fortunately you do have a running system with real queries: Make a copy
> > on two similar machines (you will probably need more hardware anyway)
> > and simulate growing traffic, measuring response times at appropriate
> > points: 200 users, 500, 1000, 2000 etc.
> >
> > If you are very lucky, your current system scales all the way. If not,
> > you should have enough data to make an educated guess of the amount of
> > machines you need. You should have at least 3 measuring point to
> > extrapolate from as scaling is not always linear.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
>


Re: Running Lucene/SOLR on Hadoop

2016-01-09 Thread Dino Chopins
Hi Tim,

Thank you for the great pointer. Will join the group.

Thanks,

Dino

On Tue, Jan 5, 2016 at 2:10 AM, Tim Williams  wrote:

> Apache Blur (Incubating) has several approaches (hive, spark, m/r)
> that could probably help with this ranging from very experimental to
> stable.  If you're interested, you can ask over on
> blur-u...@incubator.apache.org ...
>


Re: Running Lucene/SOLR on Hadoop

2016-01-09 Thread Dino Chopins
Hi Steve,

I cannot do the de-duplication at index time; rather, I need to find the
duplicates of each document and then report the duplicate data back to the user.

Yes, I need to query each document of all 40 million rows. It will be about
10 mapper tasks max. Will try SolrJ for this purpose. Thanks Steve.
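In case it helps anyone following along, the SolrJ equivalent of the HttpURLConnection snippet is a sketch like this (untested; the base URL, core name, and query are placeholders, assuming the SolrJ 5.x API on the classpath):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class DedupeLookup {
    public static void main(String[] args) throws Exception {
        // Create ONE client per mapper task and reuse it; HttpSolrClient
        // pools connections, unlike a fresh HttpURLConnection per record.
        HttpSolrClient client =
            new HttpSolrClient("http://localhost:8983/solr/customers");

        SolrQuery query = new SolrQuery("name:(john smith)");
        query.setRows(10);
        QueryResponse rsp = client.query(query);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("id"));
        }
        client.close();
    }
}
```

The big win over the original snippet is that connection reuse and response parsing are handled for you; the client instance should live for the whole mapper task rather than being opened per record.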

Best,

Dino

On Sun, Jan 10, 2016 at 11:31 AM, Steve Davids  wrote:

> You might consider trying to get the de-duplication done at index time:
> https://cwiki.apache.org/confluence/display/solr/De-Duplication that way
> the map reduce job wouldn't even be necessary.
>
> When it comes to the map reduce job, you would need to be more specific
> with *what* you are doing for people to try and help, are you attempting to
> query for every record of all 40 million rows - how many mapper tasks? But
> right off the bat I see you are using Java's HttpURLConnection, you should
> really use SolrJ for querying purposes:
> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ you won't
> need
> to deal with xml parsing and it uses Apache's HttpClient with much more
> reasonable defaults.
>
> -Steve
>




-- 
Regards,

Dino