Re: Code for getting distinct facet counts across shards(Distributed Process).

2011-06-09 Thread Bill Bell
I have coded and tested this and it appears to work.

Are you having any problems?

On 6/9/11 12:35 AM, "rajini maski"  wrote:

>In Solr 1.4.1, to get the distinct facet term count across shards, I added
>the following code to the distributed process:
>
>
>
>
>
>Class: FacetComponent.java
>
>Function: finishStage(ResponseBuilder rb)
>
>
>
>for (DistribFieldFacet dff : fi.facets.values()) {
>
>  // just after this line of code
>  else { // TODO: log error or throw exception?
>    counts = dff.getLexSorted();
>
>  int namedistint = 0;
>  namedistint = rb.req.getParams().getFieldInt(dff.getKey().toString(), FacetParams.FACET_NAMEDISTINCT, 0);
>
>  if (namedistint == 0)
>    facet_fields.add(dff.getKey(), fieldCounts);
>
>  if (namedistint == 1)
>    facet_fields.add("numfacetTerms", counts.length);
>
>  if (namedistint == 2) {
>    NamedList resCount = new NamedList();
>    resCount.add("numfacetTerms", counts.length);
>    resCount.add("counts", fieldCounts);
>    facet_fields.add(dff.getKey(), resCount);
>  }
>
>
>
>
>Is this flow correct? I have run a few test cases and it worked fine, but I
>want to know whether any bugs could creep in here. (My concern is that this
>code should not affect the rest of the logic.)
>
>
>
>
>*Code flow with comments for reference:*
>
>
>Function: finishStage(ResponseBuilder rb)
>
>// in this for loop,
>for (DistribFieldFacet dff : fi.facets.values()) {
>
>  // just after this line of code
>  else { // TODO: log error or throw exception?
>    counts = dff.getLexSorted();
>
>  int namedistint = 0;  // default
>
>  // get the value of facet.numFacetTerms from the input query
>  namedistint = rb.req.getParams().getFieldInt(dff.getKey().toString(), FacetParams.FACET_NAMEDISTINCT, 0);
>
>  // branch on the value of facet.numFacetTerms: 0, 1, or 2
>
>  // 0: return only the facet field counts
>  if (namedistint == 0) {
>    facet_fields.add(dff.getKey(), fieldCounts);
>  }
>
>  // 1: return only the distinct facet term count
>  if (namedistint == 1) {
>    facet_fields.add("numfacetTerms", counts.length);
>  }
>
>  // 2: return both the facet field counts and the distinct term count
>  if (namedistint == 2) {
>    NamedList resCount = new NamedList();
>    resCount.add("numfacetTerms", counts.length);
>    resCount.add("counts", fieldCounts);
>    facet_fields.add(dff.getKey(), resCount);
>  }
>
>
>
>
>
>Regards,
>
>Rajani
>
>
>
>
>
>On Fri, May 27, 2011 at 1:14 PM, rajini maski wrote:
>
>> No such issues. I integrated it successfully with 1.4.1, and it works on a
>> single index.
>>
>> With f.2.facet.numFacetTerms=1 it returns only the distinct term count.
>>
>> With f.2.facet.numFacetTerms=2 it returns the distinct term count as well
>> as the facet results.
>>
>> But this works only on a single index, not in the distributed process. The
>> conditions you added in SimpleFacets.java ("if namedistinct == 0, 1, or
>> 2"): should they also be added to the distributed process function so that
>> it works across shards?
>>
>> Rajani
>>
>>
>>
>> On Fri, May 27, 2011 at 12:33 PM, Bill Bell  wrote:
>>
>>> I am pretty sure it does not yet support distributed shards..
>>>
>>> But the patch was written for 4.0... So there might be issues with running
>>> it on 1.4.1.
>>>
>>> On 5/26/11 11:08 PM, "rajini maski"  wrote:
>>>
>>> >The patch SOLR-2242 for getting the count of distinct facet terms doesn't
>>> >work for distributedProcess
>>> >
>>> >(https://issues.apache.org/jira/browse/SOLR-2242)
>>> >
>>> >The error log says
>>> >
>>> > HTTP ERROR 500
>>> >Problem accessing /solr/select. Reason:
>>> >
>>> >For input string: "numFacetTerms"
>>> >
>>> >java.lang.NumberFormatException: For input string: "numFacetTerms"
>>> >at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>> >at java.lang.Long.parseLong(Long.java:403)
>>> >at java.lang.Long.parseLong(Long.java:461)
>>> >at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:331)
>>> >at org.apache.solr.schema.TrieField.toInternal(TrieField.java:344)
>>> >at org.apache.solr.handler.component.FacetComponent$DistribFieldFacet.add(FacetComponent.java:619)
>>> >at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:265)
>>> >at org.apache.solr.han

Code for getting distinct facet counts across shards(Distributed Process).

2011-06-08 Thread rajini maski
In Solr 1.4.1, to get the distinct facet term count across shards, I added
the following code to the distributed process:





Class: FacetComponent.java

Function: finishStage(ResponseBuilder rb)



for (DistribFieldFacet dff : fi.facets.values()) {

  // just after this line of code
  else { // TODO: log error or throw exception?
    counts = dff.getLexSorted();

  int namedistint = 0;
  namedistint = rb.req.getParams().getFieldInt(dff.getKey().toString(), FacetParams.FACET_NAMEDISTINCT, 0);

  if (namedistint == 0)
    facet_fields.add(dff.getKey(), fieldCounts);

  if (namedistint == 1)
    facet_fields.add("numfacetTerms", counts.length);

  if (namedistint == 2) {
    NamedList resCount = new NamedList();
    resCount.add("numfacetTerms", counts.length);
    resCount.add("counts", fieldCounts);
    facet_fields.add(dff.getKey(), resCount);
  }




Is this flow correct? I have run a few test cases and it worked fine, but I
want to know whether any bugs could creep in here. (My concern is that this
code should not affect the rest of the logic.)
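
For reference, the three-way branching can be looked at in isolation. The
following is only a minimal sketch, not Solr code: plain Java maps stand in
for NamedList, and buildFieldEntry is a hypothetical helper name. It differs
from the snippet above in two ways that are assumptions of the sketch: every
result is stored under the facet field's own key (the snippet uses the fixed
key "numfacetTerms" when the value is 1), and a value other than 0, 1, or 2
falls back to plain counts, whereas three separate if statements would add
nothing for that field.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class DistinctFacetModes {

    /**
     * Hypothetical helper: returns the value to store under one facet field's key.
     * mode 0 = facet counts only, 1 = distinct term count only, 2 = both.
     */
    public static Object buildFieldEntry(int mode,
                                         Map<String, Integer> fieldCounts,
                                         int distinctTerms) {
        switch (mode) {
            case 1:
                // distinct term count only
                return Collections.singletonMap("numFacetTerms", distinctTerms);
            case 2: {
                // distinct term count plus the usual facet counts
                Map<String, Object> both = new LinkedHashMap<>();
                both.put("numFacetTerms", distinctTerms);
                both.put("counts", fieldCounts);
                return both;
            }
            case 0:
            default:
                // 0 or any unexpected value: keep the default facet counts
                return fieldCounts;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("red", 5);
        counts.put("blue", 3);

        // One entry per facet field, keyed by the field name.
        Map<String, Object> facetFields = new LinkedHashMap<>();
        facetFields.put("color", buildFieldEntry(2, counts, counts.size()));
        System.out.println(facetFields);
    }
}
```

In this sketch the distinct count is simply the size of the merged count
list, as in the snippet above. Whether that list holds every distinct term
depends on how many terms each shard returned, so it may be worth testing
with a facet limit smaller than the number of distinct terms.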




*Code flow with comments for reference:*


Function: finishStage(ResponseBuilder rb)

// in this for loop,
for (DistribFieldFacet dff : fi.facets.values()) {

  // just after this line of code
  else { // TODO: log error or throw exception?
    counts = dff.getLexSorted();

  int namedistint = 0;  // default

  // get the value of facet.numFacetTerms from the input query
  namedistint = rb.req.getParams().getFieldInt(dff.getKey().toString(), FacetParams.FACET_NAMEDISTINCT, 0);

  // branch on the value of facet.numFacetTerms: 0, 1, or 2

  // 0: return only the facet field counts
  if (namedistint == 0) {
    facet_fields.add(dff.getKey(), fieldCounts);
  }

  // 1: return only the distinct facet term count
  if (namedistint == 1) {
    facet_fields.add("numfacetTerms", counts.length);
  }

  // 2: return both the facet field counts and the distinct term count
  if (namedistint == 2) {
    NamedList resCount = new NamedList();
    resCount.add("numfacetTerms", counts.length);
    resCount.add("counts", fieldCounts);
    facet_fields.add(dff.getKey(), resCount);
  }
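
The per-field lookup done by getFieldInt in the code above can be sketched
on its own. This is a simplified stand-in over a plain map, not the Solr
class: it first checks the per-field name f.<field>.<param> and then falls
back to the global <param>. The parameter name facet.numFacetTerms is taken
from the examples in this thread; FieldParamLookup is a hypothetical class
name.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldParamLookup {

    /**
     * Simplified stand-in for a per-field integer parameter lookup:
     * "f.<field>.<param>" wins over "<param>", otherwise the default is used.
     */
    public static int getFieldInt(Map<String, String> params,
                                  String field, String param, int def) {
        String val = params.get("f." + field + "." + param);
        if (val == null) {
            val = params.get(param); // fall back to the global setting
        }
        return val == null ? def : Integer.parseInt(val);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("facet.numFacetTerms", "1");     // global setting
        params.put("f.2.facet.numFacetTerms", "2"); // override for field "2"

        System.out.println(getFieldInt(params, "2", "facet.numFacetTerms", 0));
        System.out.println(getFieldInt(params, "3", "facet.numFacetTerms", 0));
    }
}
```

Since the lookup is by name, the first argument has to match the name used
in the request. If dff.getKey() can differ from the field name, the
per-field override may not be found and the global value or the default
would be used instead.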





Regards,

Rajani





On Fri, May 27, 2011 at 1:14 PM, rajini maski  wrote:

> No such issues. I integrated it successfully with 1.4.1, and it works on a
> single index.
>
> With f.2.facet.numFacetTerms=1 it returns only the distinct term count.
>
> With f.2.facet.numFacetTerms=2 it returns the distinct term count as well as
> the facet results.
>
> But this works only on a single index, not in the distributed process. The
> conditions you added in SimpleFacets.java ("if namedistinct == 0, 1, or 2"):
> should they also be added to the distributed process function so that it
> works across shards?
>
> Rajani
>
>
>
> On Fri, May 27, 2011 at 12:33 PM, Bill Bell  wrote:
>
>> I am pretty sure it does not yet support distributed shards..
>>
>> But the patch was written for 4.0... So there might be issues with running
>> it on 1.4.1.
>>
>> On 5/26/11 11:08 PM, "rajini maski"  wrote:
>>
>> >The patch SOLR-2242 for getting the count of distinct facet terms doesn't
>> >work for distributedProcess
>> >
>> >(https://issues.apache.org/jira/browse/SOLR-2242)
>> >
>> >The error log says
>> >
>> > HTTP ERROR 500
>> >Problem accessing /solr/select. Reason:
>> >
>> >For input string: "numFacetTerms"
>> >
>> >java.lang.NumberFormatException: For input string: "numFacetTerms"
>> >at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>> >at java.lang.Long.parseLong(Long.java:403)
>> >at java.lang.Long.parseLong(Long.java:461)
>> >at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:331)
>> >at org.apache.solr.schema.TrieField.toInternal(TrieField.java:344)
>> >at org.apache.solr.handler.component.FacetComponent$DistribFieldFacet.add(FacetComponent.java:619)
>> >at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:265)
>> >at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:235)
>> >at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
>> >at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>> >at org.apache.solr.servlet.Solr