Hi Erick, Ahmet,

Thanks for your suggestions. Can TermsComponent be restricted by a query? I need
the word counts of the comments for one question id, not for all of them. When I
add the query q=questionid:123, I still see the counts for all comments:

http://localhost:8182/solr/dev/terms?terms.fl=comments&terms=true&terms.limit=1000&q=questionid:123
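As far as I know, TermsComponent enumerates the indexed terms of a field across the whole index and does not restrict them to the documents matched by q, which would explain why the counts do not change. One possible workaround is to fetch the comments for a single question with an ordinary /select request and count the words on the client. The sketch below assumes that approach: select_url and word_cloud are hypothetical helpers, the base URL is the one used in this thread, rows=1000 is an arbitrary cap, and the naive regex tokenizer only approximates the textcloud_en analyzer (it applies no stop words).

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Base URL taken from this thread (core "dev" on port 8182); adjust as needed.
BASE_URL = "http://localhost:8182/solr/dev/select"

# Naive tokenizer: lowercase runs of letters/digits, keeping inner apostrophes.
# It applies no stop words, so it only approximates the textcloud_en analyzer.
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def select_url(question_id: int, rows: int = 1000) -> str:
    """Hypothetical helper: /select URL returning only the comments of one question.
    rows=1000 is an arbitrary cap; raise it (or page) if a question has more comments."""
    params = {
        "q": f"questionid:{question_id}",  # field:value query syntax
        "fl": "comments",                  # return only the comments field
        "rows": rows,
        "wt": "json",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def word_cloud(comments: list[str]) -> Counter:
    """Hypothetical helper: count every occurrence of every word across all comments."""
    counts: Counter = Counter()
    for text in comments:
        counts.update(TOKEN.findall(text.lower()))
    return counts


if __name__ == "__main__":
    print(select_url(123))
    # Stand-in for the comments values parsed from the JSON response.
    print(word_cloud(["My work, my reward.", "More work!"]).most_common(3))
```

In practice the comments values would come from response["response"]["docs"] after parsing the JSON; if comments is multivalued, each document's list of values would need to be flattened first. The trade-off is that the counting moves out of Solr, so this only suits result sets small enough to transfer.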

StatsComponent does not support text fields; it fails with:

  Field type textcloud_en{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100, class=solr.TextField}} is not currently supported

  <fieldType name="textcloud_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
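To see what the textcloud_en index analyzer would count, here is a rough client-side approximation of its chain (StandardTokenizer, then StopFilter with ignoreCase="true", then lowercasing), applied to the sample comment from the original message quoted below. Assumptions: the regex only approximates StandardTokenizer, STOP_WORDS is assumed to match the stock English list shipped as lang/stopwords_en.txt in Solr's example configs (the file in this core may differ), and analyze is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed contents of lang/stopwords_en.txt (stock English list); verify against your core.
STOP_WORDS = frozenset("""
a an and are as at be but by for if in into is it no not of on or such
that the their then there these they this to was will with
""".split())

# Rough stand-in for StandardTokenizer: letters/digits, keeping inner apostrophes (i'm, don't).
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def analyze(text: str) -> list[str]:
    """Approximate the textcloud_en index chain: tokenize, drop stop words, lowercase.
    Lowercasing first is equivalent here because the StopFilter ignores case."""
    return [tok for tok in TOKEN.findall(text.lower()) if tok not in STOP_WORDS]


# Sample comment copied from the original message.
SAMPLE = (
    "It seems that the harder I work, the more work I get for the same "
    "compensation and reward. The more work I take on gets absorbed into my "
    "\"normal\" workload and I'm not recognized for working harder than my peers, "
    "which makes me not want to work to my potential. I am very underwhelmed by "
    "the evaluation process and bonus structure. I don't believe the current "
    "structure rewards strong performers. I am confident that the company could "
    "not hire someone with my talent to replace me if I left, but I don't think "
    "the company realizes that."
)

if __name__ == "__main__":
    counts = Counter(analyze(SAMPLE))
    for word, n in counts.most_common(5):
        print(word, n)
```

Unlike the facet counts in the quoted message, which report 1 for every term, this counts each occurrence; with this sample the word "my" actually comes out as 4.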

Thanks
Rajesh



CEB India Private Limited. Registration No: U741040HR2004PTC035324. Registered 
office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, 
Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer 
SHL Talent Measurement products and services. If you have received this e-mail 
in error, please notify the sender and immediately, destroy all copies of this 
email and its attachments. The publication, copying, in whole or in part, or 
use or dissemination in any other way of this e-mail and attachments by anyone 
other than the intended person(s) is prohibited.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, April 29, 2016 9:16 PM
To: solr-user <solr-user@lucene.apache.org>; Ahmet Arslan <iori...@yahoo.com>
Subject: Re: Facet ignoring repeated word

That's the way faceting is designed to work. It counts the _documents_ satisfying
your query in which a term appears; if a word appears multiple times in a doc, it
is only counted once.
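To make the distinction concrete, here is a small in-memory sketch contrasting what a facet count reports (the number of matching documents that contain a term, i.e. its document frequency within the result set) with what a word cloud usually wants (every occurrence, summed over the matching documents). The documents and the whitespace tokenizer are made up for illustration; nothing here calls Solr.

```python
from collections import Counter


def facet_style_counts(docs: list[str]) -> Counter:
    """Count each term at most once per document, like a facet count."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))  # set() drops repeats within a doc
    return counts


def occurrence_counts(docs: list[str]) -> Counter:
    """Count every occurrence of each term across all documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


# Two made-up documents: "my" appears 3 times in the first and once in the second.
docs = ["my work my reward my peers", "my talent"]
print(facet_style_counts(docs)["my"])  # 2 (documents containing "my")
print(occurrence_counts(docs)["my"])   # 4 (total occurrences of "my")
```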

For the general use case, it would be unsettling for a user to see a facet count
of 500, click on it, and discover that the number of matching docs was really 345
or something.

Ahmet's hints might help, but I'd ask whether counting words multiple times
really satisfies the use case.

Best,
Erick

On Fri, Apr 29, 2016 at 7:10 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
> Hi,
>
> Depending on your requirements, StatsComponent, TermsComponent, or
> LukeRequestHandler can also be used.
>
>
> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
> https://wiki.apache.org/solr/LukeRequestHandler
> https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
> Ahmet
>
>
>
> On Friday, April 29, 2016 11:56 AM, "G, Rajesh" <r...@cebglobal.com> wrote:
> Hi,
>
> I am trying to implement a word
> cloud<https://www.google.co.uk/imgres?imgurl=https%3A%2F%2Fwww.whitehouse.gov%2Fsites%2Fdefault%2Ffiles%2Fother%2Fsotu_wordle.png&imgrefurl=https%3A%2F%2Fwww.whitehouse.gov%2Fblog%2F2011%2F01%2F26%2Fstate-union-word-cloud-jobs-america-people-new&docid=eZ_HvQpd9FRBKM&tbnid=qyIc-elv6z-0iM%3A&w=895&h=406&bih=643&biw=1366&ved=0ahUKEwie_8XjurPMAhXLaRQKHWiFDFAQMwgyKAAwAA&iact=mrc&uact=8>
> using Solr. The problem is that the Solr facet query ignores repeated words
> within a document. For example:
>
> I have indexed the text:
> It seems that the harder I work, the more work I get for the same 
> compensation and reward. The more work I take on gets absorbed into my 
> "normal" workload and I'm not recognized for working harder than my peers, 
> which makes me not want to work to my potential. I am very underwhelmed by 
> the evaluation process and bonus structure. I don't believe the current 
> structure rewards strong performers. I am confident that the company could 
> not hire someone with my talent to replace me if I left, but I don't think 
> the company realizes that.
>
> The indexed content contains the word "my" 4 times, but when I run the query
> http://localhost:8182/solr/dev/select?facet=true&facet.field=comments&rows=0&indent=on&q=questionid:3956&wt=json
> the count for "my" is 1, not 4. Can you please help?
>
> Also, please suggest whether there is a better way to implement a word cloud in
> Solr than using facets.
>
>     "facet_fields":{
>       "comments":[
>         "absorbed",1,
>         "am",1,
>         "believe",1,
>         "bonus",1,
>         "company",1,
>         "compensation",1,
>         "confident",1,
>         "could",1,
>         "current",1,
>         "don't",1,
>         "evaluation",1,
>         "get",1,
>         "gets",1,
>         "harder",1,
>         "hire",1,
>         "i",1,
>         "i'm",1,
>         "left",1,
>         "makes",1,
>         "me",1,
>         "more",1,
>         "my",1,
>         "normal",1,
>         "peers",1,
>         "performers",1,
>         "potential",1,
>         "process",1,
>         "realizes",1,
>         "recognized",1,
>         "replace",1,
>         "reward",1,
>         "rewards",1,
>         "same",1,
>         "seems",1,
>         "someone",1,
>         "strong",1,
>         "structure",1,
>         "take",1,
>         "talent",1,
>         "than",1,
>         "think",1,
>         "underwhelmed",1,
>         "very",1,
>         "want",1,
>         "which",1,
>         "work",1,
>         "working",1,
>         "workload",1]
>     }
>