Re: JSON facet performance for aggregations

Saman Rasheed Thu, 25 May 2017 02:42:24 -0700

hi yonik,

i like your work on solr very much, and i'm hoping it can deliver what we are 
looking to achieve here... and apologies for the direct approach but i don't 
have a choice: i submitted the request below to the mailing list and i still 
haven't had a reply ... and part of me is wondering whether that's because 
either i have missed something very obvious, or maybe my approach to my problem 
is using the wrong technology here!

The mailing list is not allowing me to send you a direct link to the issue 
unless you want to see my message with a lot of xml 😊

so i'm pasting the contents of my message below:

thanks,

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

i have an english book whose contents i have indexed successfully into a 
field called 'content',
with the following properties:

<field name="content" type="text_general" indexed="true" stored="true" 
multiValued="true"
termVectors="true" termPositions="true" termOffsets="true"/>

so if i need to return the number of occurrences of terms matching a specific 
pattern/regex, e.g. '*olomo*', and my document
contains 'Solomon' twice, it should give me 'Solomon' with a term frequency = 2.

I've tried going through the term vector section in the reference and various 
other posts
on the internet, but i still haven't managed to figure out how.

the nearest i have found is the following request:

http://localhost:8983/solr/test/tvrh?q=content:[*%20TO%20*]&indent=true&tv.tf=true&tv.df=true

which brings my pc to a near halt for about a couple of minutes, and then it 
returns the term
frequency of every term! but i only need the term frequency of terms matching a 
particular pattern/regex.

is there a way to narrow it down to just one regex term, e.g. *thing*, so it 
will find soothing,
something, everything, each with its number of occurrences in the document?
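
to illustrate the result being asked for, here is a minimal client-side sketch in plain python. it is not a solr feature: the function name `matching_term_counts`, the sample sentence, and the crude lowercase tokenizer are assumptions made for illustration, and a real analyzer such as the one behind text_general will tokenize differently in detail.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase


def matching_term_counts(text, pattern):
    """Count how often each token matching a wildcard pattern occurs in text."""
    # Crude stand-in for an analyzer (assumption): lowercase, split on non-letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Keep only the terms that match the wildcard pattern, e.g. '*thing*'.
    return {term: n for term, n in counts.items() if fnmatchcase(term, pattern)}


# Hypothetical sample text standing in for the indexed book.
sample = ("Something soothing; everything and something else. "
          "Nothing for Solomon, said Solomon.")
print(matching_term_counts(sample, "*thing*"))
print(matching_term_counts(sample, "*olomo*"))
```

because the tokens are lowercased first, the second call reports 'solomon' rather than 'Solomon'.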

thanks,

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________
From: Yonik Seeley <ysee...@gmail.com>
Sent: 24 May 2017 10:45
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

On Mon, May 8, 2017 at 11:27 AM, Yonik Seeley <ysee...@gmail.com> wrote:
> I opened https://issues.apache.org/jira/browse/SOLR-10634 to address
> this performance issue.

OK, this has been committed.
A quick test shows about a 30x speedup when faceting on a
string/numeric docvalues field with 100K unique values and doing a
simple aggregation on another numeric field (and when limit:-1).
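
For readers unfamiliar with this kind of request, a terms facet with limit:-1 and a simple aggregation per bucket might look roughly like the sketch below. The field names `category_s` and `price_f`, the facet labels `by_value` and `total`, and the helper name are hypothetical; the code only builds the JSON body and sends nothing.

```python
import json


def build_facet_request(facet_field, agg_field):
    """Build a JSON request body: a terms facet with a sum per bucket."""
    return {
        "query": "*:*",
        "limit": 0,  # no documents needed, only facet buckets
        "facet": {
            "by_value": {
                "type": "terms",
                "field": facet_field,
                "limit": -1,  # return every bucket, the case mentioned above
                "facet": {"total": f"sum({agg_field})"},
            }
        },
    }


# Hypothetical field names, for illustration only.
body = build_facet_request("category_s", "price_f")
print(json.dumps(body, indent=2))
```

A body like this would typically be POSTed as JSON to the collection's search handler.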

-Yonik
