Can you reload all the content?

If so, I would calculate this in an update request processor and put the result 
in its own field.
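
As a rough illustration of the calculation such a processor would perform, here is a minimal sketch in plain Python rather than Solr code. It assumes the commonly cited LIX definition, LIX = A/B + (C * 100)/A, where A is the number of words, B the number of periods, and C the number of words longer than six letters. The function name `lix` and the choice to count only periods and colons (skipping the capital-letter rule, as discussed in the quoted thread) are assumptions made for this example.

```python
import re


def lix(text: str) -> float:
    """Return a LIX readability score for text (sketch, not Solr code)."""
    # A: words, taken here as runs of Unicode letters (an assumption).
    words = re.findall(r"[^\W\d_]+", text)
    a = len(words)
    if a == 0:
        return 0.0
    # B: "periods", counted here as '.' and ':' only; the capital-letter
    # rule is deliberately skipped. Guard against division by zero.
    b = max(len(re.findall(r"[.:]", text)), 1)
    # C: long words, i.e. more than six letters.
    c = sum(1 for w in words if len(w) > 6)
    return a / b + (c * 100.0) / a
```

In Solr, the resulting number would be written to its own field at index time, so that documents can be sorted or ranked by it.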

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 21, 2015, at 2:53 AM, Roland Szűcs <roland.sz...@booknwalk.com> wrote:
> 
> Thanks, Toke, for your quick response. All your suggestions seem like very 
> good ideas. I also found the capital-letter rule strange because of names 
> and places, so I will skip that part, as I do not need an absolute measure, 
> just a ranked order among my documents.
> 
> cheers,
> Roland
> 
> 
> 
> On Oct 21, 2015, at 11:25, Toke Eskildsen 
> <t...@statsbiblioteket.dk> wrote:
> 
>> Roland Szűcs <roland.sz...@booknwalk.com> wrote:
>>> My use case is that I have to calculate the LIX readability index for my
>>> documents.
>> [...]
>>> *B* = Number of periods (defined by period, colon or capital first letter)
>> [...]
>>> Does anybody have idea how to get the number of "periods"?
>> 
>> As the positions do not matter, you could make a copyField containing only 
>> punctuation, maybe extended with a replace filter so that you have dot, 
>> comma, colon, bang, question, etc. instead of .,:!?
>> 
>> The capital first letter seems a bit strange to me - what about names? But 
>> anyway, you could do it with a PatternReplaceCharFilter, matching on 
>> something like 
>> ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
>> and replacing with 'capital' (the regexp above probably fails - it was just 
>> from memory).
>> 
>> - Toke Eskildsen
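
The punctuation-only field idea above can be sketched outside Solr as follows. This is a plain Python stand-in, not a Solr analyzer configuration; the names `NAMES`, `punctuation_tokens`, and `count_periods` are hypothetical, and treating only dot and colon as "periods" is an assumption based on the definition quoted in the thread.

```python
import re
from collections import Counter

# Word tokens standing in for each punctuation mark, as suggested above.
NAMES = {".": "dot", ",": "comma", ":": "colon", "!": "bang", "?": "question"}


def punctuation_tokens(text: str) -> list[str]:
    """Keep only punctuation and replace each mark with a word token."""
    return [NAMES[ch] for ch in re.findall(r"[.,:!?]", text)]


def count_periods(text: str) -> int:
    """Count the tokens treated as 'periods' (dot and colon, by assumption)."""
    counts = Counter(punctuation_tokens(text))
    return counts["dot"] + counts["colon"]
```

Because the marks become ordinary word tokens, their frequencies can be counted like any other term.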
