Hi Emir,

So this would likely differ from what the operating system counts, since the operating system may count each Chinese character as 3 to 4 bytes. That is probably why I could not find any records with subject:/.{255,}.*/
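
To make the difference concrete, here is a minimal sketch in Python, assuming the text is encoded as UTF-8 and that the regex counts each character (code point) once, which is Emir's guess and has not been verified against Solr here. The helper name `lengths` is made up for illustration.

```python
def lengths(text, encoding="utf-8"):
    """Return (number of code points, number of encoded bytes) for text."""
    return len(text), len(text.encode(encoding))


# 100 characters each; only the byte counts differ.
samples = {
    "English": "a" * 100,                   # 1 byte per character in UTF-8
    "Chinese (BMP)": "\u4e2d" * 100,        # U+4E2D, 3 bytes per character in UTF-8
    "Chinese (rare)": "\U00020000" * 100,   # U+20000, 4 bytes per character in UTF-8
}

for label, text in samples.items():
    chars, nbytes = lengths(text)
    # A pattern like .{255,} compares against chars under the assumption above,
    # so none of these 100-character strings would match it.
    print(f"{label}: {chars} characters, {nbytes} bytes")
```

Under that assumption, a subject of 100 Chinese characters is 300 or more bytes on disk but still only 100 characters to the regex, so a byte limit of 255 would correspond to a much smaller character count for such records.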
Are there other tools that we can use to query the length of data that is already indexed and is not in standard English (e.g. Chinese, Japanese, etc.)?

Regards,
Edwin

On 3 January 2018 at 23:51, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi Edwin,
> I do not know, but my guess would be that each character is counted as 1
> in regex regardless how many bytes it takes in used encoding.
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 3 Jan 2018, at 16:43, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Thanks for the reply.
> >
> > I am doing the search on existing data that has already been indexed, and
> > it is likely to be a one time thing.
> >
> > This subject:/.{255,}.*/ works for English characters. However, there are
> > Chinese characters in some of the records. The length seems to be more than
> > 255, but it does not shows up in the results.
> >
> > Do you know how the length for Chinese characters and other languages are
> > being determined?
> >
> > Regards,
> > Edwin
> >
> >
> > On 3 January 2018 at 23:01, Alexandre Rafalovitch <arafa...@gmail.com>
> > wrote:
> >
> >> Do that during indexing as Emir suggested. Specifically, use an
> >> UpdateRequestProcessor chain, probably with the Clone and FieldLength
> >> processors:
> >> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html
> >>
> >> Regards,
> >> Alex.
> >>
> >> On 31 December 2017 at 22:00, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> >> wrote:
> >>> Hi,
> >>>
> >>> Would like to check, if it is possible to query a field which has data of
> >>> more than a certain length?
> >>>
> >>> Like for example, I want to query the field subject that has more than 255
> >>> bytes. Is it possible?
> >>>
> >>> I am currently using Solr 6.5.1.
> >>>
> >>> Regards,
> >>> Edwin
> >>
> >