Re: Is there a way to tell if multivalued field actually contains multiple values?

2016-11-11 Thread Michael McCandless
I think you can use the term stats that Lucene tracks for each field. Compare Terms.getSumTotalTermFreq and Terms.getDocCount. If they are equal, every document that had this field had only one token. Mike McCandless http://blog.mikemccandless.com On Fri, Nov 11, 2016 at 5:50 AM, Mik
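
The comparison suggested above can be illustrated with a toy model. This is only a sketch in plain Python, not Lucene's API: the document layout (a dict mapping field name to the list of tokens indexed for that field) and the helper names `field_stats` and `has_single_token_everywhere` are hypothetical. In a real index the two numbers would come from Terms.getSumTotalTermFreq and Terms.getDocCount, as named in the message.

```python
def field_stats(docs, field):
    """Return (sum_total_term_freq, doc_count) for `field` in a toy corpus.

    Each doc is a dict: field name -> list of tokens indexed for that field
    (all tokens from all values of the field, flattened).
    """
    sum_total_term_freq = 0  # total number of tokens across all docs
    doc_count = 0            # number of docs with at least one token
    for doc in docs:
        tokens = doc.get(field, [])
        if tokens:
            doc_count += 1
            sum_total_term_freq += len(tokens)
    return sum_total_term_freq, doc_count


def has_single_token_everywhere(docs, field):
    """True if every doc that has `field` contributed exactly one token."""
    sum_ttf, doc_count = field_stats(docs, field)
    return doc_count > 0 and sum_ttf == doc_count


# Hypothetical sample data: "tags" has two tokens in the first doc.
docs = [
    {"id": ["1"], "tags": ["red", "blue"]},
    {"id": ["2"], "tags": ["green"]},
    {"id": ["3"]},
]

print(has_single_token_everywhere(docs, "id"))    # one token per doc
print(has_single_token_everywhere(docs, "tags"))  # 3 tokens over 2 docs
```

Note that the check counts tokens, not values: a tokenized text field holding a single multi-word value would also make the two numbers differ, so equality is a sufficient sign of single values but inequality does not prove multiple values.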

Re: Is there a way to tell if multivalued field actually contains multiple values?

2016-11-11 Thread Mikhail Khludnev
I suppose it's needless to remind you that norm(field) depends on (but, by default, does not precisely encode) the number of tokens in a doc's field (although not the number of actual text values). On Fri, Nov 11, 2016 at 5:08 AM, Alexandre Rafalovitch wrote: > Hello, > > Say I indexed a large dataset against a schemaless

Re: Is there a way to tell if multivalued field actually contains multiple values?

2016-11-10 Thread Erick Erickson
I don't think so. Once things are indexed, they look just like a regular text field with odd offsets for some of the terms. Of course if you returned the stored form (assuming it's stored) it'd look different, but that's messy too. Best, Erick On Thu, Nov 10, 2016 at 6:08 PM, Alexandre Rafalovitc

Is there a way to tell if multivalued field actually contains multiple values?

2016-11-10 Thread Alexandre Rafalovitch
Hello, Say I indexed a large dataset against a schemaless configuration. Now I have a bunch of multivalued fields. Is there any way to tell which of these (text) fields have (for the given data) only single values? I know I am supposed to look at the original data, and all that, but this is more for de