And you need to know this why? If you really want to understand how this all works under the covers, Lucene's inverted index is the place to start: http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description
Might take you a couple of weeks to put it all together. Or you could try asking the actual business-level question that you need an answer to. :-)

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, May 28, 2013 at 10:13 PM, Kamal Palei <palei.ka...@gmail.com> wrote:
> Dear All
> I have a basic doubt how the data is stored in apache solr indexes.
>
> Say I have thousand registered users in my site. Lets say I want to store
> skills of each users as a multivalued string index.
>
> Say
> user 1 has skill set - Java, MySql, PHP
> user 2 has skill set - C++, MySql, PHP
> user 3 has skill set - Java, Android, iOS
> ... so on
>
> You can see user 1 and 2 has two common skills that is MySql and PHP
> In an actual case there might be millions of repetition of words.
>
> Now question is, does apache solr stores them as just words, OR converts
> each words to an unique number and stores the number only.
>
> Best Regards
> Kamal
> Net Cloud Systems
> Bangalore, India
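
For a rough feel of what an inverted index does with the skills example quoted above, here is a toy sketch in Python. It is only an illustration of the general idea (each distinct term is kept once and points to a list of document numbers). It is not Lucene's actual on-disk format, which adds compression and other structures described at the link above. The function name build_inverted_index and the users dictionary are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to a sorted list of the doc IDs that contain it.

    Hypothetical helper for illustration only; not part of Solr or Lucene.
    """
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            # The term string is stored once as a key, no matter how many
            # documents repeat it; only the doc ID is added per occurrence.
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# The three users from the question, keyed by an assumed numeric doc ID.
users = {
    1: ["Java", "MySql", "PHP"],
    2: ["C++", "MySql", "PHP"],
    3: ["Java", "Android", "iOS"],
}

index = build_inverted_index(users)
for term in sorted(index):
    print(term, index[term])
```

In this sketch "MySql" and "PHP" each appear once as a term and map to the document list [1, 2], so a repeated skill costs one more document number rather than another copy of the word.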