As a general rule with Solr, do a proof-of-concept implementation with the simplest sensible approach, and only start piling on complexity if performance or capacity becomes problematic. If the data is naturally a string, use a string. If it is naturally a number, use a number. Use whatever the query clients will be most comfortable with.

-- Jack Krupansky
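Following that advice, here is a minimal Python sketch contrasting the two options Kamal describes below, written as Solr-style documents. It assumes hypothetical multivalued fields named `skills` (string) and `skill_ids` (integer) in the schema, with `id` as the unique key; the `SKILL_NAMES` table is likewise hypothetical and stands in for the MySQL skills table. It is an illustration under those assumptions, not a recommended schema.

```python
import json

# Hypothetical id -> name table standing in for the MySQL skills table.
SKILL_NAMES = {1: "Java", 2: "MySql", 3: "PHP"}


def doc_with_string_skills(user_id, skills):
    """Option 1: skills as strings in a multivalued field ("skills")."""
    return {"id": str(user_id), "skills": list(skills)}


def doc_with_int_skills(user_id, skill_ids):
    """Option 2: skills as integer ids in a multivalued field ("skill_ids")."""
    return {"id": str(user_id), "skill_ids": list(skill_ids)}


def display_skills(doc):
    """Return the human-readable skills a client would show for doc."""
    if "skills" in doc:
        return doc["skills"]  # already readable, no lookup needed
    # Integer ids must be mapped back to names through the lookup table.
    return [SKILL_NAMES[i] for i in doc["skill_ids"]]


def query_for_skill(name, use_ints):
    """Build a query string for users with the given skill.

    No escaping is done, so names containing query syntax characters
    (e.g. the '+' in C++) would need escaping in real use.
    """
    if not use_ints:
        return f"skills:{name}"
    # Reverse lookup name -> id: another trip through the mapping table.
    ids_by_name = {v: k for k, v in SKILL_NAMES.items()}
    return f"skill_ids:{ids_by_name[name]}"


as_strings = doc_with_string_skills(1, ["Java", "MySql", "PHP"])
as_ints = doc_with_int_skills(1, [1, 2, 3])

# A JSON array of documents, the shape Solr's JSON update handler accepts.
print(json.dumps([as_strings]))
print(display_skills(as_ints))
print(query_for_skill("MySql", use_ints=False))
print(query_for_skill("MySql", use_ints=True))
```

The string option keeps each document self-describing; the integer option keeps a dependency on the mapping table for both display and querying, which works against the goal of taking MySQL out of the read path.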

-----Original Message-----
From: Kamal Palei
Sent: Tuesday, May 28, 2013 10:54 PM
To: solr-user@lucene.apache.org
Subject: Re: How apache solr stores indexes

Thanks Alex.

I am in a dilemma about whether to store the skill sets in the Solr index as
string tokens or as integers. To give a little background:

Currently, I assign each skill a unique ID (an auto-increment field in a
MySQL table), and then store the skill IDs against user IDs in a separate
table. That is how I search for users who have a particular skill, or
retrieve the complete skill set of a particular user.

Now I want to move everything into Solr and keep MySQL usage as low as
possible. This should help me scale to higher loads.

I am weighing two options:
1. Store each skill as a string token (in a new multivalued string field)
2. Store each skill as an integer (in a new multivalued integer field)

Kindly suggest which is the better option.

Best Regards
Kamal

On Wed, May 29, 2013 at 8:11 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:

And you need to know this why?

If you are really trying to understand how this all works under the
covers, you need to look at Lucene's inverted index as a start. Start
here:
http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Might take you a couple of weeks to put it all together.

Or you could try asking the actual business-level question that you
need an answer to. :-)

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)
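To make Alex's pointer to Lucene's inverted index concrete for the skills example, here is a toy Python sketch. It is a deliberate simplification, not Lucene's on-disk format: each distinct term is kept once in a sorted term dictionary, and each term maps to a postings list of document numbers. The skill sets come from Kamal's question; the function names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Toy inverted index mapping each term to a sorted list of doc ids.

    Each distinct term appears once as a key, however many documents
    contain it; a document's use of the term is recorded only as its
    doc id in that term's postings list.
    """
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    # Keep terms in sorted order, as a term dictionary does.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, term):
    """Return the ids of documents containing term ([] if none)."""
    return index.get(term, [])


# Skill sets from the example in Kamal's question.
users = {
    1: ["Java", "MySql", "PHP"],
    2: ["C++", "MySql", "PHP"],
    3: ["Java", "Android", "iOS"],
}

index = build_inverted_index(users)
for term, doc_ids in index.items():
    print(term, doc_ids)
```

Running it prints one line per distinct term, including `MySql [1, 2]` and `PHP [1, 2]`: the shared skills are stored once, not once per user. Lucene's real format adds much more (per-segment term dictionaries, compressed postings, optional term frequencies and positions), and values of a stored field are kept separately, per document.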


On Tue, May 28, 2013 at 10:13 PM, Kamal Palei <palei.ka...@gmail.com>
wrote:
> Dear All
> I have a basic question about how data is stored in Apache Solr indexes.
>
> Say I have a thousand registered users on my site, and I want to store
> the skills of each user in a multivalued string field.
>
> For example:
> user 1 has skill set - Java, MySql, PHP
> user 2 has skill set - C++, MySql, PHP
> user 3 has skill set - Java, Android, iOS
> ... so on
>
> As you can see, users 1 and 2 have two skills in common: MySql and PHP.
> In a real deployment, the same words might be repeated millions of times.
>
> Now the question is: does Apache Solr store them as plain words, or does
> it convert each word to a unique number and store only the number?
>
> Best Regards
> Kamal
> Net Cloud Systems
> Bangalore, India

