Right, a search for "442" would not match "1442".
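
A minimal sketch of why, using the three sample values from the quoted message below. It assumes the field is split on whitespace only and that a query term must equal a whole token; `whitespace_tokenize`, `matches` and `search` are hypothetical helpers that imitate that behaviour in plain Python, not Solr APIs.

```python
def whitespace_tokenize(text):
    # Rough stand-in for a whitespace tokenizer: split on runs of whitespace.
    return text.split()


def matches(field_text, term):
    # A term matches only when it equals a whole token, never a substring of one.
    return term in whitespace_tokenize(field_text)


# The three sample field values from the quoted message.
docs = {1: "20 1442 35", 2: "20 442 35", 3: "20 1442"}


def search(term):
    # Return the ids of the documents whose field contains the exact token.
    return sorted(doc_id for doc_id, text in docs.items() if matches(text, term))


print(search("1442"))  # documents 1 and 3
print(search("442"))   # document 2 only
```

Because "442" is compared against whole tokens, it never matches the token "1442".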

-- Jack Krupansky

-----Original Message-----
From: z z
Sent: Friday, June 07, 2013 2:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Change: Int -> String (i am the original poster, new email address)

Maybe if I were to say that the column "user_id" will become "user_ids"
that would clarify things?

user_id:2002+AND+created:[${from}+TO+${until}]+data:"more"

becomes

user_ids:2002+AND+created:[${from}+TO+${until}]+data:"more"

where I want 2002 to be an exact positive match on one of the user_ids
embedded in the TEXT ... not string :)  If I am totally off or making no
sense, feedback is very welcome.  I am just seeing lots of similar data
going into my db and it feels like Solr should be able to handle this.
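
For illustration only, here is a sketch of how the query above could be assembled and URL-encoded from a client. The timestamps are made-up values, `build_query` is a hypothetical helper, and the query keeps the same shape as the one quoted above.

```python
from urllib.parse import urlencode


def build_query(user_id, start, end, phrase):
    # Same shape as the query above: exact id term, created range, quoted data phrase.
    return f'user_ids:{user_id} AND created:[{start} TO {end}] data:"{phrase}"'


# Hypothetical example values for the id, the time range and the phrase.
query = build_query(2002, 1370000000, 1370600000, "more")

# urlencode turns spaces into "+" and escapes ":", "[", "]" and the quotes.
encoded = urlencode({"q": query})
print(encoded)
```

The "+" characters in the hand-written query are simply URL-encoded spaces.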

I just want to know if transforming the data like that will still allow
exact searches against a user_id.  My language from a Solr guru's point of
view is probably *very* poorly phrased ... "exact" and TEXT might not go
hand in hand.

Is the TEXT "20 1442 35" parsed as "20" "1442" "35" so that a search
against it for "1442" will yield "exact" results?  A search against "442"
won't match, right?

1. "20 1442 35"
2. "20 442 35"
3. "20 1442"

user_ids:1442 -> yields #1 & #3 always?
user_ids:442 -> yields only #2 always?

My lack of understanding about what solr does when it indexes is shining
through :)


On Fri, Jun 7, 2013 at 1:43 PM, z z <zenlok.testi...@gmail.com> wrote:

My language might be a bit off (I am saying "string" when I probably mean
"text" in the context of solr), but I'm pretty sure that my story is
unwavering ;)

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` int(11)

So, imagine that we have 1000 entries come in where "data" above is
exactly the same for all 1000 entries, but user_id is different (id and
created being different is irrelevant).  I am thinking that prior to
inserting into mysql, I should be able to concatenate the user_ids together
with whitespace and then insert them into something like:

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` blob
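
A rough sketch of that concatenation step, assuming the incoming entries are available as (data, user_id) pairs before they are written to MySQL. `collapse_rows` and the sample rows are hypothetical.

```python
from collections import defaultdict


def collapse_rows(rows):
    # Group user ids by identical data and join each group with single spaces.
    grouped = defaultdict(list)
    for data, user_id in rows:
        grouped[data].append(str(user_id))
    return {data: " ".join(ids) for data, ids in grouped.items()}


# Hypothetical incoming entries: the same data arriving for several users.
rows = [("more", 20), ("more", 1442), ("more", 35), ("less", 442)]

collapsed = collapse_rows(rows)
print(collapsed["more"])  # one whitespace-separated value for the blob column
```

Each key of `collapsed` would become one row, with the joined ids going into the `user_id` blob.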

Then on solr's end it will treat the user_id as Text and parse it (I want
to say tokenize, but maybe my language is incorrect here?).

Then when I search

user_id:2002+AND+created:[${from}+TO+${until}]+data:"more"

I want to be sure that if I look for user_id "2002", I will get data that
only has a value "2002" in the user_id column and that a separate user with
id "20" cannot accidentally pull data for user_id "2002" as a result of a
fuzzy (my language ok?) match of 20 against (20)02.

Current schema definition:

 <field name="user_id" type="int" indexed="true" stored="true"/>

New schema definition:

    <field name="user_id" type="user_id_string" indexed="true" stored="true"/>
...
    <fieldType name="user_id_string" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" maxTokenLength="120"/>
      </analyzer>
    </fieldType>


