OK. After some analysis, I traced the reduced result set to the fact that the
zipcode field is not indexed 'as is', i.e. the zipcode field values are
broken down into tokens and then stored. Hence, if there are 10 documents
with zipcode values ranging from 91000 to 91009, the values are not stored
as 91000, 91001, etc. Instead, the most common recurrences appear to be
grouped together and stored as tokens, resulting in a reduced
result set.
The net effect is that I cannot search for a value like 91000, since it's not
stored as is.

I suspect this has something to do with the field type that zipcode is
associated with. Right now, zipcode is a field of type text_general, where the
StandardTokenizerFactory may be breaking the values into tokens. However, I
want to store them without tokenizing. What's the best field type for this?

I already explored the string field type, which is supposed to store the
values as is, but I see that the values are still being tokenized.
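
For reference, here is a minimal sketch of what a non-tokenized zipCode
definition might look like in schema.xml. It assumes the stock "string" type
(solr.StrField) from the Solr example schema is present. A full re-import
would be needed after changing the type, since documents indexed earlier
keep the tokens produced by the old type.

```xml
<!-- Assumption: the stock "string" type from the Solr example schema.
     solr.StrField indexes the whole value as a single token, with no analysis. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- zipCode switched from text_general to string so 91000 is kept verbatim. -->
<field name="zipCode" type="string" indexed="true" stored="true"/>
```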


Thanks,
Anand
On Wed, Aug 3, 2011 at 7:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Sorry, I'm on a restricted machine so can't get the precise URL. But,
> there's a debug page for DIH that might allow you to see what the query
> actually returns. I'd guess one of two things:
> 1> you aren't getting the number of rows you think.
> 2> you aren't committing the documents you add.
>
> But that's just a guess.
>
> Best
> Erick
> On Aug 3, 2011 2:15 PM, "anand sridhar" <anand.for...@gmail.com> wrote:
> > Hi,
> > I am a newbie to Solr and have been trying to learn using
> > DataImportHandler.
> > I have a query in data-config.xml that fetches about 5 records when I fire
> > it in SQL Query manager.
> > However, when Solr does a full import, it is skipping 4 records and only
> > importing 1 record.
> > What could be the reason for that?
> >
> > My data-config.xml looks like this -
> >
> > <dataConfig>
> > <dataSource type="JdbcDataSource"
> > name="GeoService"
> > driver="net.sourceforge.jtds.jdbc.Driver"
> > url="jdbc:jtds:sqlserver://10.168.50.104/ZipCodeLookup"
> > user="sa"
> > password="psiuser"/>
> > <document>
> > <entity name="city"
> > query="select ll.cityId as id, ll.zip as zipCode, c.cityName as
> > cityName, st.stateName as state, ct.countryName as country from
> > latlonginfo ll, city c, state st, country ct where ll.cityId = c.cityID and
> > c.stateID = st.stateID and st.countryID = ct.countryID
> > order by ll.areacode"
> > dataSource="GeoService">
> > <field column="zipCode" name="zipCode"/>
> > <field column="cityName" name="cityName"/>
> > <field column="state" name="state"/>
> > <field column="country" name="country"/>
> > </entity>
> > </document>
> > </dataConfig>
> >
> > My fields definition in schema.xml looks as below -
> >
> > <field name="CityName" type="text_general" indexed="true" stored="true" />
> > <field name="zipCode" type="text_general" indexed="true" stored="true"/>
> > <field name="state" type="text_general" indexed="true" stored="true" />
> > <field name="country" type="text_general" indexed="true" stored="true" />
> >
> > One observation I made was that the 1 record being indexed is the last
> > record in the result set. I have verified that there are no duplicate
> > records being retrieved.
> >
> > For example, if the result set from the database is -
> >
> > zipcode CityName state country
> > ------- --------- ----- -------
> > 91324 Northridge CA USA
> > 91325 Northridge CA USA
> > 91327 Northridge CA USA
> > 91328 Northridge CA USA
> > 91329 Northridge CA USA
> > 91330 Northridge CA USA
> >
> > The record being indexed is the last record all the time.
> >
> > Any suggestions are welcome.
> >
> > Thanks,
> > Anand
>
