Hi Pat,
Thanks for your response. We have 7 other models indexed by Sphinx.
They are all much smaller tables, the largest containing around 5,000
records. Their indices are far more complex, though.
There is no missing data for email addresses. However, there are many
account_names records that contain NULL or "" values for the
first_name and/or last_name columns. Indeed, removing the two
account_names lines from the define_index block and re-running the
index task removes the duplicate document ID warning. However, due to
an unrelated configuration issue, I'm not able to get the
total_entries count from a console. Using the `search` command line
tool, it seems that the account_core index now has a total of 602083
documents (all of them!).
So it looks like the blank data is the cause for the duplicate
document id warning and the seemingly-missing records. What would you
suggest as a way to work around this issue? I can try casting the NULL
values to empty strings, or adding other data to the index that would
help Sphinx distinguish between records (but wouldn't the timestamp
and email address fields do that?). Anything you'd suggest?
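
A minimal sketch of the first option (casting NULL values to empty strings), written in plain Ruby rather than against the Thinking Sphinx API. The sample rows and the `normalize_name` and `blank_names?` helpers are hypothetical and exist only for illustration. Nothing here is confirmed to remove the warning; it only shows how NULL and "" can be treated alike and how the affected rows could be counted.

```ruby
# Hypothetical rows standing in for account_names records; not real data.
ROWS = [
  { :account_id => 1, :first_name => "Ada", :last_name => "Lovelace" },
  { :account_id => 2, :first_name => nil,   :last_name => "" },
  { :account_id => 3, :first_name => "",    :last_name => "Hopper" }
]

# Cast NULL (nil) to an empty string so nil and "" are handled identically.
def normalize_name(value)
  value.to_s.strip
end

# True when a row is missing a usable first name and/or last name.
def blank_names?(row)
  normalize_name(row[:first_name]).empty? ||
    normalize_name(row[:last_name]).empty?
end

# Copy of each row with both name columns normalized.
def normalize_row(row)
  row.merge(:first_name => normalize_name(row[:first_name]),
            :last_name  => normalize_name(row[:last_name]))
end

blank_count = ROWS.count { |row| blank_names?(row) }
puts "#{blank_count} of #{ROWS.size} rows have a blank first or last name"
```

Counting the affected rows this way, and comparing that count against the number of records missing from the index, is one way to check whether the blank names really account for the discrepancy.
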
And yes, the Account model is listed in the indexed_models setting.
Cheers,
Alex

On Nov 2, 9:32 am, Pat Allan <[email protected]> wrote:
> Hi Alex
>
> How many other Sphinx indices do you have in your app? Just wondering if 
> there's some conflict somehow, though that surely would crop up in dev as 
> well.
>
> As for the missing records - do you have many accounts with no first name, 
> last name or email addresses? I remember reading somewhere that Sphinx 
> ignores records that have no data in their fields. Not sure if these two 
> problems are related to each other.
>
> Also, going by an issue you logged on Github - is this the app you're using 
> the indexed_models setting with? Can you confirm that all relevant models are 
> in that setting?
>
> Cheers
>
> --
> Pat
>
> On 02/11/2011, at 12:18 AM, Alex Kahn wrote:
>
> > Hi,
>
> > I'm adding a new index to my application. It looks like this:
>
> > class Account < ActiveRecord::Base
> >  define_index do
> >    indexes account_name.first_name
> >    indexes account_name.last_name
> >    indexes email_addresses.email_address
>
> >    has created_at
>
> >    set_property :delta => :datetime, :threshold => 2.minutes
> >  end
> > end
>
> > I'm testing how long the full index takes to generate on a staging
> > server where we typically have only sanitized data from production.
> > But for this task, I'm working with our entire accounts,
> > account_names, and email_addresses tables from production.
>
> > When I generate the index, I get the following warning during the
> > accounts index phase:
>
> >  WARNING: duplicate document ids found
>
> > In the Rails console, I observe the following:
> > >> Account.search.total_entries
> > => 260793
> > >> Account.count
> > => 602083
>
> > Locally, with a much smaller subset of the data, I also get a
> > different count from each data source, but I don't receive the
> > "duplicate document ids" warning when generating the index.
>
> > My research so far has indicated that this is an issue with merging
> > indexes. But here I'm generating a full index, not generating a
> > delta index and then merging it into a full index.
>
> > My questions are:
>
> > 1. The warning and the discrepancy in count, are they related?
> > 2. What does the warning mean?
> > 3. Is all of my data accessible via searching, despite the different
> > counts?
> > 4. How can I fix this?
>
> > Thanks in advance for any assistance,
> > Alex Kahn
>
> > P.S. I'm using Rails 2.3.14, Sphinx 0.9.9, thinking-sphinx 1.4.7,
> > ts-datetime-delta 1.0.2
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "Thinking Sphinx" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/thinking-sphinx?hl=en.

