Hi Alex

A quick work-around could be to add a custom field to the mix, to ensure there's always data:

  indexes "'account'", :as => :placeholder_field

Does that help?

--
Pat

On 02/11/2011, at 10:03 PM, Alex Kahn wrote:

> Hi Pat,
>
> Thanks for your response. We have 7 other models indexed by Sphinx.
> They are all much smaller tables, the largest containing around 5,000
> records. Their indices are far more complex, though.
>
> There is no missing data for email addresses. However, there are many
> account_names records that contain NULL or "" values for the
> first_name and/or last_name columns. Indeed, removing the two
> account_names lines from the define_index block and re-running the
> index task removes the duplicate document ID warning. However, due to
> an unrelated configuration issue, I'm not able to get the
> total_entries count from a console. Using the `search` command line
> tool, it seems that the account_core index now has a total of 602083
> documents (all of them!).
>
> So it looks like the blank data is the cause of the duplicate
> document id warning and the seemingly-missing records. What would you
> suggest as a way to work around this issue? I can try casting the NULL
> values to empty strings, or adding other data to the index that would
> help Sphinx distinguish between records (but wouldn't the timestamp
> and email address fields do that?). Anything you'd suggest?
>
> And yes, the Account model is listed in the indexed_models setting.
>
> Cheers,
> Alex
>
> On Nov 2, 9:32 am, Pat Allan <[email protected]> wrote:
>> Hi Alex
>>
>> How many other Sphinx indices do you have in your app? Just wondering if
>> there's some conflict somehow, though that surely would crop up in dev as
>> well.
>>
>> As for the missing records - do you have many accounts with no first name,
>> last name or email addresses? I remember reading somewhere that Sphinx
>> ignores records that have no data in their fields. Not sure if these two
>> problems are related to each other.
>>
>> Also, going by an issue you logged on Github - is this the app you're using
>> the indexed_models setting with? Can you confirm that all relevant models
>> are in that setting?
>>
>> Cheers
>>
>> --
>> Pat
>>
>> On 02/11/2011, at 12:18 AM, Alex Kahn wrote:
>>
>>> Hi,
>>>
>>> I'm adding a new index to my application. It looks like this:
>>>
>>> class Account < ActiveRecord::Base
>>>   define_index do
>>>     indexes account_name.first_name
>>>     indexes account_name.last_name
>>>     indexes email_addresses.email_address
>>>
>>>     has created_at
>>>
>>>     set_property :delta => :datetime, :threshold => 2.minutes
>>>   end
>>> end
>>>
>>> I'm testing how long the full index takes to generate on a staging
>>> server where we typically have only sanitized data from production.
>>> But for this task, I'm working with our entire accounts,
>>> account_names, and email_addresses tables from production.
>>>
>>> When I generate the index, I get the following warning during the
>>> accounts index phase:
>>>
>>> WARNING: duplicate document ids found
>>>
>>> In the Rails console, I observe the following:
>>> >> Account.search.total_entries
>>> => 260793
>>> >> Account.count
>>> => 602083
>>>
>>> Locally, with a much smaller subset of the data, I also get a
>>> different count from each data source, but I don't receive the
>>> "duplicate document ids" warning when generating the index.
>>>
>>> My research so far has indicated that this is an issue with merging
>>> indexes. But here I'm generating a full index, not generating a
>>> delta index and then merging it into a full index.
>>>
>>> My questions are:
>>>
>>> 1. The warning and the discrepancy in count, are they related?
>>> 2. What does the warning mean?
>>> 3. Is all of my data accessible via searching, despite the different
>>> counts?
>>> 4. How can I fix this?
>>>
>>> Thanks in advance for any assistance,
>>> Alex Kahn
>>>
>>> P.S. I'm using Rails 2.3.14, Sphinx 0.9.9, thinking-sphinx 1.4.7,
>>> ts-datetime-delta 1.0.2
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Thinking Sphinx" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/thinking-sphinx?hl=en.
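For reference, here is Pat's placeholder field dropped into Alex's original index definition. This is a sketch assuming the thinking-sphinx 1.4.x `define_index` DSL used in this thread (it needs a Rails 2.3 app with that gem loaded, so it won't run on its own). The constant string `'account'` is arbitrary; its only job is to give every document at least one non-empty field, even when the first_name and last_name columns are NULL or "".

```ruby
class Account < ActiveRecord::Base
  define_index do
    # Fields from the original index; first_name/last_name may be NULL or "".
    indexes account_name.first_name
    indexes account_name.last_name
    indexes email_addresses.email_address

    # Pat's work-around: a SQL string literal indexed as a field, so every
    # document carries some text regardless of the name columns.
    indexes "'account'", :as => :placeholder_field

    has created_at

    set_property :delta => :datetime, :threshold => 2.minutes
  end
end
```

After changing the definition, re-run the indexer (`rake thinking_sphinx:index`) and compare the index's document count against `Account.count`.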
