Thanks Garrett! Just pushed a fix to the develop branch for this - if you want to update the ref in your Gemfile, it's 19ea71dd90. Took the same approach as your fix, though I wish there was something more elegant (or Sphinx just handled it gracefully).
On 2 Mar 2014, at 12:13 am, Garrett Dimon <[email protected]> wrote: > I believe that I may have found the culprit. (The "\u0000" unicode null > character.) By updating the scope of the index, I was able to narrow the > failure down to the two offending issues and their comments and attempting to > index just those issues. That made the failures quicker. Otherwise, I was > having to wait 30+ minutes for the indexer to fail. That made it difficult to > test out theories. It's obvious in hindsight, but it really helped get to the > bottom of things. > > We concatenate all of the comments using a "comment_bodies" method on the > issue for indexing purposes. In both cases, one of the comments for the issue > contains "\u0000". I had previously been using Sequel Pro to examine the > contents of the comments, so I didn't see the character. Once I was able to > definitively identify the offending comments, I checked the value of > "comment_bodies" in console, and the \u0000 was visible. > > I added a low-tech ".gsub(/\u0000/, '')" to our comment_bodies, and all was > right again in indexing land. So, it appears that the \u0000 null unicode > sequence is definitely the culprit. For now, we'll filter it out manually, > but I expect that's something that Thinking Sphinx could handle. > > I hope this helps! > > > On Fri, Feb 28, 2014 at 7:19 PM, Pat Allan <[email protected]> wrote: > From reading through the source a bit more, it seems the `where` conditions > for a real-time index can be set to either a symbol (representing an instance > method name within the indexed model) or a proc (called with a single > argument, the instance of the model), and if any condition exists, each much > return true for the object to be added to the real-time index. > > As for monitoring SphinxQL statements that are being sent to Sphinx, there's > been a recent commit to TS that adds real-time updates to the logs. Adding > the following to your Gemfile should sort that out: > > gem 'thinking-sphinx', '~> 3.1.0', > :git => 'git://github.com/pat/thinking-sphinx.git', > :branch => 'develop', > :ref => '94ee176a7a' > > Also: anything within the scope block (includes or otherwise) are only > applied for the full index generation. They're not used when updating the > Sphinx indices for a single record is created/edited, nor do they apply > during searches. > > On 1 Mar 2014, at 2:46 am, Garrett Dimon <[email protected]> wrote: > > > Thanks! We'll give that a shot. It seems the where *is* being applied at > > search time, but just not index-time. We're actually considering indexing > > everything with RT because previously when projects were unarchived or > > accounts unfrozen, they'd get picked backup with the full reindexes, but it > > seems with RT, that wouldn't happen because we don't run full re-indexes. > > Is there a way that we could tell TS to reindex in those situations if we > > use this approach and exclude them from the index? > > > > I'm trying to track down the specific content that's tripping up the > > generation, but I'm not having much luck. I found the record that's printed > > out when rake files, and I manually inspected the records in nearby > > proximity to it, but they all check out. Is there a way/location for me to > > see the SQL query that's being used at the moment the rake task fails? > > > > > > On Fri, Feb 28, 2014 at 9:21 AM, Pat Allan <[email protected]> > > wrote: > > The `where` method doesn't apply for real-time indices - but try this > > instead in the index definition block: > > > > scope { Issue.where "account_status_id IN (1,2) AND archived IS false" } > > > > That should ensure only the appropriate issues are indexed as part of the > > generate call. > > > > Beyond that, you may wish to add `includes` within that scope to cover all > > associations used within the index? > > > > Unrelated to any of this: the match_mode defaults to extended with TS v3 > > (indeed, it can't be anything else). > > > > -- > > Pat > > > > On 1 Mar 2014, at 2:09 am, Garrett Dimon <[email protected]> wrote: > > > > > Roger on the generation. Any high-level suggestions for optimization? > > > > > > I'll see if I can't figure out the exact record that's tripping up the > > > index generation. In the meantime, here's a gist of our index definition: > > > https://gist.github.com/garrettdimon/25f6c305541f30b3ce39 > > > > > > > > > On Thu, Feb 27, 2014 at 9:21 PM, Pat Allan <[email protected]> > > > wrote: > > > Hi Garrett > > > > > > Generation can be slow - at the end of the day, it really comes down to > > > how much data you're dealing with, and if you're using aggregation > > > methods, how quick they are. It's all going through your Rails app > > > (instead of just SQL queries), so optimising for that is different to > > > adding db indices and such. > > > > > > As for the error though... without having a copy of the database, it's a > > > little hard to debug, but it sounds like there's a bug in TS with > > > something in the data being passed through. Having a look at your app log > > > may help identify the record in question... also, what does the index > > > definition for that model look like? > > > > > > -- > > > Pat > > > > > > On 28 Feb 2014, at 9:47 am, Garrett Dimon <[email protected]> wrote: > > > > > >> Howdy, Pat. We're in the process of upgrading from TS 2 with delayed > > >> deltas to TS 3 with real time indexing. > > >> > > >> We've been able to get everything up and running, but we've run into a > > >> couple of problems/questions around the indexing. These may ultimately > > >> be Sphinx questions rather than Thinking Sphinx questions, but I thought > > >> I'd start here since we're only changing Sphinx from 2.0 to 2.1. > > >> > > >> 1. Index Creation Performance > > >> > > >> Our production logs show about a 20 minute turnaround to do a complete > > >> reindex of our production data with TS 2. Running some local tests, TS 3 > > >> generate is taking at least an hour for that data. (The generate is > > >> crashing, so it may ultimately take even longer.) Our indexing > > >> configuration is setup so that a large portion of content is excluded > > >> from the index. (Inactive accounts, archived projects, etc.) We've > > >> verified that searching is correctly excluding the relevant records, but > > >> appears as if that's happening when the query is run rather than when > > >> the indexing occurs. Our only theory so far is that with TS 2 and > > >> traditional indexing, those weren't included in the index at all, but > > >> that with real time indexing, they're included in the index and filtered > > >> out when the query is run. Can you provide any insight about whether > > >> this sounds like normal behavior or whether we've likely screwed > > >> something up? :) > > >> > > >> 2. The Generate is crashing with "rake aborted! sphinxql: syntax error, > > >> unexpected $undefined, expecting CONST_INT (or 4 other tokens) near > > >> ''..." (where ... is content from our DB.) > > >> > > >> I've done some searching, but haven't had any luck. I've run the > > >> generate rake task on two separate occasions, and both times it failed > > >> with the same error message and content, so my gut is leading me to > > >> think that it's an encoding or unescaped quotation mark problem. Does > > >> that problem ring any bells? > > >> > > >> Thanks! > > >> > > >> > > >> > > >> -- > > >> You received this message because you are subscribed to the Google > > >> Groups "Thinking Sphinx" group. > > >> To unsubscribe from this group and stop receiving emails from it, send > > >> an email to [email protected]. > > >> > > >> To post to this group, send email to [email protected]. > > >> Visit this group at http://groups.google.com/group/thinking-sphinx. > > >> For more options, visit https://groups.google.com/groups/opt_out. > > > > > > > > > -- > > > You received this message because you are subscribed to a topic in the > > > Google Groups "Thinking Sphinx" group. > > > To unsubscribe from this topic, visit > > > https://groups.google.com/d/topic/thinking-sphinx/7llAB4zO4bw/unsubscribe. > > > To unsubscribe from this group and all its topics, send an email to > > > [email protected]. > > > To post to this group, send email to [email protected]. > > > Visit this group at http://groups.google.com/group/thinking-sphinx. > > > For more options, visit https://groups.google.com/groups/opt_out. > > > > > > > > > -- > > > You received this message because you are subscribed to the Google Groups > > > "Thinking Sphinx" group. > > > To unsubscribe from this group and stop receiving emails from it, send an > > > email to [email protected]. > > > To post to this group, send email to [email protected]. > > > Visit this group at http://groups.google.com/group/thinking-sphinx. > > > For more options, visit https://groups.google.com/groups/opt_out. > > > > -- > > You received this message because you are subscribed to a topic in the > > Google Groups "Thinking Sphinx" group. > > To unsubscribe from this topic, visit > > https://groups.google.com/d/topic/thinking-sphinx/7llAB4zO4bw/unsubscribe. > > To unsubscribe from this group and all its topics, send an email to > > [email protected]. > > To post to this group, send email to [email protected]. > > Visit this group at http://groups.google.com/group/thinking-sphinx. > > For more options, visit https://groups.google.com/groups/opt_out. > > > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Thinking Sphinx" group. > > To unsubscribe from this group and stop receiving emails from it, send an > > email to [email protected]. > > To post to this group, send email to [email protected]. > > Visit this group at http://groups.google.com/group/thinking-sphinx. > > For more options, visit https://groups.google.com/groups/opt_out. > > -- > You received this message because you are subscribed to a topic in the Google > Groups "Thinking Sphinx" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/thinking-sphinx/7llAB4zO4bw/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/thinking-sphinx. > For more options, visit https://groups.google.com/groups/opt_out. > > > -- > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/thinking-sphinx. > For more options, visit https://groups.google.com/groups/opt_out. -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/thinking-sphinx. For more options, visit https://groups.google.com/groups/opt_out.
