Thanks Garrett!

Just pushed a fix to the develop branch for this - if you want to update the 
ref in your Gemfile, it's 19ea71dd90. Took the same approach as your fix, 
though I wish there was something more elegant (or Sphinx just handled it 
gracefully).
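
For anyone else hitting the same error, here is a rough sketch of that kind of 
filter in application code. This is not the code from the commit: the 
NullByteFilter module and the comment_bodies signature below are made up for 
illustration, and it assumes the text is a plain Ruby string.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): removes NUL characters
# from text before it is handed to the indexer.
module NullByteFilter
  NULL_CHAR = "\u0000".freeze

  # Returns a copy of +text+ with every NUL character removed.
  # nil is treated as an empty string.
  def self.strip(text)
    text.to_s.gsub(NULL_CHAR, '')
  end
end

# Illustrative stand-in for the concatenated comment text described in the
# thread; a real model would read the bodies from its associated comments.
def comment_bodies(comments)
  NullByteFilter.strip(comments.join(' '))
end
```

Filtering after the join means a single pass covers every comment, at the cost 
of leaving the stored data untouched.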

On 2 Mar 2014, at 12:13 am, Garrett Dimon <[email protected]> wrote:

> I believe that I may have found the culprit. (The "\u0000" unicode null 
> character.) By updating the scope of the index, I was able to narrow the 
> failure down to the two offending issues and their comments, and then index 
> just those issues. That made the failures quicker. Otherwise, I was having to 
> wait 30+ minutes for the indexer to fail, which made it difficult to test out 
> theories. It's obvious in hindsight, but it really helped get to the bottom 
> of things.
> 
> We concatenate all of the comments using a "comment_bodies" method on the 
> issue for indexing purposes. In both cases, one of the comments for the issue 
> contains "\u0000". I had previously been using Sequel Pro to examine the 
> contents of the comments, so I didn't see the character. Once I was able to 
> definitively identify the offending comments, I checked the value of 
> "comment_bodies" in console, and the \u0000 was visible.
> 
> I added a low-tech ".gsub(/\u0000/, '')" to our comment_bodies, and all was 
> right again in indexing land. So, it appears that the \u0000 null unicode 
> sequence is definitely the culprit. For now, we'll filter it out manually, 
> but I expect that's something that Thinking Sphinx could handle.
> 
> I hope this helps!
> 
> 
> On Fri, Feb 28, 2014 at 7:19 PM, Pat Allan <[email protected]> wrote:
> From reading through the source a bit more, it seems the `where` conditions 
> for a real-time index can be set to either a symbol (representing an instance 
> method name within the indexed model) or a proc (called with a single 
> argument, the instance of the model), and if any conditions exist, each must 
> return true for the object to be added to the real-time index.
> 
> As for monitoring SphinxQL statements that are being sent to Sphinx, there's 
> been a recent commit to TS that adds real-time updates to the logs. Adding 
> the following to your Gemfile should sort that out:
> 
>   gem 'thinking-sphinx', '~> 3.1.0',
>     :git    => 'git://github.com/pat/thinking-sphinx.git',
>     :branch => 'develop',
>     :ref    => '94ee176a7a'
> 
> Also: anything within the scope block (includes or otherwise) is only 
> applied for the full index generation. It's not used when the Sphinx indices 
> are updated for a single record being created/edited, nor does it apply 
> during searches.
> 
> On 1 Mar 2014, at 2:46 am, Garrett Dimon <[email protected]> wrote:
> 
> > Thanks! We'll give that a shot. It seems the where *is* being applied at 
> > search time, but just not index-time. We're actually considering indexing 
> > everything with RT because previously when projects were unarchived or 
> > accounts unfrozen, they'd get picked back up with the full reindexes, but it 
> > seems with RT, that wouldn't happen because we don't run full re-indexes. 
> > Is there a way that we could tell TS to reindex in those situations if we 
> > use this approach and exclude them from the index?
> >
> > I'm trying to track down the specific content that's tripping up the 
> > generation, but I'm not having much luck. I found the record that's printed 
> > out when rake fails, and I manually inspected the records near it, but they 
> > all check out. Is there a way/location for me to see the SQL query that's 
> > being used at the moment the rake task fails?
> >
> >
> > On Fri, Feb 28, 2014 at 9:21 AM, Pat Allan <[email protected]> 
> > wrote:
> > The `where` method doesn't apply for real-time indices - but try this 
> > instead in the index definition block:
> >
> >   scope { Issue.where "account_status_id IN (1,2) AND archived IS false" }
> >
> > That should ensure only the appropriate issues are indexed as part of the 
> > generate call.
> >
> > Beyond that, you may wish to add `includes` within that scope to cover all 
> > associations used within the index?
> >
> > Unrelated to any of this: the match_mode defaults to extended with TS v3 
> > (indeed, it can't be anything else).
> >
> > --
> > Pat
> >
> > On 1 Mar 2014, at 2:09 am, Garrett Dimon <[email protected]> wrote:
> >
> > > Roger on the generation. Any high-level suggestions for optimization?
> > >
> > > I'll see if I can't figure out the exact record that's tripping up the 
> > > index generation. In the meantime, here's a gist of our index definition:
> > > https://gist.github.com/garrettdimon/25f6c305541f30b3ce39
> > >
> > >
> > > On Thu, Feb 27, 2014 at 9:21 PM, Pat Allan <[email protected]> 
> > > wrote:
> > > Hi Garrett
> > >
> > > Generation can be slow - at the end of the day, it really comes down to 
> > > how much data you're dealing with, and if you're using aggregation 
> > > methods, how quick they are. It's all going through your Rails app 
> > > (instead of just SQL queries), so optimising for that is different to 
> > > adding db indices and such.
> > >
> > > As for the error though... without having a copy of the database, it's a 
> > > little hard to debug, but it sounds like there's a bug in TS with 
> > > something in the data being passed through. Having a look at your app log 
> > > may help identify the record in question... also, what does the index 
> > > definition for that model look like?
> > >
> > > --
> > > Pat
> > >
> > > On 28 Feb 2014, at 9:47 am, Garrett Dimon <[email protected]> wrote:
> > >
> > >> Howdy, Pat. We're in the process of upgrading from TS 2 with delayed 
> > >> deltas to TS 3 with real time indexing.
> > >>
> > >> We've been able to get everything up and running, but we've run into a 
> > >> couple of problems/questions around the indexing. These may ultimately 
> > >> be Sphinx questions rather than Thinking Sphinx questions, but I thought 
> > >> I'd start here since we're only changing Sphinx from 2.0 to 2.1.
> > >>
> > >> 1. Index Creation Performance
> > >>
> > >> Our production logs show about a 20 minute turnaround to do a complete 
> > >> reindex of our production data with TS 2. Running some local tests, TS 3 
> > >> generate is taking at least an hour for that data. (The generate is 
> > >> crashing, so it may ultimately take even longer.) Our indexing 
> > >> configuration is set up so that a large portion of content is excluded 
> > >> from the index. (Inactive accounts, archived projects, etc.) We've 
> > >> verified that searching is correctly excluding the relevant records, but 
> > >> it appears as if that's happening when the query is run rather than when 
> > >> the indexing occurs. Our only theory so far is that with TS 2 and 
> > >> traditional indexing, those weren't included in the index at all, but 
> > >> that with real time indexing, they're included in the index and filtered 
> > >> out when the query is run. Can you provide any insight about whether 
> > >> this sounds like normal behavior or whether we've likely screwed 
> > >> something up? :)
> > >>
> > >> 2. The Generate is crashing with "rake aborted! sphinxql: syntax error, 
> > >> unexpected $undefined, expecting CONST_INT (or 4 other tokens) near 
> > >> ''..." (where ... is content from our DB.)
> > >>
> > >> I've done some searching, but haven't had any luck. I've run the 
> > >> generate rake task on two separate occasions, and both times it failed 
> > >> with the same error message and content, so my gut is leading me to 
> > >> think that it's an encoding or unescaped quotation mark problem. Does 
> > >> that problem ring any bells?
> > >>
> > >> Thanks!
> > >>
> > >>
> > >>
> > >> --
> > >> You received this message because you are subscribed to the Google 
> > >> Groups "Thinking Sphinx" group.
> > >> To unsubscribe from this group and stop receiving emails from it, send 
> > >> an email to [email protected].
> > >>
> > >> To post to this group, send email to [email protected].
> > >> Visit this group at http://groups.google.com/group/thinking-sphinx.
> > >> For more options, visit https://groups.google.com/groups/opt_out.
> > >
> > >
> > > --
> > > You received this message because you are subscribed to a topic in the 
> > > Google Groups "Thinking Sphinx" group.
> > > To unsubscribe from this topic, visit 
> > > https://groups.google.com/d/topic/thinking-sphinx/7llAB4zO4bw/unsubscribe.
> > > To unsubscribe from this group and all its topics, send an email to 
> > > [email protected].
> > > To post to this group, send email to [email protected].
> > > Visit this group at http://groups.google.com/group/thinking-sphinx.
> > > For more options, visit https://groups.google.com/groups/opt_out.
> > >
> >
> 

