Does this make any sense? I added source: :query to each of my has_many 
attributes, and suddenly indexing is fast again and the index is back down 
to < 500MB... 
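
For anyone finding this later, a rough sketch of the kind of change meant 
here. Only the source: :query option comes from this thread; the index name, 
the name field and the jobs association are assumptions for illustration. 
As far as I understand it, the option makes Sphinx fetch the multi-value 
attribute with a separate query (sql_attr_multi ... from query) instead of 
joining the association into the main sql_query.

```ruby
# app/indices/candidate_user_index.rb (hypothetical file and names)
# The guard only keeps this file loadable where the gem is absent.
if defined?(ThinkingSphinx)
  ThinkingSphinx::Index.define :candidate_user, with: :active_record do
    # Assumed full-text field, here just to make the definition complete.
    indexes name

    # Multi-value attribute over a has_many association, pulled in through
    # its own query rather than a join in the main indexing query.
    has jobs.id, as: :job_ids, source: :query
  end
end
```

After a change like this the configuration has to be regenerated and the 
index rebuilt before the size difference shows up.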

On Monday, July 22, 2013 8:04:45 PM UTC-4, Pat Allan wrote:
>
> That is surprising - removing min_prefix_len should certainly drop index 
> file sizes down. 
>
> It's worth noting the fix I just mentioned in the other thread should 
> remove the extra join, and this should reduce the amount of data your 
> database passes through to Sphinx. So: it may help return things to what 
> you're expecting. Give it a shot, let me know. 
>
> -- 
> Pat 
>
> On 23/07/2013, at 6:12 AM, Daniel Vandersluis wrote: 
>
> > Does it make any sense for the size to not change even with enable_star 
> > and min_prefix_len being disabled? 
> > 
> > On Monday, July 22, 2013 11:20:03 AM UTC-4, Daniel Vandersluis wrote: 
> > There must be something weird going on here - when I added job_ids to 
> > the index (as per the other thread) with the latest master from GitHub, 
> > the index size grew even more, up to 11GB now... 
> > 
> > On Saturday, July 20, 2013 11:43:38 PM UTC-4, Pat Allan wrote: 
> > On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote: 
> > 
> > > Sorry, it's the same index, I was just simplifying the names for the 
> > > purpose of this post and missed one. Sorry for the confusion :) 
> > 
> > Ah rightio. The change in size is pretty crazy then! 
> > 
> > > If the change was made prior to 2.0.11, wouldn't that mean that the 
> > > indexes previously would have been huge too? 
> > 
> > I would have thought so, yes. 
> > 
> > > I'm not sure I understand what you mean about sql_field_string - do 
> > > sql_field_strings take up significantly more space than 
> > > sql_attr_strings do? 
> > 
> > There's no reason for them to at all. I don't know the dark arts behind 
> > the Sphinx source code though (it's C and C++, neither of which I'm 
> > confident with). 
> > 
> > -- 
> > Pat 
> > 
> > > 
> > >> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote: 
> > >> Further to this: I guess I was wrong about 2.0.11 using ordinal 
> > >> attribute types instead of string attribute types - that change must 
> > >> have come in earlier. 
> > >> 
> > >> sql_attr_string is a standard string attribute (not ordinal), and 
> > >> sql_field_string stores the field value as a string attribute of the 
> > >> same name *as well as* the field. The latter removes the need for the 
> > >> _sort suffix you'll spot in sortable attributes in 2.x releases. 
> > >> 
> > >> I wouldn't expect there to be any difference between these two in 
> > >> terms of file size though. But just to compare apples with apples - 
> > >> you had user_core file sizes previously, but now it's 
> > >> candidate_user_core. Are there other large and unnecessary string 
> > >> attributes in the CandidateUser index? 
> > >> 
> > >> -- 
> > >> Pat 
> > >> 
> > >> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote: 
> > >> 
> > >> > Yeah, I was using 2.0.11 previously. There does not seem to be any 
> > >> > difference with removing sortable: true from the index definition 
> > >> > (for resumes.document), except that this line disappears from the 
> > >> > generated configuration file: sql_field_string = document. This 
> > >> > seems to at least let indexer complete properly, but the index size 
> > >> > is still huge: 
> > >> > 
> > >> > indexing index 'candidate_user_core'... 
> > >> > collected 199704 docs, 8478.8 MB 
> > >> > 
> > >> > It also takes a long time to go through the sorting "Mhits" step 
> > >> > now. I see how TS2 added sql_attr_string for the sort columns 
> > >> > whereas TS3 adds sql_field_string - that's what you're talking about 
> > >> > right? Is there any way to either a) get around this issue, or 
> > >> > b) force TS to use the ordinal type? (everything should still work 
> > >> > that way, correct?) 
> > >> > 
> > >> > Here are the options I set in thinking_sphinx.yml: 
> > >> > 
> > >> > development: 
> > >> >   address: localhost 
> > >> >   version: 2.0.8-release 
> > >> >   mem_limit: 256M   
> > >> >   
> > >> >   enable_star: true 
> > >> >   min_prefix_len: 2 
> > >> >   blend_chars: "@, -, &" 
> > >> >   html_strip: true 
> > >> >   max_matches: 25000 
> > >> > 
> > >> > Is there any way I can speed this up / reduce the size? 
> > >> > 
> > >> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote: 
> > >> > I think with 2.0.11 (what you were using previously, right?) TS 
> > >> > uses the ordinal attribute type, which stores an integer for each 
> > >> > string (calculated by grabbing all known values, putting them in 
> > >> > order, returning the index of each value). 
> > >> > 
> > >> > With TS v3 (and later 2.x releases if I remember correctly) it'll 
> > >> > use the native string attribute type (a relatively recent addition 
> > >> > to Sphinx), which means Sphinx is storing the real string value - 
> > >> > which is much better if you're sorting across more than one index 
> > >> > (say, if you're using deltas, or searching across multiple models). 
> > >> > In this case, it would mean Sphinx is now storing potentially a ton 
> > >> > of data, instead of a 32-bit integer per record. 
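
To make the ordinal idea concrete, here is a minimal Ruby sketch of the 
calculation described above. It is only an illustration, not Sphinx's or 
Thinking Sphinx's actual code: the method name ordinals_for and the sample 
values are made up, and the numbering starts at zero purely for convenience.

```ruby
# Sketch only: mimics the ordinal idea described above, not Sphinx itself.
def ordinals_for(values)
  # Gather all known values, put them in order, and number them.
  rank = values.uniq.sort.each_with_index.to_h
  # Each record then keeps only the position of its value in that ordering.
  values.map { |value| rank.fetch(value) }
end

# Made-up sample values standing in for one sortable column.
documents = ["delta", "alpha", "charlie", "alpha"]
ordinals = ordinals_for(documents)

string_bytes  = documents.sum(&:bytesize) # what a string attribute must keep
ordinal_bytes = ordinals.length * 4       # one 32-bit integer per record

puts "ordinals: #{ordinals.inspect}"
puts "string bytes: #{string_bytes}, ordinal bytes: #{ordinal_bytes}"
```

With long values such as resume documents, the string total grows with the 
text while the ordinal total stays at four bytes per record. The ordinals 
only make sense within the set of values they were computed from, which 
fits the point about real strings being better for sorting across more 
than one index.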
> > >> > 
> > >> > -- 
> > >> > Pat 
> > >> > 
> > >> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote: 
> > >> > 
> > >> > > Thanks for the response, Pat - yes, it's the same index as the 
> > >> > > other thread. Good point about sorting resumes, that shouldn't be 
> > >> > > there. However, why would that make such a difference between TS2 
> > >> > > and TS3 (see my other post which I added at the same time as your 
> > >> > > response)? 
> > >> > > 
> > >> > > I will try removing the sortable on resumes and see what 
> > >> > > difference it makes! 
> > >> > > 
> > >> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote: 
> > >> > > Hi Daniel 
> > >> > > 
> > >> > > If this is the same index as in the other thread, I'm guessing 
> > >> > > it's the fact that you've got resumes.document sortable. A record 
> > >> > > with many resumes and/or large document values could end up with 
> > >> > > massive values for the underlying string attribute (that you'd 
> > >> > > sort by) - are you actually sorting by this? Generally I'd be 
> > >> > > surprised if there's much point sorting by large amounts of text. 
> > >> > > 
> > >> > > -- 
> > >> > > Pat 
> > >> > > 
> > >> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
> > >> > > 
> > >> > > > Is there any reason that an index would grow in size when 
> > >> > > > upgrading from Thinking Sphinx 2 to 3? The only differences in 
> > >> > > > the configuration file are changing port to mysql41 and changing 
> > >> > > > version to 2.0.8-release, but an index that used to be around 
> > >> > > > 500MB is now resulting in this error: 
> > >> > > > 
> > >> > > > ERROR: index 'user_core': too many string attributes (current index format allows up to 4 GB). 
> > >> > > > 
> > >> > > > Anyone have any idea why this would be? 
> > >> > > > 
> > >> > > > -- 
> > >> > > > You received this message because you are subscribed to the 
> Google Groups "Thinking Sphinx" group. 
> > >> > > > To unsubscribe from this group and stop receiving emails from 
> it, send an email to [email protected]. 
> > >> > > > To post to this group, send email to 
> [email protected]. 
> > >> > > > Visit this group at 
> http://groups.google.com/group/thinking-sphinx. 
> > >> > > > For more options, visit 
> https://groups.google.com/groups/opt_out. 
> > >> > > >   
> > >> > > >   
