That is surprising - removing min_prefix_len should certainly bring index file 
sizes down.

It's worth noting that the fix I just mentioned in the other thread should 
remove the extra join, which should reduce the amount of data your database 
passes through to Sphinx. So it may help return things to what you're 
expecting. Give it a shot and let me know.

-- 
Pat
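
For anyone following the thread: the ordinal versus string attribute storage 
described in the quoted messages below can be sketched roughly as follows. This 
is only an illustration in plain Ruby, not Sphinx's or Thinking Sphinx's actual 
code. The sample data, the ordinals_for helper, and the choice to give 
duplicate values the same ordinal are all assumptions made for the example.

```ruby
# Hypothetical sample data standing in for a sortable text column.
documents = [
  "charlie resume text",
  "alpha resume text",
  "bravo resume text",
  "alpha resume text"
]

# Ordinal approach as described in the thread: gather all known values,
# put them in order, and store each record's position in that ordering.
# Treating duplicates as sharing one ordinal is an assumption here.
def ordinals_for(values)
  ranks = values.uniq.sort.each_with_index.to_h
  values.map { |value| ranks[value] }
end

ordinals = ordinals_for(documents)

# Rough storage comparison: one 32-bit integer per record versus the
# full bytes of every string value.
ORDINAL_BYTES = 4
ordinal_storage = documents.size * ORDINAL_BYTES
string_storage  = documents.sum(&:bytesize)

puts "ordinals:        #{ordinals.inspect}"
puts "ordinal storage: #{ordinal_storage} bytes"
puts "string storage:  #{string_storage} bytes"
```

The ordinal cost stays fixed per record however long the text is, while the 
string cost grows with the text itself, which is why a sortable column holding 
whole documents can get very large.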

On 23/07/2013, at 6:12 AM, Daniel Vandersluis wrote:

> Does it make any sense for the size to not change even with enable_star and 
> min_prefix_len being disabled?
> 
> On Monday, July 22, 2013 11:20:03 AM UTC-4, Daniel Vandersluis wrote:
> There must be something weird going on here - when I added job_ids to the 
> index (as per the other thread) with the latest master from github, the index 
> size grows even more, up to 11GB now... 
> 
> On Saturday, July 20, 2013 11:43:38 PM UTC-4, Pat Allan wrote:
> On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote: 
> 
> > Sorry, it's the same index, I was just simplifying the names for the 
> > purpose of this post and missed one. Sorry for the confusion :) 
> 
> Ah rightio. The change in size is pretty crazy then! 
> 
> > If the change was made prior to 2.0.11, wouldn't that mean that the indexes 
> > previously would have been huge too? 
> 
> I would have thought so, yes. 
> 
> > I'm not sure I understand what you mean about sql_field_string - do 
> > sql_field_strings take up significantly more space than sql_attr_strings 
> > do? 
> 
> There's no reason for them to at all. I don't know the dark arts behind the 
> Sphinx source code though (it's C and C++, neither of which I'm confident 
> with). 
> 
> -- 
> Pat 
> 
> > 
> >> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote: 
> >> Further to this: I guess I was wrong about 2.0.11 using ordinal attribute 
> >> types instead of string attribute types - that change must have come in 
> >> earlier. 
> >> 
> >> sql_attr_string is a standard string attribute (not ordinal), and 
> >> sql_field_string stores the field value as a string attribute of the same 
> >> name *as well as* the field. The latter removes the need for the _sort 
> >> suffix you'll spot in sortable attributes in 2.x releases. 
> >> 
> >> I wouldn't expect there to be any difference between these two in terms of 
> >> file size though. But just to compare apples with apples - you had 
> >> user_core file sizes previously, but now it's candidate_user_core. Are 
> >> there other large and unnecessary string attributes in the CandidateUser 
> >> index? 
> >> 
> >> -- 
> >> Pat 
> >> 
> >> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote: 
> >> 
> >> > Yeah, I was using 2.0.11 previously. There does not seem to be any 
> >> > difference with removing sortable: true from the index definition (for 
> >> > resumes.document), except that this line disappears from the generated 
> >> > configuration file: sql_field_string = document. This seems to at least 
> >> > let indexer complete properly, but the index size is still huge: 
> >> > 
> >> > indexing index 'candidate_user_core'... 
> >> > collected 199704 docs, 8478.8 MB 
> >> > 
> >> > It also takes a long time to go through the sorting "Mhits" step now. I 
> >> > see how TS2 added sql_attr_string for the sort columns whereas TS3 adds 
> >> > sql_field_string - that's what you're talking about right? Is there any 
> >> > way to either a) get around this issue, or b) force TS to use the 
> >> > ordinal type? (everything should still work that way, correct?) 
> >> > 
> >> > Here are the options I set in thinking_sphinx.yml: 
> >> > 
> >> > development: 
> >> >   address: localhost 
> >> >   version: 2.0.8-release 
> >> >   mem_limit: 256M   
> >> >   
> >> >   enable_star: true 
> >> >   min_prefix_len: 2 
> >> >   blend_chars: "@, -, &" 
> >> >   html_strip: true 
> >> >   max_matches: 25000 
> >> > 
> >> > Is there any way I can speed this up / reduce the size? 
> >> > 
> >> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote: 
> >> > I think with 2.0.11 (what you were using previously, right?) TS uses the 
> >> > ordinal attribute type, which stores an integer for each string 
> >> > (calculated by grabbing all known values, putting them in order, 
> >> > returning the index of each value). 
> >> > 
> >> > With TS v3 (and later 2.x releases if I remember correctly) it'll use 
> >> > the native string attribute type (a relatively recent addition to 
> >> > Sphinx), which means Sphinx is storing the real string value - which is 
> >> > much better if you're sorting across more than one index (say, if you're 
> >> > using deltas, or searching across multiple models). In this case, it 
> >> > would mean Sphinx is now storing potentially a ton of data, instead of a 
> >> > 32-bit integer per record. 
> >> > 
> >> > -- 
> >> > Pat 
> >> > 
> >> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote: 
> >> > 
> >> > > Thanks for the response, Pat - yes, it's the same index as the other 
> >> > > thread. Good point about sorting resumes, that shouldn't be there. 
> >> > > However, why would that make such a difference between TS2 and TS3 
> >> > > (see my other post which I added at the same time as your response)? 
> >> > > 
> >> > > I will try removing the sortable on resumes and see what difference it 
> >> > > makes! 
> >> > > 
> >> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote: 
> >> > > Hi Daniel 
> >> > > 
> >> > > If this is the same index as in the other thread, I'm guessing it's 
> >> > > the fact that you've got resumes.document sortable. A record with many 
> >> > > resumes and/or large document values could end up with massive values 
> >> > > for the underlying string attribute (that you'd sort by) - are you 
> >> > > actually sorting by this? Generally I'd be surprised if there's much 
> >> > > point sorting by large amounts of text. 
> >> > > 
> >> > > -- 
> >> > > Pat 
> >> > > 
> >> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
> >> > > 
> >> > > > Is there any reason that an index would grow in size when upgrading 
> >> > > > from thinkingsphinx 2 to 3? The only differences in the 
> >> > > > configuration file are changing port to mysql41 and changing version 
> >> > > > to 2.0.8-release, but an index that used to be around 500MB is now 
> >> > > > resulting in this error: 
> >> > > > 
> >> > > > ERROR: index 'user_core': too many string attributes (current index 
> >> > > > format allows up to 4 GB). 
> >> > > > 
> >> > > > Anyone have any idea why this would be? 
> >> > > > 
> >> > > > -- 
> >> > > > You received this message because you are subscribed to the Google 
> >> > > > Groups "Thinking Sphinx" group. 
> >> > > > To unsubscribe from this group and stop receiving emails from it, 
> >> > > > send an email to [email protected]. 
> >> > > > To post to this group, send email to [email protected]. 
> >> > > > Visit this group at http://groups.google.com/group/thinking-sphinx. 
> >> > > > For more options, visit https://groups.google.com/groups/opt_out. 