Does it make sense that the index size doesn't change even with enable_star
and min_prefix_len disabled?
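
For anyone following along, here is a rough Ruby sketch of the size difference Pat describes further down the thread (one 32-bit ordinal per record versus the full string value per record). The helper names and sample data are made up for illustration. This is not how Sphinx or Thinking Sphinx implement it internally, and treating equal strings as sharing one ordinal is my assumption:

```ruby
# Ordinal scheme: sort the distinct values and map each one to its
# position in that sorted list (assumption: equal strings share an ordinal).
def ordinals_for(values)
  rank = values.uniq.sort.each_with_index.to_h
  values.map { |value| rank[value] }
end

# Bytes needed when every record stores its full string value.
def string_attr_bytes(values)
  values.sum(&:bytesize)
end

# Bytes needed when every record stores one 32-bit ordinal.
def ordinal_attr_bytes(values)
  values.size * 4
end

# Hypothetical sample data standing in for large resume documents.
documents = [
  "resume text " * 1000,
  "another resume " * 1000,
  "resume text " * 1000
]

p ordinals_for(documents)
puts "string attributes:  #{string_attr_bytes(documents)} bytes"
puts "ordinal attributes: #{ordinal_attr_bytes(documents)} bytes"
```

With large text values the string total grows with the text itself, while the ordinal total only grows with the number of records.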

On Monday, July 22, 2013 11:20:03 AM UTC-4, Daniel Vandersluis wrote:
>
> There must be something weird going on here - when I added job_ids to the 
> index (as per the other thread) with the latest master from github, the 
> index size grows even more, up to 11GB now... 
>
> On Saturday, July 20, 2013 11:43:38 PM UTC-4, Pat Allan wrote:
>>
>> On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote: 
>>
>> > Sorry, it's the same index, I was just simplifying the names for the
>> > purpose of this post and missed one. Sorry for the confusion :)
>>
>> Ah rightio. The change in size is pretty crazy then! 
>>
>> > If the change was made prior to 2.0.11, wouldn't that mean that the
>> > indexes previously would have been huge too?
>>
>> I would have thought so, yes. 
>>
>> > I'm not sure I understand what you mean about sql_field_string - do
>> > sql_field_strings take up significantly more space than sql_attr_strings
>> > do?
>>
>> There's no reason for them to at all. I don't know the dark arts behind 
>> the Sphinx source code though (it's C and C++, neither of which I'm 
>> confident with). 
>>
>> -- 
>> Pat 
>>
>> > 
>> >> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote: 
>> >> Further to this: I guess I was wrong about 2.0.11 using ordinal
>> >> attribute types instead of string attribute types - that change must have
>> >> come in earlier.
>> >> 
>> >> sql_attr_string is a standard string attribute (not ordinal), and
>> >> sql_field_string stores the field value as a string attribute of the same
>> >> name *as well as* the field. The latter removes the need for the _sort
>> >> suffix you'll spot in sortable attributes in 2.x releases.
>> >> 
>> >> I wouldn't expect there to be any difference between these two in
>> >> terms of file size though. But just to compare apples with apples - you had
>> >> user_core file sizes previously, but now it's candidate_user_core. Are
>> >> there other large and unnecessary string attributes in the CandidateUser
>> >> index?
>> >> 
>> >> -- 
>> >> Pat 
>> >> 
>> >> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote: 
>> >> 
>> >> > Yeah, I was using 2.0.11 previously. There does not seem to be any
>> >> > difference with removing sortable: true from the index definition (for
>> >> > resumes.document), except that this line disappears from the generated
>> >> > configuration file: sql_field_string = document. This seems to at least let
>> >> > indexer complete properly, but the index size is still huge:
>> >> > 
>> >> > indexing index 'candidate_user_core'... 
>> >> > collected 199704 docs, 8478.8 MB 
>> >> > 
>> >> > It also takes a long time to go through the sorting "Mhits" step
>> >> > now. I see how TS2 added sql_attr_string for the sort columns whereas TS3
>> >> > adds sql_field_string - that's what you're talking about right? Is there
>> >> > any way to either a) get around this issue, or b) force TS to use the
>> >> > ordinal type? (everything should still work that way, correct?)
>> >> > 
>> >> > Here are the options I set in thinking_sphinx.yml:
>> >> > 
>> >> > development: 
>> >> >   address: localhost 
>> >> >   version: 2.0.8-release 
>> >> >   mem_limit: 256M   
>> >> >   
>> >> >   enable_star: true 
>> >> >   min_prefix_len: 2 
>> >> >   blend_chars: "@, -, &" 
>> >> >   html_strip: true 
>> >> >   max_matches: 25000 
>> >> > 
>> >> > Is there any way I can speed this up / reduce the size? 
>> >> > 
>> >> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote: 
>> >> > I think with 2.0.11 (what you were using previously, right?) TS uses
>> >> > the ordinal attribute type, which stores an integer for each string
>> >> > (calculated by grabbing all known values, putting them in order, returning
>> >> > the index of each value).
>> >> > 
>> >> > With TS v3 (and later 2.x releases if I remember correctly) it'll
>> >> > use the native string attribute type (a relatively recent addition to
>> >> > Sphinx), which means Sphinx is storing the real string value - which is
>> >> > much better if you're sorting across more than one index (say, if you're
>> >> > using deltas, or searching across multiple models). In this case, it would
>> >> > mean Sphinx is now storing potentially a ton of data, instead of a 32-bit
>> >> > integer per record.
>> >> > 
>> >> > -- 
>> >> > Pat 
>> >> > 
>> >> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote: 
>> >> > 
>> >> > > Thanks for the response, Pat - yes, it's the same index as the
>> >> > > other thread. Good point about sorting resumes, that shouldn't be there.
>> >> > > However, why would that make such a difference between TS2 and TS3 (see my
>> >> > > other post which I added at the same time as your response)?
>> >> > > 
>> >> > > I will try removing the sortable on resumes and see what
>> >> > > difference it makes!
>> >> > > 
>> >> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote: 
>> >> > > Hi Daniel 
>> >> > > 
>> >> > > If this is the same index as in the other thread, I'm guessing
>> >> > > it's the fact that you've got resumes.document sortable. A record with many
>> >> > > resumes and/or large document values could end up with massive values for
>> >> > > the underlying string attribute (that you'd sort by) - are you actually
>> >> > > sorting by this? Generally I'd be surprised if there's much point sorting
>> >> > > by large amounts of text.
>> >> > > 
>> >> > > -- 
>> >> > > Pat 
>> >> > > 
>> >> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
>> >> > > 
>> >> > > > Is there any reason that an index would grow in size when
>> >> > > > upgrading from thinkingsphinx 2 to 3? The only differences in the
>> >> > > > configuration file are changing port to mysql41, and changing version to
>> >> > > > 2.0.8-release, but an index that used to be around 500MB is now resulting
>> >> > > > in this error:
>> >> > > > 
>> >> > > > ERROR: index 'user_core': too many string attributes (current
>> >> > > > index format allows up to 4 GB).
>> >> > > > 
>> >> > > > Anyone have any idea why this would be? 
>> >> > > > 
>> >> > > > -- 
>> >> > > > You received this message because you are subscribed to the
>> >> > > > Google Groups "Thinking Sphinx" group.
>> >> > > > To unsubscribe from this group and stop receiving emails from
>> >> > > > it, send an email to [email protected].
>> >> > > > To post to this group, send email to [email protected].
>> >> > > > Visit this group at
>> >> > > > http://groups.google.com/group/thinking-sphinx.
>> >> > > > For more options, visit https://groups.google.com/groups/opt_out.
>> >> > > >   
>> >> > > >   
>> >> > > 
>> >> > > 
>> >> > > 
>> >> > 
>> >> > 
>> >> > 
>> >> 
>> >> 
>> >> 
>> > 
>>
>>


