On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote:

> Sorry, it's the same index. I was just simplifying the names for the purpose 
> of this post and missed one. Sorry for the confusion :)

Ah rightio. The change in size is pretty crazy then!

> If the change was made prior to 2.0.11, wouldn't that mean that the indexes 
> previously would have been huge too?

I would have thought so, yes.

> I'm not sure I understand what you mean about sql_field_string - do 
> sql_field_strings take up significantly more space than sql_attr_strings do?

There's no reason they should. I don't know the dark arts behind the 
Sphinx source code, though (it's C and C++, neither of which I'm confident with).

-- 
Pat

> 
>> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote:
>> Further to this: I guess I was wrong about 2.0.11 using ordinal attribute 
>> types instead of string attribute types - that change must have come in 
>> earlier. 
>> 
>> sql_attr_string is a standard string attribute (not ordinal), and 
>> sql_field_string stores the field value as a string attribute of the same 
>> name *as well as* the field. The latter removes the need for the _sort 
>> suffix you'll spot in sortable attributes in 2.x releases. 
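A simplified Python model of the difference described above. The Index class, its two dictionaries and the name_sort column are hypothetical, and this is not how Sphinx stores data internally. It only illustrates that sql_field_string keeps a value twice (as a searchable field and as a string attribute of the same name), while sql_attr_string keeps it once, as an attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Hypothetical stand-in for a Sphinx index (not a real Sphinx API)."""
    fields: dict = field(default_factory=dict)        # column -> text to tokenise
    string_attrs: dict = field(default_factory=dict)  # column -> raw stored string

    def add(self, directive: str, column: str, value: str) -> None:
        if directive == "sql_attr_string":
            # Attribute only: stored as-is, not searchable as full text.
            self.string_attrs[column] = value
        elif directive == "sql_field_string":
            # Full-text field AND a string attribute with the same name.
            self.fields[column] = value
            self.string_attrs[column] = value
        else:
            raise ValueError(f"unsupported directive: {directive}")

    def string_attr_bytes(self) -> int:
        # Raw UTF-8 bytes held as string attributes (per-value overhead ignored).
        return sum(len(v.encode("utf-8")) for v in self.string_attrs.values())


idx = Index()
idx.add("sql_attr_string", "name_sort", "Daniel")
idx.add("sql_field_string", "document", "a long resume body ...")
print(sorted(idx.fields), sorted(idx.string_attrs))
# prints: ['document'] ['document', 'name_sort']
```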
>> 
>> I wouldn't expect there to be any difference between these two in terms of 
>> file size though. But just to compare apples with apples - you had user_core 
>> file sizes previously, but now it's candidate_user_core. Are there other 
>> large and unnecessary string attributes in the CandidateUser index? 
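One way to answer that question is to total the bytes each candidate string attribute would hold across all records. A sketch, assuming you can pull rows (as dicts) from the same SQL the index uses; the column names and sample rows below are made up.

```python
from collections import Counter


def string_attr_sizes(rows, columns):
    """Total UTF-8 bytes per column across all rows, largest first."""
    totals = Counter()
    for row in rows:
        for col in columns:
            value = row.get(col) or ""  # treat NULL/missing as empty
            totals[col] += len(value.encode("utf-8"))
    return totals.most_common()


# Hypothetical sample rows standing in for the index's SQL results.
rows = [
    {"name": "Ann", "headline": "Engineer", "document": "x" * 5_000},
    {"name": "Bob", "headline": None, "document": "y" * 8_000},
]
print(string_attr_sizes(rows, ["name", "headline", "document"]))
# prints: [('document', 13000), ('headline', 8), ('name', 6)]
```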
>> 
>> -- 
>> Pat 
>> 
>> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote: 
>> 
>> > Yeah, I was using 2.0.11 previously. There does not seem to be any 
>> > difference from removing sortable: true from the index definition (for 
>> > resumes.document), except that this line disappears from the generated 
>> > configuration file: sql_field_string = document. This seems to at least 
>> > let the indexer complete properly, but the index size is still huge: 
>> > 
>> > indexing index 'candidate_user_core'... 
>> > collected 199704 docs, 8478.8 MB 
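Back-of-envelope arithmetic on the indexer output above. Assumptions: the indexer's "MB" means 10**6 bytes (with 2**20 bytes per MB the ratio is about 2.07, so the conclusion is the same), the "4 GB" cap from the original error means 4 * 1024**3 bytes, and the collected total covers all fields, not just resumes.document. If most of that text is resume content and sql_field_string also copies it into a string attribute, the attribute data alone would come to roughly twice the cap, which is consistent with the "too many string attributes" error.

```python
DOCS = 199_704
COLLECTED_MB = 8478.8          # from "collected 199704 docs, 8478.8 MB"
LIMIT_BYTES = 4 * 1024**3      # assumed meaning of the "4 GB" in the indexer error

collected_bytes = COLLECTED_MB * 10**6   # assumption: MB = 10**6 bytes
avg_per_doc = collected_bytes / DOCS     # about 42 KB of text per document
ratio = collected_bytes / LIMIT_BYTES    # how far past the cap a full copy would be
print(f"{avg_per_doc:,.0f} bytes/doc, {ratio:.2f}x the string-attribute cap")
```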
>> > 
>> > It also takes a long time to go through the sorting "Mhits" step now. I 
>> > see how TS2 added sql_attr_string for the sort columns whereas TS3 adds 
>> > sql_field_string - that's what you're talking about, right? Is there any 
>> > way to either a) get around this issue, or b) force TS to use the ordinal 
>> > type? (everything should still work that way, correct?) 
>> > 
>> > Here are the options I set in thinking_sphinx.yml: 
>> > 
>> > development: 
>> >   address: localhost 
>> >   version: 2.0.8-release 
>> >   mem_limit: 256M   
>> >   
>> >   enable_star: true 
>> >   min_prefix_len: 2 
>> >   blend_chars: "@, -, &" 
>> >   html_strip: true 
>> >   max_matches: 25000 
>> > 
>> > Is there any way I can speed this up / reduce the size? 
>> > 
>> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote: 
>> > I think with 2.0.11 (what you were using previously, right?) TS uses the 
>> > ordinal attribute type, which stores an integer for each string 
>> > (calculated by grabbing all known values, putting them in order, returning 
>> > the index of each value). 
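The ordinal scheme described above, sketched in Python. Python's default string ordering stands in for whatever collation Sphinx actually uses (an assumption). Because the ranks come from one index's values, the same string can get different numbers in two indexes, which is why storing the real string sorts better across a core/delta pair or across multiple models.

```python
def ordinal_values(values):
    """Map each string to its position in the sorted list of distinct values."""
    ranks = {v: i for i, v in enumerate(sorted(set(values)))}
    return [ranks[v] for v in values]


core = ordinal_values(["carol", "alice", "bob", "alice"])  # [2, 0, 1, 0]
delta = ordinal_values(["bob", "carol"])                    # [0, 1]
# "bob" is 1 in the core index but 0 in the delta, so ordinals from
# different indexes cannot be compared directly.
```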
>> > 
>> > With TS v3 (and later 2.x releases if I remember correctly) it'll use the 
>> > native string attribute type (a relatively recent addition to Sphinx), 
>> > which means Sphinx is storing the real string value - which is much better 
>> > if you're sorting across more than one index (say, if you're using deltas, 
>> > or searching across multiple models). In this case, it would mean Sphinx 
>> > is now storing potentially a ton of data, instead of a 32-bit integer per 
>> > record. 
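A rough size comparison for the two approaches. It counts 4 bytes per record for an ordinal (one 32-bit integer) and the raw UTF-8 length for a string attribute, ignoring any per-value overhead Sphinx adds; the 40,000-character documents are hypothetical.

```python
def attr_storage_bytes(values):
    """Rough storage: 4 bytes/record for ordinals vs raw UTF-8 bytes for strings."""
    ordinal = 4 * len(values)                              # one 32-bit int per record
    string = sum(len(v.encode("utf-8")) for v in values)   # per-value overhead ignored
    return ordinal, string


# Three records, each holding a hypothetical 40,000-character document.
print(attr_storage_bytes(["x" * 40_000] * 3))  # (12, 120000)
```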
>> > 
>> > -- 
>> > Pat 
>> > 
>> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote: 
>> > 
>> > > Thanks for the response, Pat - yes, it's the same index as the other 
>> > > thread. Good point about sorting resumes, that shouldn't be there. 
>> > > However, why would that make such a difference between TS2 and TS3 (see 
>> > > my other post which I added at the same time as your response)? 
>> > > 
>> > > I will try removing the sortable on resumes and see what difference it 
>> > > makes! 
>> > > 
>> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote: 
>> > > Hi Daniel 
>> > > 
>> > > If this is the same index as in the other thread, I'm guessing it's the 
>> > > fact that you've got resumes.document sortable. A record with many 
>> > > resumes and/or large document values could end up with massive values 
>> > > for the underlying string attribute (that you'd sort by) - are you 
>> > > actually sorting by this? Generally I'd be surprised if there's much 
>> > > point sorting by large amounts of text. 
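Why a sortable has-many text column balloons: every resume document for a user ends up joined into one value. This sketch assumes the join behaves like a grouped SQL concatenation (for example MySQL's GROUP_CONCAT) with a space separator; the separator and the sample data are assumptions, not Thinking Sphinx's exact SQL.

```python
from collections import defaultdict


def concatenate_per_user(resumes, sep=" "):
    """Join all resume documents belonging to each user into one string.

    `sep` is a hypothetical separator standing in for the SQL concatenation.
    """
    grouped = defaultdict(list)
    for user_id, document in resumes:
        grouped[user_id].append(document)
    return {uid: sep.join(docs) for uid, docs in grouped.items()}


resumes = [(1, "a" * 30_000), (1, "b" * 25_000), (2, "c" * 1_000)]
joined = concatenate_per_user(resumes)
print({uid: len(text) for uid, text in joined.items()})  # {1: 55001, 2: 1000}
```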
>> > > 
>> > > -- 
>> > > Pat 
>> > > 
>> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote: 
>> > > 
>> > > > Is there any reason that an index would grow in size when upgrading 
>> > > > from Thinking Sphinx 2 to 3? The only differences in the configuration 
>> > > > file are changing port to mysql41 and changing version to 
>> > > > 2.0.8-release, but an index that used to be around 500MB is now 
>> > > > resulting in this error: 
>> > > > 
>> > > > ERROR: index 'user_core': too many string attributes (current index 
>> > > > format allows up to 4 GB). 
>> > > > 
>> > > > Anyone have any idea why this would be? 
>> > > > 
>> > > > -- 
>> > > > You received this message because you are subscribed to the Google 
>> > > > Groups "Thinking Sphinx" group. 
>> > > > To unsubscribe from this group and stop receiving emails from it, send 
>> > > > an email to [email protected]. 
>> > > > To post to this group, send email to [email protected]. 
>> > > > Visit this group at http://groups.google.com/group/thinking-sphinx. 
>> > > > For more options, visit https://groups.google.com/groups/opt_out. 
>> > > >   
>> > > >   