On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote:

> Sorry, it's the same index, I was just simplifying the names for the purpose
> of this post and missed one. Sorry for the confusion :)

Ah rightio. The change in size is pretty crazy then!

> If the change was made prior to 2.0.11, wouldn't that mean that the indexes
> previously would have been huge too?

I would have thought so, yes.

> I'm not sure I understand what you mean about sql_field_string - do
> sql_field_strings take up significantly more space than sql_attr_strings do?

There's no reason for them to at all. I don't know the dark arts behind the
Sphinx source code though (it's C and C++, neither of which I'm confident with).

--
Pat

>
>> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote:
>> Further to this: I guess I was wrong about 2.0.11 using ordinal attribute
>> types instead of string attribute types - that change must have come in
>> earlier.
>>
>> sql_attr_string is a standard string attribute (not ordinal), and
>> sql_field_string stores the field value as a string attribute of the same
>> name *as well as* the field. The latter removes the need for the _sort
>> suffix you'll spot in sortable attributes in 2.x releases.
>>
>> I wouldn't expect there to be any difference between these two in terms of
>> file size though. But just to compare apples with apples - you had user_core
>> file sizes previously, but now it's candidate_user_core. Are there other
>> large and unnecessary string attributes in the CandidateUser index?
>>
>> --
>> Pat
>>
>> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote:
>>
>> > Yeah, I was using 2.0.11 previously. There does not seem to be any
>> > difference with removing sortable: true from the index definition (for
>> > resumes.document), except that this line disappears from the generated
>> > configuration file: sql_field_string = document. This seems to at least
>> > let indexer complete properly, but the index size is still huge:
>> >
>> > indexing index 'candidate_user_core'...
>> > collected 199704 docs, 8478.8 MB
>> >
>> > It also takes a long time to go through the sorting "Mhits" step now.
>> > I see how TS2 added sql_attr_string for the sort columns whereas TS3 adds
>> > sql_field_string - that's what you're talking about right? Is there any
>> > way to either a) get around this issue, or b) force TS to use the ordinal
>> > type? (everything should still work that way, correct?)
>> >
>> > Here's the options I set in thinking_sphinx.yml:
>> >
>> > development:
>> >   address: localhost
>> >   version: 2.0.8-release
>> >   mem_limit: 256M
>> >
>> >   enable_star: true
>> >   min_prefix_len: 2
>> >   blend_chars: "@, -, &"
>> >   html_strip: true
>> >   max_matches: 25000
>> >
>> > Is there any way I can speed this up / reduce the size?
>> >
>> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote:
>> > I think with 2.0.11 (what you were using previously, right?) TS uses the
>> > ordinal attribute type, which stores an integer for each string
>> > (calculated by grabbing all known values, putting them in order, returning
>> > the index of each value).
>> >
>> > With TS v3 (and later 2.x releases if I remember correctly) it'll use the
>> > native string attribute type (a relatively recent addition to Sphinx),
>> > which means Sphinx is storing the real string value - which is much better
>> > if you're sorting across more than one index (say, if you're using deltas,
>> > or searching across multiple models). In this case, it would mean Sphinx
>> > is now storing potentially a ton of data, instead of a 32-bit integer per
>> > record.
>> >
>> > --
>> > Pat
>> >
>> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote:
>> >
>> > > Thanks for the response, Pat - yes, it's the same index as the other
>> > > thread. Good point about sorting resumes, that shouldn't be there.
>> > > However, why would that make such a difference between TS2 and TS3 (see
>> > > my other post which I added at the same time as your response)?
>> > >
>> > > I will try removing the sortable on resumes and see what difference it
>> > > makes!
>> > >
>> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote:
>> > > Hi Daniel
>> > >
>> > > If this is the same index as in the other thread, I'm guessing it's the
>> > > fact that you've got resumes.document sortable. A record with many
>> > > resumes and/or large document values could end up with massive values
>> > > for the underlying string attribute (that you'd sort by) - are you
>> > > actually sorting by this? Generally I'd be surprised if there's much
>> > > point sorting by large amounts of text.
>> > >
>> > > --
>> > > Pat
>> > >
>> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote:
>> > >
>> > > > Is there any reason that an index would grow in size when upgrading
>> > > > from thinkingsphinx 2 to 3? The only differences in the configuration
>> > > > file is changing port to mysql41, and changing version to
>> > > > 2.0.8-release, but an index that used to be around 500MB is now
>> > > > resulting in this error:
>> > > >
>> > > > ERROR: index 'user_core': too many string attributes (current index
>> > > > format allows up to 4 GB).
>> > > >
>> > > > Anyone have any idea why this would be?
>> > > >
>> > > > --
>> > > > You received this message because you are subscribed to the Google
>> > > > Groups "Thinking Sphinx" group.
>> > > > To unsubscribe from this group and stop receiving emails from it, send
>> > > > an email to [email protected].
>> > > > To post to this group, send email to [email protected].
>> > > > Visit this group at http://groups.google.com/group/thinking-sphinx.
>> > > > For more options, visit https://groups.google.com/groups/opt_out.
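For anyone reading this thread later, here is a minimal Python sketch of the trade-off Pat describes between ordinal and string attributes. It is not Sphinx's or Thinking Sphinx's actual code: the function names, the sample values and the 1-based rank numbering are all assumptions made for illustration. It shows an ordinal being computed per index as the rank of each value in sorted order, why those ranks are not comparable across a core and a delta index, and why storing the real strings costs so much more than one 32-bit integer per record.

```python
def ordinals(values):
    """Map each string to its rank among the sorted distinct values.

    Assumption: ranks start at 1; the numbering is only illustrative.
    """
    ranked = sorted(set(values))
    lookup = {value: rank for rank, value in enumerate(ranked, start=1)}
    return [lookup[value] for value in values]


def string_attr_bytes(values):
    """Bytes needed to keep every value verbatim as UTF-8, ignoring overhead."""
    return sum(len(value.encode("utf-8")) for value in values)


def ordinal_attr_bytes(values):
    """Bytes needed for one 32-bit integer per record."""
    return 4 * len(values)


# Hypothetical sort values for two separately built indexes.
core = ["delta", "alpha", "charlie"]
delta = ["zulu"]

core_ordinals = ordinals(core)    # ranks computed within the core index only
delta_ordinals = ordinals(delta)  # ranks computed within the delta index only

# Sorting the combined results by ordinal puts "zulu" next to "alpha",
# because each index numbered its values independently.
pairs = list(zip(core_ordinals + delta_ordinals, core + delta))
merged = [value for _, value in sorted(pairs, key=lambda pair: pair[0])]

# Sorting by the real string values gives the expected order.
by_string = sorted(core + delta)
```

With the sample data, `merged` comes out as `['alpha', 'zulu', 'charlie', 'delta']` while `by_string` is `['alpha', 'charlie', 'delta', 'zulu']`, which is the cross-index sorting problem that string attributes avoid. On the size side, `ordinal_attr_bytes` for the 199704 docs mentioned above would be 798816 bytes (under 1 MB), whereas `string_attr_bytes` grows with the full length of every stored value, so a sortable column holding whole documents can get very large.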
