Does this make any sense: I added source: :query to each of my has_many attributes, and suddenly indexing is fast again and the index is back down to < 500MB...
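
A minimal sketch of what an index definition with that option might look like in Thinking Sphinx v3. The index name, the resumes.document field and the job_ids attribute are taken from the quoted messages; the file path and the `jobs` association name are assumptions for illustration only, and the fragment needs a Rails app with the thinking-sphinx gem loaded to do anything.

```ruby
# app/indices/candidate_user_index.rb (hypothetical path and model name)
ThinkingSphinx::Index.define :candidate_user, with: :active_record do
  # Full-text field over the associated resumes. It is not marked
  # sortable, following the advice in the quoted messages.
  indexes resumes.document, as: :document

  # Multi-value attribute over a has_many association. With
  # source: :query the ids are fetched by a separate SQL query instead
  # of being joined into the index's main query.
  # `jobs` is an assumed association name.
  has jobs.id, as: :job_ids, source: :query
end
```

After a change like this the index has to be rebuilt (for example with rake ts:rebuild). In the generated Sphinx configuration the attribute should then appear as an sql_attr_multi line with a "from query" source rather than as an extra join in sql_query.
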
On Monday, July 22, 2013 8:04:45 PM UTC-4, Pat Allan wrote:
>
> That is surprising - removing min_prefix_len should certainly drop index file sizes down.
>
> It's worth noting the fix I just mentioned in the other thread should remove the extra join, and this should reduce the amount of data your database passes through to Sphinx. So: it may help return things to what you're expecting. Give it a shot, let me know.
>
> --
> Pat
>
> On 23/07/2013, at 6:12 AM, Daniel Vandersluis wrote:
>
> > Does it make any sense for the size to not change even with enable_star and min_prefix_len being disabled?
> >
> > On Monday, July 22, 2013 11:20:03 AM UTC-4, Daniel Vandersluis wrote:
> > There must be something weird going on here - when I added job_ids to the index (as per the other thread) with the latest master from github, the index size grows even more, up to 11GB now...
> >
> > On Saturday, July 20, 2013 11:43:38 PM UTC-4, Pat Allan wrote:
> > On 21/07/2013, at 1:40 PM, Daniel Vandersluis wrote:
> >
> > > Sorry, it's the same index, I was just simplifying the names for the purpose of this post and missed one. Sorry for the confusion :)
> >
> > Ah rightio. The change in size is pretty crazy then!
> >
> > > If the change was made prior to 2.0.11, wouldn't that mean that the indexes previously would have been huge too?
> >
> > I would have thought so, yes.
> >
> > > I'm not sure I understand what you mean about sql_field_string - do sql_field_strings take up significantly more space than sql_attr_strings do?
> >
> > There's no reason for them to at all. I don't know the dark arts behind the Sphinx source code though (it's C and C++, neither of which I'm confident with).
> >
> > --
> > Pat
> >
> > >> On Saturday, July 20, 2013 11:18:40 PM UTC-4, Pat Allan wrote:
> > >> Further to this: I guess I was wrong about 2.0.11 using ordinal attribute types instead of string attribute types - that change must have come in earlier.
> > >>
> > >> sql_attr_string is a standard string attribute (not ordinal), and sql_field_string stores the field value as a string attribute of the same name *as well as* the field. The latter removes the need for the _sort suffix you'll spot in sortable attributes in 2.x releases.
> > >>
> > >> I wouldn't expect there to be any difference between these two in terms of file size though. But just to compare apples with apples - you had user_core file sizes previously, but now it's candidate_user_core. Are there other large and unnecessary string attributes in the CandidateUser index?
> > >>
> > >> --
> > >> Pat
> > >>
> > >> On 19/07/2013, at 4:25 AM, Daniel Vandersluis wrote:
> > >>
> > >> > Yeah, I was using 2.0.11 previously. There does not seem to be any difference with removing sortable: true from the index definition (for resumes.document), except that this line disappears from the generated configuration file: sql_field_string = document. This seems to at least let indexer complete properly, but the index size is still huge:
> > >> >
> > >> > indexing index 'candidate_user_core'...
> > >> > collected 199704 docs, 8478.8 MB
> > >> >
> > >> > It also takes a long time to go through the sorting "Mhits" step now. I see how TS2 added sql_attr_string for the sort columns whereas TS3 adds sql_field_string - that's what you're talking about right? Is there any way to either a) get around this issue, or b) force TS to use the ordinal type? (everything should still work that way, correct?)
> > >> >
> > >> > Here's the options I set in thinking_sphinx.yml:
> > >> >
> > >> > development:
> > >> >   address: localhost
> > >> >   version: 2.0.8-release
> > >> >   mem_limit: 256M
> > >> >
> > >> >   enable_star: true
> > >> >   min_prefix_len: 2
> > >> >   blend_chars: "@, -, &"
> > >> >   html_strip: true
> > >> >   max_matches: 25000
> > >> >
> > >> > Is there any way I can speed this up / reduce the size?
> > >> >
> > >> > On Thursday, July 18, 2013 1:59:11 PM UTC-4, Pat Allan wrote:
> > >> > I think with 2.0.11 (what you were using previously, right?) TS uses the ordinal attribute type, which stores an integer for each string (calculated by grabbing all known values, putting them in order, returning the index of each value).
> > >> >
> > >> > With TS v3 (and later 2.x releases if I remember correctly) it'll use the native string attribute type (a relatively recent addition to Sphinx), which means Sphinx is storing the real string value - which is much better if you're sorting across more than one index (say, if you're using deltas, or searching across multiple models). In this case, it would mean Sphinx is now storing potentially a ton of data, instead of a 32-bit integer per record.
> > >> >
> > >> > --
> > >> > Pat
> > >> >
> > >> > On 19/07/2013, at 3:53 AM, Daniel Vandersluis wrote:
> > >> >
> > >> > > Thanks for the response, Pat - yes, it's the same index as the other thread. Good point about sorting resumes, that shouldn't be there. However, why would that make such a difference between TS2 and TS3 (see my other post which I added at the same time as your response)?
> > >> > >
> > >> > > I will try removing the sortable on resumes and see what difference it makes!
> > >> > >
> > >> > > On Thursday, July 18, 2013 1:49:13 PM UTC-4, Pat Allan wrote:
> > >> > > Hi Daniel
> > >> > >
> > >> > > If this is the same index as in the other thread, I'm guessing it's the fact that you've got resumes.document sortable. A record with many resumes and/or large document values could end up with massive values for the underlying string attribute (that you'd sort by) - are you actually sorting by this? Generally I'd be surprised if there's much point sorting by large amounts of text.
> > >> > >
> > >> > > --
> > >> > > Pat
> > >> > >
> > >> > > On 19/07/2013, at 3:09 AM, Daniel Vandersluis wrote:
> > >> > >
> > >> > > > Is there any reason that an index would grow in size when upgrading from thinkingsphinx 2 to 3? The only differences in the configuration file is changing port to mysql41, and changing version to 2.0.8-release, but an index that used to be around 500MB is now resulting in this error:
> > >> > > >
> > >> > > > ERROR: index 'user_core': too many string attributes (current index format allows up to 4 GB).
> > >> > > >
> > >> > > > Anyone have any idea why this would be?
> > >> > > >
> > >> > > > --
> > >> > > > You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> > >> > > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > >> > > > To post to this group, send email to [email protected].
> > >> > > > Visit this group at http://groups.google.com/group/thinking-sphinx.
> > >> > > > For more options, visit https://groups.google.com/groups/opt_out.
