Hi Graham,

Sphinx only talks to (some) SQL databases, or XML. It sounds like the latter is the better option (write a script to pull the data out of S3 into an XML stream every time you index), but the catch there is that Thinking Sphinx doesn't have any hooks for XML options at this point.
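
To give a rough idea of what such a script might look like, here is a minimal sketch in Python that writes Sphinx's xmlpipe2 format to standard output. Treat it as an illustration under assumptions rather than a tested setup: the names `write_docset` and `fetch_over_http` are made up for this example, the single `content` field is arbitrary, and it assumes each S3 object can be read over plain HTTP (public or pre-signed URLs) and that the document id is the row id from the MySQL table that tracks the S3 files.

```python
import sys
from urllib.request import urlopen
from xml.sax.saxutils import escape


def fetch_over_http(url):
    """Hypothetical helper: download one object and return its text.

    Assumes the URL is readable without extra authentication.
    """
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def write_docset(docs, fetch=fetch_over_http, out=sys.stdout):
    """Write an xmlpipe2 stream for an iterable of (doc_id, url) pairs.

    doc_id is assumed to be the integer row id from the tracking table,
    so search results can be mapped back to URLs with a normal SQL query.
    """
    out.write('<?xml version="1.0" encoding="utf-8"?>\n')
    out.write('<sphinx:docset>\n')
    # Declare one full-text field; the name "content" is illustrative.
    out.write('<sphinx:schema>\n')
    out.write('<sphinx:field name="content"/>\n')
    out.write('</sphinx:schema>\n')
    for doc_id, url in docs:
        text = fetch(url)  # one HTTP request per document, hence slow
        out.write('<sphinx:document id="%d">\n' % int(doc_id))
        # Escape &, < and > so the stored HTML stays valid XML text.
        out.write('<content>%s</content>\n' % escape(text))
        out.write('</sphinx:document>\n')
    out.write('</sphinx:docset>\n')
```

The idea would be to call `write_docset` with the (id, URL) rows read from the tracking table, and to point a Sphinx source of `type = xmlpipe2` at the script through `xmlpipe_command`. That source would have to be written into the Sphinx configuration by hand, since Thinking Sphinx won't generate it.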

Also, I'm not sure if you can mix SQL and XML sources into the same index, but there will be ways to work around that (at a pinch, the XML script would just talk to the database as well as S3). This also means that indexing is going to become much slower, because of all the HTTP requests involved (unless there's some way of getting to the data in bulk?).

Every now and then someone raises an issue like yours, but it's not been often enough to make it a high priority...

I appreciate your offer of funding this kind of extension (well, if you're still interested, now you have an idea of what's involved), but I've got no spare time for the next two months - even Rails 3 support is a while away.

So, if you're cool with waiting, then maybe this is an option - otherwise, I'm not sure what the best way forward is for you - if this is a key feature, then maybe Sphinx/TS isn't the best search solution for you currently.

Cheers

--
Pat

e: [email protected] || m: +614 1327 3337
w: http://freelancing-gods.com || t: twitter.com/pat
bounce: http://trampolineday.com || skype: patallan

On 10/03/2010, at 5:30 AM, Graham Glass wrote:

> Hi everyone,
>
> I'm the founder of EDU 2.0 (http://www.edu20.org) and due to the rapid
> growth of the site, much of the text that used to be stored in MySQL
> is now being stored in Amazon S3.
>
> For example, if a lesson is created that has 100K of HTML in it, we
> store the text itself in S3 and just hold its name + URL in MySQL.
> This approach has allowed us to shrink our MySQL memory requirements
> dramatically, which is important for the long-term.
>
> We already use Sphinx to index things like messages, forum postings,
> etc. but not lessons. Now we'd like to start indexing the lessons as
> well. So my question is - what is the best way to use Thinking Sphinx
> if text is stored in S3? Is there a way to use Sphinx to update its
> indexes as files are uploaded to S3?
> That way, when I search for a
> term, I could get back a list of URLs where the terms were found.
>
> One last thing; we keep a MySQL table that keeps track of every single
> S3 file + its URL, so it's easy to correlate the S3 files with their
> URLs.
>
> I would be willing to fund extensions to Thinking Sphinx to make this
> possible!
>
> Cheers,
> Graham
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/thinking-sphinx?hl=en.
>
