Hi,

I'd love some rough ideas or suggestions from you guys on how to
make my system more robust. I hope, too, that some of the ideas or
techniques described here will be interesting to others :)

I have a small dataset (~ 5k items) that I completely reimport from an
external system and reindex each night. I need to ensure that if the
import fails for some reason, the previous data stays indexed and
available on the website.

Initially I was doing this: http://gist.github.com/56865 (the ETL
syntax is TinyTL, largely based on ActiveWarehouse-ETL).

But there is an obvious big caveat, as I first delete all the data. If
either the rest of the ETL or the indexing fails, we're basically
screwed: there is no data left on the website.

A better version of this marks the existing data for deletion first:
http://gist.github.com/56868

But here I still have to delete_all the rows marked for deletion
_before_ indexing.
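To make the intent concrete, here's a toy Ruby sketch of the
mark-and-sweep pattern, with an in-memory "table" standing in for the
real one (the names and data are made up, not from my actual ETL):

```ruby
# Toy mark-and-sweep import: the old rows survive until the new
# import has succeeded, instead of being deleted up front.
items = [
  { id: 1, name: "old a", marked_for_deletion: false },
  { id: 2, name: "old b", marked_for_deletion: false },
]

# Step 1: mark everything for deletion rather than deleting it.
items.each { |row| row[:marked_for_deletion] = true }

# Step 2: import the fresh rows; if this step raises, the old rows
# are still present and still served by the site.
fresh = [{ id: 3, name: "new c" }]
fresh.each { |row| items << row.merge(marked_for_deletion: false) }

# Step 3: only after a successful import (and, ideally, a successful
# reindex) purge the rows that are still marked for deletion.
items.reject! { |row| row[:marked_for_deletion] }
```

The open question below is step 3: today I have to purge before
reindexing, which is exactly the window I'd like to close.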

Would there be some way of telling Sphinx to index only the items with
"marked_for_deletion <> 1"? I could modify the searches I run instead,
but I'd rather avoid that...
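What I'm imagining is something along these lines (just a sketch; I
haven't verified how define_index's where option behaves mid-import,
and the model and column names here are made up):

```ruby
class Item < ActiveRecord::Base
  define_index do
    indexes name, description

    # only feed rows not marked for deletion to the indexer
    where "marked_for_deletion <> 1"
  end
end
```

That would let me reindex first, and only purge the marked rows once
the new index is confirmed good.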

Thoughts, ideas ?

Does anyone else here use an ETL process to feed their Sphinx data?

cheers!

-- Thibaut

