Re: Reindex Question / Clarification

2020-09-21 Thread Janos SUTO
Hello Ryan,

Reindexing won't cause any waste in Sphinx, so go ahead.

Btw, you have several main indices, e.g. main1, main2, etc. It's possible to use
main1 for 2017 data, main2 for 2018 data, and so on. In that case you can move
main1 to backup media to free up some space. The trick is to adjust both
indexer.delta.sh and indexer.main.sh to reference the given main index.
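
For example, a minimal sketch of what the per-year rebuild boils down to (the
config path, the indexer location and the main2 = 2018 mapping are just
assumptions here, check your own sphinx.conf and scripts):

  #!/bin/sh
  # Sketch only: rebuild a single yearly main index (main2 assumed to hold 2018 data).
  # Adjust CONFIG, the indexer path and the index name to your installation.
  CONFIG=/usr/local/etc/piler/sphinx.conf
  INDEX=main2

  # Sphinx's indexer rebuilds the named index from scratch; --rotate swaps the
  # new files in for the running searchd without a restart.
  /usr/local/bin/indexer --config "$CONFIG" --rotate "$INDEX"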

Janos

On 22 Sep 2020, 02:37, Ryan Blenis wrote:
>Hi Janos,
>
>Quick question on reindexing and its effect on the size of the Sphinx
>files.
>
>If I reindex something that is already in the index, will there be
>duplicates in Sphinx wasting space, or is this detected and removed?
>
>Background: We normally weren't searching for anything old, so we cleared
>the Sphinx data and only indexed the last year's worth of data. Now we have
>requests for data from parts of 2018, 2017, etc., which we've reindexed as
>needed to make them searchable. But as more requests come in, it seems it
>would be better to reindex everything, throw resources at it if needed, and
>cut out the manual step of indexing anything older than a year each time a
>request comes in. If I reindex everything, will I be wasting space (and
>should I start fresh instead), or will the duplicates be filtered out at
>some point in the process (i.e. not wasting space)?
>
>Thank you as always.


Reindex Question / Clarification

2020-09-21 Thread Ryan Blenis
Hi Janos,

Quick question on reindexing and its effect on the size of the Sphinx files.

If I reindex something that is already in the index, will there be
duplicates in Sphinx wasting space, or is this detected and removed?

Background: We normally weren't searching for anything old, so we cleared
the Sphinx data and only indexed the last year's worth of data. Now we have
requests for data from parts of 2018, 2017, etc., which we've reindexed as
needed to make them searchable. But as more requests come in, it seems it
would be better to reindex everything, throw resources at it if needed, and
cut out the manual step of indexing anything older than a year each time a
request comes in. If I reindex everything, will I be wasting space (and
should I start fresh instead), or will the duplicates be filtered out at
some point in the process (i.e. not wasting space)?

Thank you as always.