Thanks.

How do I definitively determine whether a segment has been completely
parsed, if I were to set up an hourly crontab to delete the segments
from HDFS? I have seen that the presence of the crawl_parse directory
inside a segment's directory at least indicates that parsing has
started, but I think the directory would be created as soon as
parsing begins.

To avoid deleting a segment prematurely, while it is still being
fetched or parsed, what should I be looking for in my script?
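
For illustration, here is a rough sketch of the kind of check I have in
mind, in Python. It assumes that the crawl script itself drops a marker
file into each segment once parsing, updatedb and indexing have all
finished, so the cron job does not have to guess from crawl_parse. The
marker name _SEGMENT_DONE, the segment paths and the function names are
my own placeholders, not anything Nutch provides; only the "hdfs dfs"
shell commands are existing tools.

```python
import subprocess

# Hypothetical marker name; it is NOT created by Nutch. The crawl script
# would have to create it after its last step for the segment, e.g.:
#   hdfs dfs -touchz <segment>/_SEGMENT_DONE
DONE_MARKER = "_SEGMENT_DONE"


def segments_safe_to_delete(segment_listing):
    """Return, sorted, the segment paths whose listing contains the marker.

    segment_listing maps a segment path to the set of entry names found
    directly inside that segment directory.
    """
    return sorted(
        path
        for path, entries in segment_listing.items()
        if DONE_MARKER in entries
    )


def hdfs_has_marker(segment_path):
    """True if the marker exists in HDFS ('hdfs dfs -test -e' exits 0)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-test", "-e", f"{segment_path}/{DONE_MARKER}"]
    )
    return result.returncode == 0


def hdfs_delete(segment_path):
    """Recursively delete one segment directory from HDFS."""
    subprocess.run(["hdfs", "dfs", "-rm", "-r", segment_path], check=True)


# Example with made-up segment names: only the first one carries the marker.
listing = {
    "crawl/segments/20141102195800": {
        "crawl_generate", "crawl_fetch", "content",
        "crawl_parse", "parse_data", "parse_text", DONE_MARKER,
    },
    "crawl/segments/20141103084100": {
        "crawl_generate", "crawl_fetch", "crawl_parse",
    },
}
deletable = segments_safe_to_delete(listing)
```

The cron job would then call hdfs_delete() only on the paths in
"deletable", so a segment whose crawl_parse directory exists but whose
processing has not finished would be left alone.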

On Sun, Nov 2, 2014 at 7:58 PM, remi tassing <[email protected]> wrote:
> The next fetch time is computed after "updatedb" is issued with that
> segment.
>
> So as long as you don't need the parsed data anymore then you can delete
> the segment (e.g. after indexing through Solr...).
>
>
>
> On Mon, Nov 3, 2014 at 8:41 AM, Meraj A. Khan <[email protected]> wrote:
>
>> Hi All,
>>
>> I am deleting the segments as soon as they are fetched and parsed. I
>> have read in previous posts that it is safe to delete a segment
>> only if it is older than db.default.fetch.interval. My
>> understanding is that one does not have to wait for the segment to be
>> older than db.default.fetch.interval, but can delete it as soon as the
>> segment is parsed.
>>
>> Is my understanding correct? I want to delete the segment as soon as
>> possible so as to save as much disk space as possible.
>>
>> Thanks.
>>
