Re: ttl on merge-time possible somehow ?

2016-12-20 Thread Dorian Hoxha
On Mon, Dec 19, 2016 at 7:03 PM, Chris Hostetter 
wrote:

>
> : So, the other way this could be made better, in my opinion (if the
> : optimization isn't already there), is to make the 'delete-query' on
> : TTL documents not force an fsync of the translog (so still written to
> : the translog, just no fsync).
> : When the next index/delete happens, its fsync will also cover the
> : translog entry for the previous 'delete ttl query'.
> : If the server crashes and we lose those deletes because the translog
> : wasn't fsynced to disk, a thread can run on startup to recheck
> : ttl-deletes.
> : This makes the delete-query come "free" in terms of disk fsyncs on the
> : translog.
> : Makes sense?
>
> All updates in Solr operate on both the in memory IndexWriter and the
> (Solr specific) transaction log, and only when a "hard commit" happens is
> the IndexWriter closed (causing segment files to fsync) ... the TTL code
> only does a "soft commit" which should not do any fsyncs on the index.
>
I wasn't talking about "committing" segments; I'm talking about not fsyncing
the translog on the delete-by-query-for-ttl.

It does function like a normal db: you append to the log most of the time,
and when doing a commit (checkpoint), you write the segment and 'cut' the
log (so we don't replay the old log on restart to reach the latest state).
But you have to fsync the log to be sure it's on disk.

>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: ttl on merge-time possible somehow ?

2016-12-19 Thread Chris Hostetter

: So, the other way this could be made better, in my opinion (if the
: optimization isn't already there), is to make the 'delete-query' on TTL
: documents not force an fsync of the translog (so still written to the
: translog, just no fsync).
: When the next index/delete happens, its fsync will also cover the
: translog entry for the previous 'delete ttl query'.
: If the server crashes and we lose those deletes because the translog
: wasn't fsynced to disk, a thread can run on startup to recheck
: ttl-deletes.
: This makes the delete-query come "free" in terms of disk fsyncs on the
: translog.
: Makes sense?

All updates in Solr operate on both the in memory IndexWriter and the 
(Solr specific) transaction log, and only when a "hard commit" happens is 
the IndexWriter closed (causing segment files to fsync) ... the TTL code 
only does a "soft commit" which should not do any fsyncs on the index.
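For reference, this commit behavior is configured in solrconfig.xml; a minimal sketch, with illustrative timing values:

```xml
<!-- illustrative values; check the Solr Reference Guide for your version -->
<autoCommit>
  <!-- hard commit: flushes segments to disk (fsync), here at most every 60s -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: opens a new searcher for visibility, no index fsync -->
  <maxTime>5000</maxTime>
</autoSoftCommit>
```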



-Hoss
http://www.lucidworks.com/


Re: ttl on merge-time possible somehow ?

2016-12-17 Thread Dorian Hoxha
On Sat, Dec 17, 2016 at 12:04 AM, Chris Hostetter 
wrote:

>
> : > lucene, something has to "mark" the segments as deleted in order for
> them
> ...
> : Note, it doesn't mark the "segment", it marks the "document".
>
> correct, typo on my part -- sorry.
>
> : > The dissatisfaction you expressed with this approach confuses me...
> : >
> : Really ?
> : If you have many expiring docs
>
> ...you didn't seem to finish that thought so i'm still not really sure
> what your suggestion is, in terms of why an alternative would be more
> efficient.
>
Sorry about that. The reason why (I think/thought) it wouldn't be as
efficient is that in some cases, like mine, all docs will expire, rather
fast (30 minutes in my case), so there will be a large number of
"deletes", which I thought were expensive.

So, if rocksdb did it this way, it would have to keep one index on the
ttl-timestamp and then issue two deletes (one for the index entry, one for
the original row). In Lucene, because the storage is different, this is
~just deleted_bitmap[x]=1, which, if you disable translog fsync (only for
the ttl-delete), should be really fast and nonblocking (my issue).
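A toy sketch of that bitmap idea (illustrative only, not Lucene's actual data structures):

```python
class Segment:
    """Toy immutable segment with a per-document deleted bitmap.

    A logical delete is just deleted[i] = True; the document's bytes stay
    in the segment until a merge rewrites it without them.
    """
    def __init__(self, docs):
        self.docs = docs
        self.deleted = [False] * len(docs)   # Lucene-style "not alive" bits

    def delete(self, i):
        self.deleted[i] = True               # O(1); nothing is rewritten

    def search(self, pred):
        # deleted docs are filtered out of every search result
        return [d for i, d in enumerate(self.docs)
                if pred(d) and not self.deleted[i]]

seg = Segment([{"id": 1, "expire_at": 100},
               {"id": 2, "expire_at": 900}])
now = 500
for i, d in enumerate(seg.docs):     # the TTL thread's job: flag expired docs
    if d["expire_at"] <= now:
        seg.delete(i)
print(seg.search(lambda d: True))    # only doc 2 is still "alive"
```

Compare this with a rocksdb-style physical delete, which has to rewrite or tombstone the row and its ttl-index entry.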

So, the other way this could be made better, in my opinion (if the
optimization isn't already there), is to make the 'delete-query' on TTL
documents not force an fsync of the translog (so still written to the
translog, just no fsync).
When the next index/delete happens, its fsync will also cover the translog
entry for the previous 'delete ttl query'.
If the server crashes and we lose those deletes because the translog
wasn't fsynced to disk, a thread can run on startup to recheck
ttl-deletes.
This makes the delete-query come "free" in terms of disk fsyncs on the
translog.
Makes sense?


>
> : "For example, with the configuration below the
> : DocExpirationUpdateProcessorFactory will create a timer thread that
> wakes
> : up every 30 seconds. When the timer triggers, it will execute a
> : *deleteByQuery* command to *remove any documents* with a value in the
> : press_release_expiration_date field value that is in the past "
>
> that document is describing a *logical* deletion as i mentioned before --
> the documents are "removed" in the sense that they are flagged "not alive"
> and won't be included in future searches, but the data still lives in the
> segments on disk until a future merge.  (That is end user documentation,
> focusing on the effects as perceived by clients -- the concept of "delete"
> from a low level storage implementation is a much more involved concept
> that affects any discussion of "deleting" documents in solr, not just TTL
> based deletes)
>
> : > 1) nothing would ensure that docs *ever* get removed during periods
> when
> : > docs aren't being added (thus no new segments, thus no merging)
> : >
> : This can be done with a periodic/smart thread that wakes up every 'ttl'
> and
> : checks min-max (or histogram) of timestamps on segments. If there are a
> : lot, do merge (or just delete the whole dead segment). At least that's
> how
> : those systems do it.
>
> OK -- with lucene/solr today we have the ConcurrentMergeScheduler which
> will watch for segments that have many (logically deleted) documents
> flagged "not alive" and will proactively merge those segments when the
> number of docs is above some configured/default threshold -- but to
> automatically flag those documents as "deleted" you need something like
> what solr is doing today.
>
I knew it checks "should we be merging". This would just be another clause.

>
>
> Again: i really feel like the only disconnect here is terminology.
>
> You're describing a background thread that wakes up periodically, scans
> the docs in each segment to see if they have an expire field > $now, and
> based on the size of the set of matches merges some segments and expunges
> the docs that were in that set.  For segments that aren't merged, docs
> stay put and are excluded from queries only by filters specified at
> request time.
>
> What Solr/Lucene has are 2 background threads: one wakes up periodically,
> scans the docs in the index to see if the expire field > $now and if so
> flags them as being "not alive" so they don't match queries at request
> time. A second thread checks each segment to see how many docs are marked
> "not alive" -- either by the previous thread or by some other form of
> (logical) deletion -- and merges some of those segments, expunging the
> docs that were marked "not alive".  For segments that aren't merged, the
> "not alive" docs are still in the segment, but the "not alive" flag
> automatically excludes them from queries.
>
Yes, I knew it functions that way.
The ~whole~ misunderstanding is that the delete is more efficient than I
thought. The whole reason the other storage engines do it "the other
way" is the efficiency of deletes on those engines.

>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Chris Hostetter

: > lucene, something has to "mark" the segments as deleted in order for them
...
: Note, it doesn't mark the "segment", it marks the "document".

correct, typo on my part -- sorry.

: > The dissatisfaction you expressed with this approach confuses me...
: >
: Really ?
: If you have many expiring docs

...you didn't seem to finish that thought so i'm still not really sure 
what your suggestion is, in terms of why an alternative would be more 
efficient.

: "For example, with the configuration below the
: DocExpirationUpdateProcessorFactory will create a timer thread that wakes
: up every 30 seconds. When the timer triggers, it will execute a
: *deleteByQuery* command to *remove any documents* with a value in the
: press_release_expiration_date field value that is in the past "

that document is describing a *logical* deletion as i mentioned before -- 
the documents are "removed" in the sense that they are flagged "not alive" 
and won't be included in future searches, but the data still lives in the 
segments on disk until a future merge.  (That is end user documentation, 
focusing on the effects as perceived by clients -- the concept of "delete" 
from a low level storage implementation is a much more involved concept 
that affects any discussion of "deleting" documents in solr, not just TTL 
based deletes)

: > 1) nothing would ensure that docs *ever* get removed during periods when
: > docs aren't being added (thus no new segments, thus no merging)
: >
: This can be done with a periodic/smart thread that wakes up every 'ttl' and
: checks min-max (or histogram) of timestamps on segments. If there are a
: lot, do merge (or just delete the whole dead segment). At least that's how
: those systems do it.

OK -- with lucene/solr today we have the ConcurrentMergeScheduler which 
will watch for segments that have many (logically deleted) documents 
flagged "not alive" and will proactively merge those segments when the 
number of docs is above some configured/default threshold -- but to 
automatically flag those documents as "deleted" you need something like 
what solr is doing today.


Again: i really feel like the only disconnect here is terminology.

You're describing a background thread that wakes up periodically, scans 
the docs in each segment to see if they have an expire field > $now, and 
based on the size of the set of matches merges some segments and expunges 
the docs that were in that set.  For segments that aren't merged, docs 
stay put and are excluded from queries only by filters specified at 
request time.

What Solr/Lucene has are 2 background threads: one wakes up periodically, 
scans the docs in the index to see if the expire field > $now and if so 
flags them as being "not alive" so they don't match queries at request 
time. A second thread checks each segment to see how many docs are marked 
"not alive" -- either by the previous thread or by some other form of 
(logical) deletion -- and merges some of those segments, expunging the 
docs that were marked "not alive".  For segments that aren't merged, the 
"not alive" docs are still in the segment, but the "not alive" flag 
automatically excludes them from queries.
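Those two background jobs can be sketched as a single-threaded simulation (illustrative only; thresholds and data layout are made up, not Lucene's actual merge logic):

```python
# Job 1: flag expired docs "not alive" (logical delete only).
def flag_expired(segments, now):
    for seg in segments:
        for doc in seg:
            if doc["expire_at"] <= now:
                doc["alive"] = False

# Job 2: "merge" segments whose fraction of flagged docs crosses a
# threshold, expunging the flagged docs; other segments are left alone.
def merge_if_needed(segments, threshold=0.5):
    merged = []
    for seg in segments:
        dead = sum(1 for d in seg if not d["alive"])
        if dead / len(seg) >= threshold:
            seg = [d for d in seg if d["alive"]]   # expunge on merge
        if seg:
            merged.append(seg)
    return merged

segments = [
    [{"id": 1, "expire_at": 100, "alive": True},
     {"id": 2, "expire_at": 200, "alive": True}],
    [{"id": 3, "expire_at": 900, "alive": True}],
]
flag_expired(segments, now=500)
segments = merge_if_needed(segments)
print(segments)   # the fully-expired first segment disappears
```

Note that between the two jobs, the flagged docs already stop matching queries; the merge only reclaims the space.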



-Hoss
http://www.lucidworks.com/


Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Dorian Hoxha
On Fri, Dec 16, 2016 at 10:53 PM, Chris Hostetter 
wrote:

>
> : Yep, that's what came in my search. See how TTL work in hbase/cassandra/
> : rocksdb . There
> : isn't a "delete old docs"query, but old docs are deleted by the storage
> : when merging. Looks like this needs to be a lucene-module which can then
> be
> : configured by solr ?
> ...
> : Just like in hbase,cassandra,rocksdb, when you "select" a row/document
> that
> : has expired, it exists on the storage, but isn't returned by the db,
>
>
> What you're describing is exactly how segment merges work in Lucene, it's
> just a question of terminology.
>
> In Lucene, "deleting" a document is a *logical* operation, the data still
> lives in the (existing) segments but the affected docs are recorded in a
> list of deletions (and automatically excluded from future searchers that
> are opened against them) ... once the segments are merged then the deleted
> documents are "expunged" rather than being copied over to the new
> segments.
>
> Where this diverges from what you describe is that as things stand in
> lucene, something has to "mark" the segments as deleted in order for them
> to later be expunged -- in Solr right now it is the code in question that
> does this via (internal) DBQ.
>
Note, it doesn't mark the "segment", it marks the "document".

>
> The dissatisfaction you expressed with this approach confuses me...
>
Really ?
If you have many expiring docs

>
> >> I did some search for TTL on solr, and found only a way to do it with a
> >> delete-query. But that ~sucks, because you have to do a lot of inserts
> >> (and queries).
>
> ...nothing about this approach does any "inserts" (or queries -- unless
> you mean the DBQ itself?) so w/o more elaboration on what exactly you find
> problematic about this approach, it's hard to make any sense of your
> objection or request for an alternative.
>
"For example, with the configuration below the
DocExpirationUpdateProcessorFactory will create a timer thread that wakes
up every 30 seconds. When the timer triggers, it will execute a
*deleteByQuery* command to *remove any documents* with a value in the
press_release_expiration_date field value that is in the past "
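The passage quoted is from the Reference Guide; the corresponding solrconfig.xml looks roughly like this (the factory class and its parameter names follow the Reference Guide, while the chain name and surrounding processors are illustrative):

```xml
<updateRequestProcessorChain name="expire-docs" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- timer thread wakes up every 30 seconds to deleteByQuery expired docs -->
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="expirationFieldName">press_release_expiration_date</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```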


>
> With all those caveats out of the way...
>
> What you're ultimately requesting -- new code that hooks into segment
> merging to exclude "expired" documents from being copied into the new
> merged segments -- should be theoretically possible with a custom
> MergePolicy, but I don't really see how it would be better than the
> current approach in typical use cases (ie: i want docs excluded from
> results after the expiration date is reached, with a min tolerance of
> X) ...
>
I mentioned that the client would also make a range-query since expired
documents in this case would still be indexed.

>
> 1) nothing would ensure that docs *ever* get removed during periods when
> docs aren't being added (thus no new segments, thus no merging)
>
This can be done with a periodic/smart thread that wakes up every 'ttl' and
checks min-max (or histogram) of timestamps on segments. If there are a
lot, do merge (or just delete the whole dead segment). At least that's how
those systems do it.

>
> 2) as you described, query clients would be required to specify date range
> filters on every query to identify the "logically live docs at this
> moment" on a per-request basis -- something that's far less efficient from
> a caching standpoint than letting the system do a DBQ on the backend to
> affect the *global* set of logically live docs at the index level.
>
This makes sense. Deleted doc-ids are cached better than the range-query
I described.

>
>
> Frankly: It seems to me that you've looked at how other non-lucene based
> systems X & Y handle TTL type logic and decided that's the best possible
> solution therefore the solution used by Solr "sucks" w/o taking into
> account that what's efficient in the underlying Lucene storage
> implementation might just be different from what's efficient in the underlying
> storage implementation of X & Y.
>
Yes.

>
> If you'd like to tackle implementing TTL as a lower level primitive
> concept in Lucene, then by all means be my guest -- but personally i
> don't think you're going to find any real perf improvements in an
> approach like you describe compared to what we offer today.  i look
> forward to being proved wrong.
>
Since the implementation is apparently more efficient than I thought, I'm
gonna leave it.

>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Chris Hostetter

: Yep, that's what came in my search. See how TTL work in hbase/cassandra/
: rocksdb . There
: isn't a "delete old docs"query, but old docs are deleted by the storage
: when merging. Looks like this needs to be a lucene-module which can then be
: configured by solr ?
...
: Just like in hbase,cassandra,rocksdb, when you "select" a row/document that
: has expired, it exists on the storage, but isn't returned by the db,


What you're describing is exactly how segment merges work in Lucene, it's 
just a question of terminology.

In Lucene, "deleting" a document is a *logical* operation, the data still 
lives in the (existing) segments but the affected docs are recorded in a 
list of deletions (and automatically excluded from future searchers that 
are opened against them) ... once the segments are merged then the deleted 
documents are "expunged" rather than being copied over to the new 
segments.

Where this diverges from what you describe is that as things stand in 
lucene, something has to "mark" the segments as deleted in order for them 
to later be expunged -- in Solr right now it is the code in question that 
does this via (internal) DBQ.

The dissatisfaction you expressed with this approach confuses me...

>> I did some search for TTL on solr, and found only a way to do it with a
>> delete-query. But that ~sucks, because you have to do a lot of inserts 
>> (and queries).

...nothing about this approach does any "inserts" (or queries -- unless 
you mean the DBQ itself?) so w/o more elaboration on what exactly you find 
problematic about this approach, it's hard to make any sense of your 
objection or request for an alternative.


With all those caveats out of the way...

What you're ultimately requesting -- new code that hooks into segment 
merging to exclude "expired" documents from being copied into the new 
merged segments -- should be theoretically possible with a custom 
MergePolicy, but I don't really see how it would be better than the 
current approach in typical use cases (ie: i want docs excluded from 
results after the expiration date is reached, with a min tolerance of 
X) ...

1) nothing would ensure that docs *ever* get removed during periods when 
docs aren't being added (thus no new segments, thus no merging)

2) as you described, query clients would be required to specify date range 
filters on every query to identify the "logically live docs at this 
moment" on a per-request basis -- something that's far less efficient from 
a caching standpoint than letting the system do a DBQ on the backend to 
affect the *global* set of logically live docs at the index level.
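To make the caching point concrete, a client-side expiration filter has to embed a moving NOW in every request (field name hypothetical):

```
# a raw NOW changes on every request, so the filterCache entry is never reused:
fq=expire_at:[NOW TO *]

# rounding with date math helps (one cache entry per minute) but still churns,
# whereas deleted docs are tracked once, globally, per segment:
fq=expire_at:[NOW/MINUTE TO *]
```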


Frankly: It seems to me that you've looked at how other non-lucene based 
systems X & Y handle TTL type logic and decided that's the best possible 
solution therefore the solution used by Solr "sucks" w/o taking into 
account that what's efficient in the underlying Lucene storage 
implementation might just be different from what's efficient in the underlying 
storage implementation of X & Y.

If you'd like to tackle implementing TTL as a lower level primitive 
concept in Lucene, then by all means be my guest -- but personally i 
don't think you're going to find any real perf improvements in an 
approach like you describe compared to what we offer today.  i look 
forward to being proved wrong.



-Hoss
http://www.lucidworks.com/


Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Dorian Hoxha
Well there is a reason why they all do it that way.

I'm gonna guess that the reason lucene does it this way is because it keeps
a 'deleted docs bitset', which should act like a filter, which is not as
slow as doing a full-delete/insert like in the other dbs that I mentioned.

Thanks Shawn.

On Fri, Dec 16, 2016 at 9:57 PM, Shawn Heisey  wrote:

> On 12/16/2016 1:12 PM, Dorian Hoxha wrote:
> > Shawn, I know how it works, I read the blog post. But I don't want it
> > that
> > way. So how to do it my way? Like a custom merge function on lucene or
> > something else ?
>
> A considerable amount of custom coding.
>
> At a minimum, you'd have to write your own implementations of some
> Lucene classes and probably some Solr classes.  This sort of integration
> might also require changes to the upstream Lucene/Solr source code.  I
> doubt there would be enough benefit (either performance or anything
> else) to be worth the time and energy required.  If Lucene-level support
> would have produced a demonstrably better expiration feature, it would
> have been implemented that way.
>
> If you're *already* an expert in Lucene/Solr code, then it might be a
> fun intellectual exercise, but such a large-scale overhaul of an
> existing feature that works well is not something I would try to do.
>
> Thanks,
> Shawn
>
>


Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Shawn Heisey
On 12/16/2016 1:12 PM, Dorian Hoxha wrote:
> Shawn, I know how it works, I read the blog post. But I don't want it
> that
> way. So how to do it my way? Like a custom merge function on lucene or
> something else ?

A considerable amount of custom coding.

At a minimum, you'd have to write your own implementations of some
Lucene classes and probably some Solr classes.  This sort of integration
might also require changes to the upstream Lucene/Solr source code.  I
doubt there would be enough benefit (either performance or anything
else) to be worth the time and energy required.  If Lucene-level support
would have produced a demonstrably better expiration feature, it would
have been implemented that way.

If you're *already* an expert in Lucene/Solr code, then it might be a
fun intellectual exercise, but such a large-scale overhaul of an
existing feature that works well is not something I would try to do.

Thanks,
Shawn



Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Dorian Hoxha
On Fri, Dec 16, 2016 at 8:11 PM, Shawn Heisey  wrote:

> On 12/16/2016 11:13 AM, Dorian Hoxha wrote:
> > Yep, that's what came in my search. See how TTL work in hbase/cassandra/
> > rocksdb . There
> > isn't a "delete old docs"query, but old docs are deleted by the
> > storage when merging. Looks like this needs to be a lucene-module
> > which can then be configured by solr ?
>
> No.  Lucene doesn't know about expiration and doesn't need to know about
> expiration.
>
It needs to know, or else it will be ~inefficient in my case.

>
> The document expiration happens in Solr.  In the background, Solr
> finds/deletes old documents in the Lucene index according to how the
> expiration feature is configured.  What happens after that is basic
> Lucene operation.  If you index enough new data to trigger a merge (or
> if you do an optimize/forceMerge), then Lucene will get rid of deleted
> documents in the merged segments.  The contents of the documents in your
> index (whether that's a timestamp or something else) are completely
> irrelevant for decisions made during Lucene's segment merging.
>
Shawn, I know how it works, I read the blog post. But I don't want it that
way.
So how to do it my way? Like a custom merge function on lucene or something
else ?

>
> > Just like in hbase,cassandra,rocksdb, when you "select" a row/document
> > that has expired, it exists on the storage, but isn't returned by the
> > db, because it checks the timestamp and sees that it's expired. Looks
> > like this also need to be in lucene?
>
> That's pretty much how Lucene (and by extension, Solr) works, except
> it's not related to expiration, it is *deleted* documents that don't
> show up in the results.
>
No, it doesn't. But I want expirations to function that way. Just like you
have "custom update processors", there should be a similar way for get (so
in my custom get-processor, I check the timestamp and return NotFound if
it's expired).

>
> Thanks,
> Shawn
>
Makes sense?


Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Shawn Heisey
On 12/16/2016 11:13 AM, Dorian Hoxha wrote:
> Yep, that's what came in my search. See how TTL work in hbase/cassandra/
> rocksdb . There
> isn't a "delete old docs"query, but old docs are deleted by the
> storage when merging. Looks like this needs to be a lucene-module
> which can then be configured by solr ? 

No.  Lucene doesn't know about expiration and doesn't need to know about
expiration.

The document expiration happens in Solr.  In the background, Solr
finds/deletes old documents in the Lucene index according to how the
expiration feature is configured.  What happens after that is basic
Lucene operation.  If you index enough new data to trigger a merge (or
if you do an optimize/forceMerge), then Lucene will get rid of deleted
documents in the merged segments.  The contents of the documents in your
index (whether that's a timestamp or something else) are completely
irrelevant for decisions made during Lucene's segment merging.

> Just like in hbase,cassandra,rocksdb, when you "select" a row/document
> that has expired, it exists on the storage, but isn't returned by the
> db, because it checks the timestamp and sees that it's expired. Looks
> like this also need to be in lucene?

That's pretty much how Lucene (and by extension, Solr) works, except
it's not related to expiration, it is *deleted* documents that don't
show up in the results.

Thanks,
Shawn



Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Dorian Hoxha
On Fri, Dec 16, 2016 at 4:42 PM, Shawn Heisey  wrote:

> On 12/16/2016 12:54 AM, Dorian Hoxha wrote:
> > I did some search for TTL on solr, and found only a way to do it with
> > a delete-query. But that ~sucks, because you have to do a lot of
> > inserts (and queries).
>
> You're going to have to be very specific about what you want Solr to do.
>
> > The other(kinda better) way to do it, is to set a collection-level
> > ttl, and when indexes are merged, they will drop the documents that
> > have expired in the new merged segment. On the client, I will make
> > sure to do date-range queries so I don't get back old documents. So:
> > 1. is there a way to easily modify the segment-merger (or better way?)
> > to do that ?
>
> Does the following describe the feature you're after?
>
> https://lucidworks.com/blog/2014/05/07/document-expiration/
>
> If this is what you're after, this is *Solr* functionality.  Segment
> merging is *Lucene* functionality.  Lucene cannot remove documents
> during merge until they have been deleted.  It is Solr that handles
> deleting documents after they expire.  Lucene is not aware of the
> expiration concept.
>
Yep, that's what came in my search. See how TTL work in hbase/cassandra/
rocksdb . There
isn't a "delete old docs"query, but old docs are deleted by the storage
when merging. Looks like this needs to be a lucene-module which can then be
configured by solr ?


> > 2. is there a way to support this also on get ? looks like I can use
> > realtimeget + filter query and it should work based on documentation
>
> Realtime get allows you to retrieve documents that have been indexed but
> not yet committed.  I doubt that deleted documents or document
> expiration affects RTG at all.  We would need to know exactly what you
> want to get working here before we can say whether or not you're right
> when you say "it should work."
>
Just like in hbase,cassandra,rocksdb, when you "select" a row/document that
has expired, it exists on the storage, but isn't returned by the db,
because it checks the timestamp and sees that it's expired. Looks like this
also need to be in lucene?

>
> Thanks,
> Shawn
>
Makes more sense?


Re: ttl on merge-time possible somehow ?

2016-12-16 Thread Shawn Heisey
On 12/16/2016 12:54 AM, Dorian Hoxha wrote:
> I did some search for TTL on solr, and found only a way to do it with
> a delete-query. But that ~sucks, because you have to do a lot of
> inserts (and queries). 

You're going to have to be very specific about what you want Solr to do.

> The other(kinda better) way to do it, is to set a collection-level
> ttl, and when indexes are merged, they will drop the documents that
> have expired in the new merged segment. On the client, I will make
> sure to do date-range queries so I don't get back old documents. So:
> 1. is there a way to easily modify the segment-merger (or better way?)
> to do that ? 

Does the following describe the feature you're after?

https://lucidworks.com/blog/2014/05/07/document-expiration/

If this is what you're after, this is *Solr* functionality.  Segment
merging is *Lucene* functionality.  Lucene cannot remove documents
during merge until they have been deleted.  It is Solr that handles
deleting documents after they expire.  Lucene is not aware of the
expiration concept.

> 2. is there a way to support this also on get ? looks like I can use
> realtimeget + filter query and it should work based on documentation

Realtime get allows you to retrieve documents that have been indexed but
not yet committed.  I doubt that deleted documents or document
expiration affects RTG at all.  We would need to know exactly what you
want to get working here before we can say whether or not you're right
when you say "it should work."

Thanks,
Shawn



ttl on merge-time possible somehow ?

2016-12-15 Thread Dorian Hoxha
Hello searchers,

I did some search for TTL on solr, and found only a way to do it with a
delete-query. But that ~sucks, because you have to do a lot of inserts (and
queries).

The other(kinda better) way to do it, is to set a collection-level ttl, and
when indexes are merged, they will drop the documents that have expired in
the new merged segment. On the client, I will make sure to do date-range
queries so I don't get back old documents.

So:
1. is there a way to easily modify the segment-merger (or better way?) to
do that ?
2. is there a way to support this also on get ? looks like I can use
realtimeget + filter query and it should work based on documentation

Thank You