If your use case can be addressed by adding a session option to skip the
checks, that would be simpler to do and could be done much faster.
Adding TTL support would be more complex.
I will let someone else comment on long term plans as I don't know the
details.
Thanks
Padma
On Thu, Aug 2,
Hello,
"I think the simplest thing that should be done first is to provide option
to skip the check"
I agree that whatever we do, we should not introduce any change in user
experience by default.
But since the default behaviour is not to set any TTL in the metadata, I
have conflicting feelings
I think the simplest thing that should be done first is to provide option
to skip the check.
The default behavior for that option will be what we do today, i.e. check
the root directory and all subdirectories underneath.
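A rough sketch of how such an option could behave, for illustration only.
This is not Drill code: the option name check_metadata_staleness, the
SessionOptions class and the metadata_is_stale function are all made up
here, and modification times are treated as plain numbers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SessionOptions:
    # Hypothetical option name. The default (True) keeps today's behaviour:
    # check the root directory and every subdirectory underneath.
    check_metadata_staleness: bool = True


def metadata_is_stale(
    cache_mtime: float,
    dir_mtimes: Dict[str, float],
    options: SessionOptions,
) -> bool:
    """Return True if any listed directory changed after the cache file.

    dir_mtimes maps each directory recorded in the metadata cache file
    (root plus all nested subdirectories) to its modification time.
    """
    if not options.check_metadata_staleness:
        # Check skipped on request: the cache is trusted as-is.
        return False
    return any(mtime > cache_mtime for mtime in dir_mtimes.values())
```

With the default options the function walks every directory as today; with
the option switched off it returns immediately without looking at any
timestamp.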
Thanks
Padma
On Mon, Jul 30, 2018 at 3:01 AM, Joel Pfaff wrote:
> Hello,
>
Hello,
Thanks a lot for all the feedback; I will try to respond to everything below:
@Parth:
"I don't think we would want to maintain a TTL for the metadata store so
introducing one now would mean that we might break backward compatibility
down the road."
Yes, I am aware of this activity
Hi Joel,
This is my understanding:
We have a list of all directories (i.e. all subdirectories and their
subdirectories etc.) in the metadata cache file of each directory. We go
through that list of directories and check each directory's modification
time against the modification time of the metadata cache file in
I believe Vitalii is actively looking at a more robust metadata store
strategy for Drill and in the long term we would want to move all metadata
to the new store. I don't think we would want to maintain a TTL for the
metadata store so introducing one now would mean that we might break
backward
Hello,
Thanks for the feedback.
The logic I had in mind was to add the TTL, as a refresh_interval field in
the root metadata file.
At each query, the current time would be compared to the sum of the
modification time of the root metadata file and the refresh_interval.
If the current time
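The comparison described above could be sketched as follows. This is only
an illustration, not Drill code: the function name ttl_expired is
hypothetical, times are taken as seconds since the epoch, and the sketch
assumes that the TTL counts as elapsed once the current time exceeds the
root metadata file's modification time plus refresh_interval. What should
happen after expiry is left to the caller.

```python
import time
from typing import Optional


def ttl_expired(
    root_metadata_mtime: float,
    refresh_interval: float,
    now: Optional[float] = None,
) -> bool:
    """Return True once now is past root_metadata_mtime + refresh_interval.

    Only the root metadata file is consulted, so the check costs one
    comparison per query regardless of how many files the table has.
    """
    if now is None:
        now = time.time()
    # Assumption: strictly greater than the deadline means expired.
    return now > root_metadata_mtime + refresh_interval
```

For example, with a root metadata file modified at t=1000 and a
refresh_interval of 60, a query at t=1030 would see the TTL as still valid
and a query at t=1061 would see it as elapsed.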
Hi Joel,
Sounds reasonable.
But if Drill checks this TTL property from the metadata cache file for every
query and for every file instead of the file timestamp, it will not give any
benefit.
I suppose we can add this TTL property to only the root metadata cache file
and check it only once per query.
Could
Hello,
Today, on a table for which we have created statistics (through the REFRESH
TABLE METADATA command), Drill validates the timestamp of
every file or directory involved in the scan.
If the timestamps of the files are greater than the one of the metadata
file, then a regeneration of the