Re: Metadata management improvement

2018-08-03 Thread Padma Penumarthy
If your use case can be addressed by adding a session option to skip the checks, that would be simpler to do and could be done much faster. Adding TTL support would be more complex. I will let someone else comment on the long-term plans, as I don't know the details. Thanks, Padma

Re: Metadata management improvement

2018-08-02 Thread Joel Pfaff
Hello, "I think the simplest thing that should be done first is to provide option to skip the check" I agree that whatever we do, we should not introduce any change in user experience by default. But since the default behaviour is not to set any TTL in the metadata, I have conflicting feelings

Re: Metadata management improvement

2018-08-01 Thread Padma Penumarthy
I think the simplest thing that should be done first is to provide an option to skip the check. The default behavior for that option will be what we do today, i.e. check the root directory and all subdirectories underneath. Thanks, Padma
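A minimal sketch of how such an option could behave, assuming a hypothetical boolean flag named `skip_check` (not an existing Drill option) and plain modification times in seconds. With the default value, every directory is still checked, as today:

```python
from typing import Mapping


def should_regenerate(
    dir_mtimes: Mapping[str, float],
    cache_mtime: float,
    skip_check: bool = False,  # hypothetical session option; off by default
) -> bool:
    """Return True if the metadata cache should be regenerated."""
    if skip_check:
        # The user has opted out of the timestamp checks entirely.
        return False
    # Default behavior: any directory modified after the cache file
    # was written makes the cache stale.
    return any(mtime > cache_mtime for mtime in dir_mtimes.values())
```

For example, `should_regenerate({"/t": 200.0}, 100.0)` reports a stale cache, while the same call with `skip_check=True` does not.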

Re: Metadata management improvement

2018-07-30 Thread Joel Pfaff
Hello, Thanks a lot for all this feedback. I will try to respond to everything below: @Parth: "I don't think we would want to maintain a TTL for the metadata store so introducing one now would mean that we might break backward compatibility down the road." Yes, I am aware of this activity

Re: Metadata management improvement

2018-07-13 Thread Padma Penumarthy
Hi Joel, This is my understanding: We have a list of all directories (i.e. all subdirectories, their subdirectories, etc.) in the metadata cache file of each directory. We go through that list of directories and check each directory's modification time against the modification time of the metadata cache file in
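The check described above can be sketched roughly as follows. This is an illustration only, not Drill's actual code: the function name `stale_directories` is hypothetical, and modification times are assumed to be plain numbers (for example, seconds since the epoch) already read from the file system.

```python
from typing import List, Mapping


def stale_directories(
    dir_mtimes: Mapping[str, float], cache_mtime: float
) -> List[str]:
    """Return the directories modified after the metadata cache file.

    dir_mtimes maps each directory listed in the cache file to its
    modification time; cache_mtime is the modification time of the
    metadata cache file itself.
    """
    return sorted(
        path for path, mtime in dir_mtimes.items() if mtime > cache_mtime
    )
```

An empty result means no listed directory changed after the cache file was written. The cost of this check grows with the number of directories, since each one needs its own modification-time lookup.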

Re: Metadata management improvement

2018-07-12 Thread Parth Chandra
I believe Vitalii is actively looking at a more robust metadata store strategy for Drill and in the long term we would want to move all metadata to the new store. I don't think we would want to maintain a TTL for the metadata store so introducing one now would mean that we might break backward

Re: Metadata management improvement

2018-07-12 Thread Joel Pfaff
Hello, Thanks for the feedback. The logic I had in mind was to add the TTL as a refresh_interval field in the root metadata file. At each query, the current time would be compared to the sum of the modification time of the root metadata file and the refresh_interval. If the current time
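A minimal sketch of that comparison, under the assumption that the metadata counts as fresh while the current time is earlier than the modification time plus `refresh_interval`. The function name and the use of seconds as the unit are hypothetical:

```python
def metadata_is_fresh(
    root_mtime: float, refresh_interval: float, now: float
) -> bool:
    """Return True while the root metadata file is within its TTL.

    root_mtime: modification time of the root metadata file (seconds).
    refresh_interval: TTL stored in the root metadata file (seconds).
    now: current time (seconds), passed in so the check is testable.
    """
    return now < root_mtime + refresh_interval
```

Because only the root metadata file is consulted, this is a single comparison per query, regardless of how many files or directories the table contains.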

Re: Metadata management improvement

2018-07-12 Thread Vitalii Diravka
Hi Joel, Sounds reasonable. But if Drill checks this TTL property from the metadata cache file for every query and for every file, instead of the file timestamp, it will not provide any benefit. I suppose we can add this TTL property only to the root metadata cache file and check it only once per query. Could

Metadata management improvement

2018-07-12 Thread Joel Pfaff
Hello, Today, on a table for which we have created statistics (through the REFRESH TABLE METADATA command), Drill validates the timestamp of every file or directory involved in the scan. If the timestamps of the files are later than that of the metadata file, then a regeneration of the