lival commented on PR #39626: URL: https://github.com/apache/spark/pull/39626#issuecomment-1820568737
> InMemoryFileIndex caches data in memory. If two long-running Spark ThriftServers both access table a, and one ThriftServer adds data to table a, the other ThriftServer can't read the latest data when it accesses table a again; it needs to refresh the table. When is refresh table a problem?

Thank you for your reply. First of all, the InMemoryFileIndex you mentioned is a cache of file metadata, whereas the code we submitted this time addresses the automatic identification and caching of RDD partition data. The former mainly concerns file-level metadata management, such as fast lookups and obtaining file sizes. Our focus is data-level caching, which avoids recomputing the same data and speeds up application execution.
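
To make the distinction concrete, here is a toy sketch in plain Python. It does not use Spark and is not how InMemoryFileIndex or the code in this PR is implemented. The names `FileListingCache`, `DataCache`, `write_part`, `total`, and `demo` are hypothetical and exist only for illustration. The first class caches which files belong to a table directory and stays stale until it is refreshed; the second caches a computed result so the same work is not repeated.

```python
import os
import tempfile


class FileListingCache:
    """Toy file-metadata cache: remembers which files a table directory holds."""

    def __init__(self, table_dir):
        self.table_dir = table_dir
        self._files = None  # filled lazily on the first listing

    def list_files(self):
        if self._files is None:
            self._files = sorted(os.listdir(self.table_dir))
        return self._files

    def refresh(self):
        # Drop the cached listing so the next call re-reads the directory.
        self._files = None


class DataCache:
    """Toy data-level cache: stores computed results, not file listings."""

    def __init__(self):
        self._results = {}
        self.computations = 0  # counts how often the work was actually done

    def get_or_compute(self, key, compute):
        if key not in self._results:
            self.computations += 1
            self._results[key] = compute()
        return self._results[key]


def write_part(table_dir, name, values):
    # Write one "partition file" with one integer per line.
    with open(os.path.join(table_dir, name), "w") as f:
        f.write("\n".join(str(v) for v in values))


def total(table_dir, files):
    # The "expensive" computation: sum every integer in the given files.
    result = 0
    for name in files:
        with open(os.path.join(table_dir, name)) as f:
            result += sum(int(line) for line in f if line.strip())
    return result


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        write_part(table_dir, "part-0", [1, 2, 3])
        listing = FileListingCache(table_dir)
        data = DataCache()

        before = list(listing.list_files())
        write_part(table_dir, "part-1", [4, 5])  # another writer adds data
        stale = list(listing.list_files())       # cached listing misses part-1
        listing.refresh()
        fresh = list(listing.list_files())       # listing now includes part-1

        first = data.get_or_compute("sum", lambda: total(table_dir, fresh))
        second = data.get_or_compute("sum", lambda: total(table_dir, fresh))
        return before, stale, fresh, first, second, data.computations
```

In this sketch the stale listing is the situation described in the quoted question, and the second `get_or_compute` call returning without recomputation is the kind of saving that data-level caching is after.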
