Re: [PR] An automatic caching solution for Spark [spark]

via GitHub Tue, 21 Nov 2023 01:45:00 -0800


lival commented on PR #39626:
URL: https://github.com/apache/spark/pull/39626#issuecomment-1820568737


   > InMemoryFileIndex caches data in memory. If two long-running Spark 
ThriftServers both access table a, and one ThriftServer adds data to table a, 
the other ThriftServer cannot read the latest data when it accesses table a 
again unless the table is refreshed. When is refreshing the table a problem?
   
   Thank you for your reply. The InMemoryFileIndex you mentioned caches file 
metadata, whereas the code we submitted in this PR automatically identifies and 
caches RDD partition data. The former is file-level metadata management, 
supporting operations such as quick lookups and obtaining file sizes. Our 
focus is data-level caching, which avoids recomputing the same data and so 
speeds up application execution.
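
   To make the distinction concrete, here is a minimal toy sketch in plain 
Python. It is not Spark code and not the implementation in this PR: the class 
names `FileListingCache` and `PartitionDataCache`, the `storage` dict, and the 
`"lineage-1"` key are all hypothetical, chosen only for illustration. The first 
class models a file-metadata cache that goes stale until it is refreshed; the 
second models a data-level cache that avoids recomputing a partition.

```python
from typing import Callable, Dict, List, Tuple


class FileListingCache:
    """Toy stand-in for a file-metadata cache: remembers a table's file list."""

    def __init__(self, list_files: Callable[[str], List[str]]) -> None:
        self._list_files = list_files
        self._listing: Dict[str, List[str]] = {}

    def files(self, table: str) -> List[str]:
        # The listing is fetched once and then served from memory.
        if table not in self._listing:
            self._listing[table] = list(self._list_files(table))
        return self._listing[table]

    def refresh(self, table: str) -> None:
        # Dropping the entry forces the next call to list the files again.
        self._listing.pop(table, None)


class PartitionDataCache:
    """Toy stand-in for a data-level cache: remembers computed partitions."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, int], List[int]] = {}
        self.computations = 0

    def get_or_compute(
        self, lineage_id: str, partition: int, compute: Callable[[], List[int]]
    ) -> List[int]:
        key = (lineage_id, partition)
        if key not in self._data:
            self.computations += 1  # count how often real work is done
            self._data[key] = compute()
        return self._data[key]


# Metadata level: a second writer adds a file after the listing was cached.
storage = {"a": ["part-0"]}
listing = FileListingCache(lambda table: storage[table])
listing.files("a")
storage["a"].append("part-1")
stale = listing.files("a")   # still the old listing
listing.refresh("a")
fresh = listing.files("a")   # now includes the new file

# Data level: the same partition is requested twice but computed once.
data_cache = PartitionDataCache()
first = data_cache.get_or_compute("lineage-1", 0, lambda: [x * x for x in range(3)])
second = data_cache.get_or_compute("lineage-1", 0, lambda: [x * x for x in range(3)])
```

   In this sketch the refresh question only concerns the first class. The 
second class addresses a different cost, repeated computation of the same 
partition, which is what our change targets.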


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


