HeartSaVioR commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-635882784


   I've spent some time experimenting with more approaches.
   
   This is the experiment branch: 
https://github.com/HeartSaVioR/spark/tree/SPARK-30946-experiments
   
   > Version 3 only applies compression (LZ4) to the existing format. See the commit below:
   
   
https://github.com/HeartSaVioR/spark/commit/406670aa4910c4bec847a590ef37f2f0bd130902
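
   To make the version 3 idea concrete, here is a minimal sketch of "compress the existing text format as-is". It is not the code from the commit above: GZIP from the JDK stands in for LZ4 (which needs a third-party library), and the class name and the text records are hypothetical. The point is only that the serialized text is produced and parsed exactly as before, and compression is layered on top of it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration: GZIP is a stand-in for LZ4, which is not in the JDK.
final class CompressedTextSketch {

    // Compress an already-serialized text payload; the text format itself is unchanged.
    static byte[] compress(String serialized) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(serialized.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress back to the same text; the caller still has to parse every line.
    static String decompress(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up text records, only to have something repetitive to compress.
        StringBuilder lines = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            lines.append("path=/data/out/part-").append(i).append(",size=1024\n");
        }
        String original = lines.toString();
        byte[] compressed = compress(original);
        System.out.println("raw bytes: " + original.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("round trip ok: " + original.equals(decompress(compressed)));
    }
}
```

   With this layering the file gets smaller, but the per-entry text serialization and parsing still run in full, which may be one reason the elapsed time barely moves.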
   
   > Version 4 serializes/deserializes each entry via DataInputStream / DataOutputStream:
   
   
https://github.com/HeartSaVioR/spark/commit/7c665163bdd1930fb8812d46a7f8cdd599b1cafb
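
   As a rough illustration of the version 4 approach, the sketch below writes and reads entries field by field with DataOutputStream / DataInputStream. The `Entry` class and its three fields are hypothetical and chosen only for the example; the actual entry layout is in the commit above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BinaryEntrySketch {

    // Hypothetical entry; the fields are illustrative, not taken from the linked commit.
    static final class Entry {
        final String path;
        final long size;
        final boolean isDir;

        Entry(String path, long size, boolean isDir) {
            this.path = path;
            this.size = size;
            this.isDir = isDir;
        }
    }

    // Write a count followed by each entry's fields in a fixed order.
    static byte[] write(List<Entry> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(entries.size());
            for (Entry e : entries) {
                out.writeUTF(e.path);      // note: writeUTF is limited to 65535 encoded bytes
                out.writeLong(e.size);
                out.writeBoolean(e.isDir);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    // Read the fields back in exactly the order they were written.
    static List<Entry> read(byte[] bytes) {
        List<Entry> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String path = in.readUTF();
                long size = in.readLong();
                boolean isDir = in.readBoolean();
                entries.add(new Entry(path, size, isDir));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<Entry> original = Arrays.asList(
            new Entry("/data/out/part-0", 1024L, false),
            new Entry("/data/out/dir", 0L, true));
        List<Entry> restored = read(write(original));
        System.out.println("entries restored: " + restored.size());
        System.out.println("first path: " + restored.get(0).path);
    }
}
```

   Because the reader knows the field order and types up front, there is no text parsing per entry, only fixed-order binary reads.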
   
   I've also implemented simple apps to 1) prepare metadata (so that we can experiment on a specific batch) and 2) run a simple test against the various versions:
   
   
https://github.com/HeartSaVioR/spark-delegation-token-experiment/commit/bea7680e4c588f455f8c3181a96c9eff5002fa1a
   
   The numbers are recorded below:
   
   
https://docs.google.com/spreadsheets/d/1D5P103F_sKOjkDpNr9PaCC8Ehk4Y4dRtH3oEdytM4_c/edit?usp=sharing
   
   version | elapsed time | elapsed time (ratio of v1) | size | size (ratio of v1)
   ------- | ------------ | -------------------------- | ---- | ------------------
   1 | 10628.75 | 100.00% | 57265744 | 100.00%
   2 | 939.25 | 8.84% | 16655736 | 29.08%
   3 | 10116 | 95.18% | 17259852 | 30.14%
   4 | 837 | 7.87% | 15285626 | 26.69%
   
   The numbers show that applying compression to the existing format does little to reduce the elapsed time (95.18% of v1), although it reduces the size about as much as the other alternatives (30.14% of v1). The alternatives that are integrated directly into the data structure (versions 2 and 4 in the table) reduce the time greatly, to roughly 8 to 9% of v1, i.e. about 11 to 13 times faster. The size of the compact files is similar across the alternatives.
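
   For anyone who wants to re-derive the ratio columns or the speedup factor from the raw numbers in the table, a small sketch follows. The class and method names are hypothetical; the inputs are the values from the table above.

```java
import java.util.Locale;

final class RatioSketch {

    // Value expressed as a percentage of the v1 baseline.
    static double percentOf(double value, double baseline) {
        return value / baseline * 100.0;
    }

    // How many times faster (or smaller) than the baseline.
    static double speedup(double baseline, double value) {
        return baseline / value;
    }

    public static void main(String[] args) {
        double v1Time = 10628.75;
        double[] times = {10628.75, 939.25, 10116, 837};
        long v1Size = 57265744L;
        long[] sizes = {57265744L, 16655736L, 17259852L, 15285626L};

        for (int i = 0; i < times.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "v%d: time %.2f%% of v1 (%.1fx), size %.2f%% of v1",
                i + 1,
                percentOf(times[i], v1Time),
                speedup(v1Time, times[i]),
                percentOf(sizes[i], v1Size)));
        }
    }
}
```

   On these inputs, versions 2 and 4 come out at roughly 11x and 13x faster than v1, while version 3 stays close to 1x.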

