dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-661344352


   @hvanhovell, the following is completely wrong, because the above optimization 
was one of the recommendations given to many Hortonworks customers to reduce their 
HDFS usage. I know of many production usages like that. I had almost forgotten about 
it, but it suddenly came back to me during this PR. (Sadly, after I merged this.)
   >  You are currently just lucky that the system accidentally produces a nice 
layout for you; 99% of our users won't be as lucky. The only way you can be 
sure is when you add these things yourself.
   
   I fully understand your point of view. However, I wonder whether you can 
persuade the customer to waste their storage by generating 160x bigger files 
(SPARK-32318). Do you think you can?
   
   ```
   -rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   ```
   ```
   -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   ```




