tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-658175833
looks good
This is an automated message from the Apache Git Service.
To respond to the message, please log on
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-656819037
test this please
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-656690866
test this please
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-656148922
yeah, 10x definitely seems safe, as most of the numbers are closer to 8x
for zstd. I'm fine with leaving the current logic for the small files;
we can always follow up
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-655129919
Could you run just a couple of the same tests with zstd (you don't need to
run all those combinations)?
Thanks again for running all those, I know it takes time.
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-655127979
@baohe-zhang what compression codec was used in the latest numbers?
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-654856475
that is correct.
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-654838940
Thanks for testing those combinations. That shows the number of jobs
doesn't affect the memory usage much, but if I'm not mistaken, in all the cases
you have 400,000 tasks.
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-654835055
test this please
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-654344387
Thanks for getting the numbers. Yeah, in general I agree: I would rather
overestimate the memory usage, as GC and OOM are bad. In practice, if people find
it using way less
tgravescs commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-624070783
Concept seems good. A couple of high-level questions, without me having
looked at the detailed code:
I assume the number of threads to read is still