Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/19135
@cloud-fan Very sorry for the late reply. I updated the code following your
suggestion. Does this need a performance test? If needed, I will run it later.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19135
Is it better to do batch unrolling? I.e., we could check memory usage and
request memory once every ~10 records, instead of doing it for every record.
---
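As a concrete illustration of the batch-unrolling idea suggested above, here is a minimal sketch; the callbacks, the 1 MB initial reservation, and the growth factor are all assumptions for the example, not the PR's code:

```scala
// Batch unrolling sketch: memory usage is estimated and extra unroll memory
// requested only once every `memoryCheckPeriod` records, not per record.
def unrollInBatches[T](
    values: Iterator[T],
    store: T => Unit,                     // append one record to the unroll buffer
    estimatedSize: () => Long,            // current size estimate of the buffer
    reserveUnrollMemory: Long => Boolean  // returns false if the request fails
  ): Boolean = {
  val memoryCheckPeriod = 10
  val memoryGrowthFactor = 1.5
  var memoryThreshold = 1L << 20          // assume a 1 MB initial reservation
  var elementsUnrolled = 0L
  var keepUnrolling = reserveUnrollMemory(memoryThreshold)

  while (values.hasNext && keepUnrolling) {
    store(values.next())
    elementsUnrolled += 1
    if (elementsUnrolled % memoryCheckPeriod == 0) {
      val currentSize = estimatedSize()
      if (currentSize >= memoryThreshold) {
        // Over-request by a growth factor so the next few records fit too.
        val amountToRequest = (currentSize * memoryGrowthFactor - memoryThreshold).toLong
        keepUnrolling = reserveUnrollMemory(amountToRequest)
        if (keepUnrolling) memoryThreshold += amountToRequest
      }
    }
  }
  keepUnrolling  // false means the block could not be fully unrolled in memory
}
```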
Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/19135
@jiangxb1987 OK, I can test it later. The following picture is from a run of
kmeans with the source data put into off-heap memory; you can see the CPU
time occupied by `reserveUnrollMemoryForThisTask`.
---
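For context on why that method can dominate a profile, a simplified sketch follows; the class, the pool-size check, and the field names are assumptions standing in for the real MemoryStore internals:

```scala
import scala.collection.mutable

// Each reservation synchronizes on a memory manager shared by every task in
// the executor, so executors with many cores contend on a single lock when
// reservations happen per record.
class UnrollMemoryAccounting(memoryManager: AnyRef, poolSize: Long) {
  private val unrollMemoryMap = mutable.HashMap.empty[Long, Long]
  private var memoryUsed = 0L

  def reserveUnrollMemoryForThisTask(taskAttemptId: Long, memory: Long): Boolean =
    memoryManager.synchronized {
      val success = memoryUsed + memory <= poolSize  // stand-in for acquireUnrollMemory
      if (success) {
        memoryUsed += memory
        unrollMemoryMap(taskAttemptId) =
          unrollMemoryMap.getOrElse(taskAttemptId, 0L) + memory
      }
      success
    }
}
```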
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/19135
It would be great to test the perf on executors with various
coresPerExecutor settings to ensure we don't introduce regressions with this
code change.
---
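A test matrix along those lines might look like the sketch below; the core counts, off-heap size, and app name are hypothetical, though the config keys themselves are standard Spark settings:

```scala
import org.apache.spark.SparkConf

// Vary cores per executor to expose contention on the shared memory manager,
// with off-heap storage enabled since that is the scenario profiled above.
val confs = Seq(1, 2, 4, 8).map { cores =>
  new SparkConf()
    .setAppName(s"unroll-perf-cores-$cores")
    .set("spark.executor.cores", cores.toString)
    .set("spark.memory.offHeap.enabled", "true")
    .set("spark.memory.offHeap.size", "4g")
}
```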
Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/19135
Hi @cloud-fan, the previous implementation was the same as `putIteratorAsValues`.
Now I have modified the code to request an additional `chunkSize` bytes of
memory with each request, because the size of `ChunkedByteBufferOutputStream`
grows by `chunkSize` at a time.
---
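A simplified sketch of that idea follows; the class name and the `reserve` callback are assumptions, not the exact PR diff:

```scala
// Because the serialization buffer grows one whole chunk at a time, each
// reservation covers the shortfall plus one extra `chunkSize` of headroom,
// so the next chunk allocation needs no further memory request.
class SerializedUnroller(chunkSize: Long, reserve: Long => Boolean) {
  private var unrollMemoryUsed = 0L

  def reserveAdditionalMemoryIfNecessary(currentBufferSize: Long): Boolean = {
    if (currentBufferSize > unrollMemoryUsed) {
      val amountToRequest = currentBufferSize - unrollMemoryUsed + chunkSize
      val keepUnrolling = reserve(amountToRequest)
      if (keepUnrolling) unrollMemoryUsed += amountToRequest
      keepUnrolling
    } else {
      true
    }
  }
}
```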
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19135
Does this patch have regressions? It seems to me that allocating more memory
may starve other tasks/operators and reduce overall performance.
---
Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/19135
Firstly, serialization did not take a long time, as you can see here:
https://user-images.githubusercontent.com/12733256/30067330-1596eb1e-928d-11e7-818a-4a292e601a26.png
Secondly, …
---
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19135
Sorry, I'm not so familiar with this part, but from the test results it seems
the performance only improved a little. I suspect the way you generate the
RDD, `0 until Integer.MAX_VALUE`, might take …
---
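For reference, a hypothetical reconstruction of the benchmark input being questioned; the partition count and the driver function are assumptions, only the `0 until Integer.MAX_VALUE` range comes from the thread:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Cache the RDD off-heap so that the MemoryStore unroll path is exercised.
def buildBenchmarkRdd(sc: SparkContext) = {
  val rdd = sc.parallelize(0 until Integer.MAX_VALUE, numSlices = 1000)
  rdd.persist(StorageLevel.OFF_HEAP)
  rdd.count()  // force materialization into the block store
  rdd
}
```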
Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/19135
Here
https://github.com/apache/spark/pull/19135/files?diff=unified#diff-870cd3693df7a5add2ac3119d7d91d34L373,
we call `reserveAdditionalMemoryIfNecessary()` for every record written.
---
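That per-record pattern, in sketch form (all callbacks here are assumed names, not Spark's API), is the overhead that chunk-based or batch reservation avoids:

```scala
// Every single record write is immediately followed by a reservation check.
def writeAllRecords[T](
    values: Iterator[T],
    write: T => Unit,                    // serialize one record
    bufferSize: () => Long,              // current serialized-buffer size
    reserveIfNecessary: Long => Boolean  // per-record reservation check
  ): Boolean = {
  var keepUnrolling = true
  while (values.hasNext && keepUnrolling) {
    write(values.next())
    keepUnrolling = reserveIfNecessary(bufferSize())
  }
  keepUnrolling
}
```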
Github user ConeyLiu commented on the issue:
https://github.com/apache/spark/pull/19135
Hi @cloud-fan @jerryshao, would you mind taking a look? Thanks a lot.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19135
Can one of the admins verify this patch?
---