NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-521142058
@vinothchandar Thanks so much for your help. I was specifying the
parallelism in two areas, which in turn
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-513581765
@vinothchandar, yes, I am on Slack, and next week sounds good. We can do it
on Monday or Tuesday. The time
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-512140516
Yes, you can extract data from [IPUMS USA](https://usa.ipums.org/usa/) to
run the workload locally. I am
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-511349113
I changed the driver memory and number of executors to be:
spark.driver.memory = 7168m
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-510818215
The failures are:
``` org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
location
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-510569606
**### Benchmarking Hudi Upsert**
I am trying to benchmark the Hudi upsert operation and the latency of
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-500112859
Thanks. This is so helpful.
This is an
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-499939181
@vinothchandar Thanks, now the performance is similar after I set the
parallelism to 2.
They have only
NetsanetGeb commented on issue #714: Performance Comparison of
HoodieDeltaStreamer and DataSourceAPI
URL: https://github.com/apache/incubator-hudi/issues/714#issuecomment-498998794
Yes, they have the same amount of data at the beginning as source input.
But in the middle there are some