[jira] [Commented] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2021-06-23 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368479#comment-17368479
 ] 

Vinoth Chandar commented on HUDI-818:
-

[~rmahindra] It seems like we are already good here. Should we close this one out?

> Optimize the default value of hoodie.memory.merge.max.size option
> -
>
>                 Key: HUDI-818
>                 URL: https://issues.apache.org/jira/browse/HUDI-818
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.9.0
>            Reporter: lamber-ken
>            Assignee: Rajesh Mahindra
>            Priority: Blocker
>              Labels: help-requested, sev:high, user-support-issues
>             Fix For: 0.9.0
>
>
> The default value of hoodie.memory.merge.max.size option is incapable of 
> meeting their performance requirements
> [https://github.com/apache/incubator-hudi/issues/1491]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2021-06-11 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361999#comment-17361999
 ] 

Vinoth Chandar commented on HUDI-818:
-

This comment is relevant: 
https://github.com/apache/hudi/issues/1491#issuecomment-615141491 
If a lot of data is being merged against the file, could the spilling become 
an issue?



[jira] [Commented] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2021-06-11 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361995#comment-17361995
 ] 

Vinoth Chandar commented on HUDI-818:
-

Could we quickly try out the RocksDB map to see how that looks, on both st1 and NVMe?



[jira] [Commented] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2021-06-11 Thread Rajesh Mahindra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361972#comment-17361972
 ] 

Rajesh Mahindra commented on HUDI-818:
--

Benchmark results on EMR nodes with both SSD and HDD storage are below. tl;dr: 
I do not see any significant regressions or unexpected latency spikes for the 
spillable map that would require immediate attention. 

 

Case 1: Benchmark results with EMR m5.xlarge
4 vCore, 16 GiB memory, EBS only storage
EBS Storage: 2000 GiB with ST1 HDD storage
---

THROUGHPUT using dd:
---

[hadoop@ip-172-31-26-21 hudi]$ dd if=/dev/zero of=/mnt/test bs=512 count=10000 oflag=direct
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 35.8048 s, 143 kB/s


[hadoop@ip-172-31-26-21 ~]$ dd if=/dev/zero of=/mnt/test bs=1K count=10000 oflag=direct
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 33.2558 s, 308 kB/s


[hadoop@ip-172-31-26-21 ~]$ dd if=/dev/zero of=/mnt/test bs=1M count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 42.2197 s, 248 MB/s
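
As a sanity check on the dd figures above, the reported rate is simply bytes 
written divided by elapsed time. The sketch below recomputes the ST1 rates in 
Python. The block count of 10,000 is an assumption inferred from the reported 
totals (5.1 MB, 10 MB, 10 GB), and dd_rate is a hypothetical helper, not part 
of dd or Hudi.

```python
def dd_rate(block_size_bytes, count, seconds):
    """Return throughput in bytes per second for one dd run."""
    return block_size_bytes * count / seconds


# Assumption: every ST1 run wrote 10,000 blocks (inferred from the totals).
ASSUMED_COUNT = 10_000

st1_runs = {
    "512B": dd_rate(512, ASSUMED_COUNT, 35.8048),
    "1K": dd_rate(1024, ASSUMED_COUNT, 33.2558),
    "1M": dd_rate(1024 * 1024, ASSUMED_COUNT, 42.2197),
}

for label, rate in st1_runs.items():
    # dd reports decimal units (1 kB = 1000 bytes).
    print(f"bs={label}: {rate / 1000:.0f} kB/s")
```

Under that assumption the recomputed rates round to the 143 kB/s, 308 kB/s and 
248 MB/s that dd printed.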


LATENCY using IOPING: 
--

FOR 512 Bytes block size
[hadoop@ip-172-31-26-21 hudi]$ sudo ~/ioping-0.8/ioping -R /dev/nvme1n1p2 -s 512 -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
61.5 k requests completed in 2.0 min, 512 iops, 256.2 KiB/s
min/avg/max/mdev = 1 us / 2.0 ms / 34.6 ms / 2.2 ms

FOR 4K block size
[hadoop@ip-172-31-26-21 ~]$ sudo ./ioping-0.8/ioping -R /dev/nvme1n1p2 -s 4K -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
61.7 k requests completed in 2.0 min, 515 iops, 2.0 MiB/s
min/avg/max/mdev = 176 us / 1.9 ms / 31.9 ms / 2.1 ms
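
The ioping summary lines can be cross-checked the same way: IOPS is requests 
divided by run time, and bandwidth is IOPS times request size. A minimal 
Python sketch of that arithmetic for the ST1 numbers above follows. 
ioping_summary is a hypothetical helper, the 120 s duration is taken from 
"-w 120", and the request counts are the rounded values ioping printed.

```python
def ioping_summary(requests, seconds, request_bytes):
    """Return (iops, KiB per second) for a run of fixed-size requests."""
    iops = requests / seconds
    kib_per_s = iops * request_bytes / 1024
    return iops, kib_per_s


# ST1 volume: (label, rounded request count, request size in bytes).
st1_ioping = (("512 B", 61_500, 512), ("4 KiB", 61_700, 4096))

for label, requests, size in st1_ioping:
    iops, kib_per_s = ioping_summary(requests, 120, size)
    print(f"{label}: {iops:.1f} iops, {kib_per_s:.1f} KiB/s")
```

For 512-byte requests this gives roughly 512 iops and 256 KiB/s, in line with 
the summary above. The reciprocal of the IOPS figure (about 2 ms) also agrees 
with the reported average latency.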

BENCHMARKING WITH LOAD OF GET AND PUT (Code written in 
org.apache.hudi.common.util.collection.TestExternalSpillableMap):

2 RUNS with 5M records of 500B each:

GET MEM: {0=860225, 1=485} 
GET DISK: {128=1, 0=4033664, 65=1, 129=1, 1=105603, 99=1, 5=1, 199=1, 44=1, 
77=1, 16=1, 145=1, 117=3, 118=1, 123=1, 124=3, 221=1, 125=2, 126=1, 30=1, 31=1}

PUT MEM: {0=859029, 1=423} 
PUT DISK: {0=4108753, 1=31712, 130=1, 131=2, 128=4, 129=3, 3588=1, 133=2, 
136=1, 139=1, 142=1, 144=1, 145=1, 20=1, 21=1, 152=1, 153=1, 157=1, 3621=1, 
37=2, 44=1, 172=1, 49=1, 50=1, 54=1, 55=1, 60=1, 61=1, 68=1, 70=1, 71=1, 78=1, 
209=1, 82=1, 83=1, 85=1, 89=1, 93=1, 226=1, 101=1, 108=1, 109=3, 111=1, 112=1, 
113=2, 114=1, 116=2, 117=2, 118=3, 119=2, 120=1, 121=1, 122=3, 124=3, 125=3, 
126=2, 127=7}


GET MEM: {0=860207, 1=668, 3=1, 5=1} 
GET DISK: {0=3988026, 1=150580, 2=185, 3=104, 4=61, 5=68, 6=27, 7=19, 8=10, 
9=9, 10=7, 11=4, 12=1, 204=1, 13=2, 15=2, 146=1, 18=1, 19=1, 21=1, 150=1, 
155=1, 226=1, 165=1, 230=1, 169=1, 44=1, 239=1, 114=1, 179=1, 253=1, 190=1, 
255=1, 191=1}

PUT MEM: {0=860348, 1=614, 9=1}
PUT DISK: {0=4084431, 1=54357, 129=1, 130=1, 2=65, 3=31, 4=23, 261=1, 5=23, 
6=9, 7=9, 8=2, 265=1, 9=4, 10=1, 139=1, 11=1, 12=3, 140=1, 14=2, 270=1, 144=1, 
17=1, 273=1, 145=1, 146=3, 147=3, 20=1, 21=2, 150=1, 280=1, 155=2, 156=1, 
285=1, 287=1, 163=1, 169=1, 170=3, 171=2, 172=2, 173=1, 176=1, 178=1, 180=1, 
181=1, 182=1, 183=2, 314=1, 187=1, 316=1, 191=1, 192=1, 4803=1, 197=1, 202=1, 
75=1, 208=1, 209=1, 84=1, 213=1, 214=1, 223=2, 224=1, 225=1, 227=1, 228=1, 
101=1, 232=1, 237=1, 238=1, 240=1, 242=1, 243=1, 372=1, 245=1, 247=1, 248=1, 
250=1, 254=1}
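
One way to read these histograms is to compute how many operations fall in 
the lowest buckets and how large the tail is. This assumes each key is a 
per-operation latency bucket and each value is the number of operations in 
that bucket; the unit is not stated in this comment, and milliseconds is only 
a guess. A rough Python sketch using the first run's GET DISK map follows; 
summarize is a hypothetical helper.

```python
from typing import Dict


def summarize(hist: Dict[int, int]) -> Dict[str, float]:
    """Summarize a {latency_bucket: operation_count} histogram."""
    total = sum(hist.values())
    # Operations that landed in bucket 0 or 1 (the fast path).
    fast = sum(count for bucket, count in hist.items() if bucket <= 1)
    return {
        "ops": total,
        "fast_share": fast / total,
        "tail_ops": total - fast,
        "max_bucket": max(hist),
    }


# Run 1, GET DISK, copied from the results above.
get_disk_run1 = {
    128: 1, 0: 4033664, 65: 1, 129: 1, 1: 105603, 99: 1, 5: 1, 199: 1,
    44: 1, 77: 1, 16: 1, 145: 1, 117: 3, 118: 1, 123: 1, 124: 3, 221: 1,
    125: 2, 126: 1, 30: 1, 31: 1,
}

summary = summarize(get_disk_run1)
print(summary)
```

Read this way, only a couple of dozen of the roughly 4.1M disk GETs in that 
run land above bucket 1, which is consistent with the tl;dr at the top of this 
comment.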


Case 2: Benchmark results with EMR m5.xlarge
4 vCore, 16 GiB memory, EBS only storage
EBS Storage: 2000 GiB with GP2 SSD storage
---

THROUGHPUT using dd:
---
[hadoop@ip-172-31-30-32 hudi]$ dd if=/dev/zero of=/mnt/test bs=512 count=10000 oflag=direct
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 8.11925 s, 631 kB/s

[hadoop@ip-172-31-30-32 ~]$ dd if=/dev/zero of=/mnt/test bs=1K count=100000 oflag=direct
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 85.7164 s, 1.2 MB/s

[hadoop@ip-172-31-30-32 mnt]$ dd if=/dev/zero of=/mnt/test bs=1M count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 88.494 s, 118 MB/s


LATENCY using IOPING: 
-
For 512 Bytes block size
[hadoop@ip-172-31-30-32 hudi]$ sudo ~/ioping-0.8/ioping -R /dev/nvme1n1p2 -s 512 -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
227.7 k requests completed in 2.0 min, 1.9 k iops, 950.8 KiB/s
min/avg/max/mdev = 2 us / 525 us / 19.1 ms / 506 us

For 4K block size
[hadoop@ip-172-31-30-32 ~]$ sudo ./ioping-0.8/ioping -R /dev/nvme1n1p2 -s 4K -w 120
--- /dev/nvme1n1p2 (device 1.9 TiB) ioping statistics ---
223.4 k requests completed in 2.0 min, 2.0 k iops, 7.6 MiB/s
min/avg/max/mdev = 127 us / 511 us / 35.0 ms 

[jira] [Commented] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2021-06-05 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357763#comment-17357763
 ] 

Vinoth Chandar commented on HUDI-818:
-

Potentially related: 
[https://github.com/apache/hudi/issues/1552#issuecomment-617965381] 

 

> Optimize the default value of hoodie.memory.merge.max.size option
> -
>
>                 Key: HUDI-818
>                 URL: https://issues.apache.org/jira/browse/HUDI-818
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.9.0
>            Reporter: lamber-ken
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: help-requested, sev:high, user-support-issues
>             Fix For: 0.9.0
>
>
> The default value of hoodie.memory.merge.max.size option is incapable of 
> meeting their performance requirements
> [https://github.com/apache/incubator-hudi/issues/1491]


