Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13680

I think that it is not easy to put ```[not written, use offset3]``` with good performance. I am thinking about **two cases**.

In **case 1**, my assumptions are
* Do not initialize ```[offset area]``` before writing offsets, for performance
* The order of writing elements may not be ascending

Here, the following two steps are executed.
1. write ```offset0```
2. write ```offset3```

At step 2, it is not easy to determine which ```[not written]``` fields should be filled with ```[use offset3]```. This is because we cannot assume any particular value in a ```[not written]``` field, so it is hard to recognize that ```[offset0]``` has already been written.

```
offset:  0              1              2              3              4
init :  [not written]  [not written]  [not written]  [not written]  [not written]
step1:  [offset0]      [not written]  [not written]  [not written]  [not written]
step2:  [offset0]      [not written]  [not written]  [offset3]      [not written]
```

In **case 2**, my assumptions are
* Initialize ```[offset area]``` to `zero` before writing offsets. However, this may lead to a performance issue.
* The order of writing elements may not be ascending

Here, the following two steps are executed.
1. write ```offset4``` and fill all predecessor fields that have not been written with `[use offset4]`
2. write ```offset2``` and fill all predecessor fields that have not been written with `[use offset2]`

This approach always checks and fills predecessor fields until a field that has already been written is found.

```
offset:  0              1              2              3              4
init :  [zero]         [zero]         [zero]         [zero]         [zero]
step1:  [use offset4]  [use offset4]  [use offset4]  [use offset4]  [offset4]
step2:  [use offset2]  [use offset2]  [offset2]      [use offset4]  [offset4]
```

What do you think?
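
To make case 2 concrete, here is a minimal sketch in Python. It is only an illustration of the back-fill idea, not code from the PR. The names `write_offset`, `slots`, and `written` are hypothetical, and the values 400 and 200 are placeholders. The sketch assumes `[use offsetN]` means "store a copy of offsetN's value", and it assumes a separate `written` flag list to tell an explicitly written slot from a filled one, since the description above does not say how the two are distinguished.

```python
def write_offset(slots, written, index, value):
    """Store `value` at `index`, then back-fill unwritten predecessors.

    `slots` is the zero-initialized offset area; `written` marks slots
    that were written explicitly (an assumption of this sketch).
    """
    slots[index] = value
    written[index] = True
    i = index - 1
    # Walk toward slot 0 until an explicitly written slot is found.
    while i >= 0 and not written[i]:
        slots[i] = value  # corresponds to [use offset<index>]
        i -= 1


n = 5
slots = [0] * n          # init: every slot is zero
written = [False] * n

write_offset(slots, written, 4, 400)   # step 1: write offset4
after_step1 = list(slots)              # [400, 400, 400, 400, 400]

write_offset(slots, written, 2, 200)   # step 2: write offset2
after_step2 = list(slots)              # [200, 200, 200, 400, 400]
```

Each write costs a backward scan in addition to the up-front zero fill, which is the performance concern raised for this case.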