Re: [DISCUSS] flatbuf footer
Thank you Micah. Will follow up on the PR.

On Sun, Feb 8, 2026 at 8:31 PM Micah Kornfield wrote:
> Just wanted to follow-up. I did a first pass review on the flatbuf
> definitions.
>
> Cheers,
> Micah
Re: [DISCUSS] flatbuf footer
Just wanted to follow-up. I did a first pass review on the flatbuf
definitions.

Cheers,
Micah

On Thu, Dec 11, 2025 at 11:58 PM Alkis Evlogimenos via dev wrote:
> PR for linking proposal here:
> https://github.com/apache/parquet-format/pull/543
> PR for parquet footer flatbuf definition:
> https://github.com/apache/parquet-format/pull/544
Re: [DISCUSS] flatbuf footer
PR for linking proposal here:
https://github.com/apache/parquet-format/pull/543
PR for parquet footer flatbuf definition:
https://github.com/apache/parquet-format/pull/544

On Tue, Dec 9, 2025 at 1:26 AM Julien Le Dem wrote:
> Hello Alkis,
> Do you think you could add your footer proposal to the proposals page?
> https://github.com/apache/parquet-format/tree/master/proposals#active-proposals
> That way it gets more visibility.
> Cheers
> Julien
Re: [DISCUSS] flatbuf footer
Hello Alkis,
Do you think you could add your footer proposal to the proposals page?
https://github.com/apache/parquet-format/tree/master/proposals#active-proposals
That way it gets more visibility.
Cheers
Julien

On Tue, Oct 21, 2025 at 11:49 AM Steve Loughran wrote:
> On Mon, 20 Oct 2025 at 18:24, Ed Seidl wrote:
>
> > IIUC a flatbuffer-aware decoder would read the last 36 bytes or so of
> > the file and look for a known UUID along with size information. With
> > this it could then read only the flatbuffer bytes. I think this would
> > work as well as current systems that prefetch some number of bytes in
> > an attempt to get the whole footer in a single get.
> >
> > Old readers, however, will have to fetch both footers, but won't have
> > any additional decoding work because the new footer is a binary field
> > that can be easily skipped.
>
> It really depends what the readers do with footer prefetching. For the
> java clients:
>
> 1. s3a classic stream: the backwards seek() switches it to random IO
>    mode; the next read() from the base of the thrift will pull in
>    fs.s3a.readahead.range of data. No penalty.
> 2. google gs://: there's a footer cache option which will need to be
>    set to a larger value.
> 3. azure abfs://: there's a footer cache option which will need to be
>    set to a larger value.
> 4. s3a + amazon analytics stream: this stream is *parquet aware* and
>    actually parses the footer to know what to predictively prefetch.
>    The AWS developers do know of this work; moving to support the new
>    footer would be the ideal strategy here.
> 5. Iceberg classic input: no idea.
> 6. iceberg + amazon analytics: same as s3a, though without some of the
>    tuning we've been doing for vector reads.
>
> I wouldn't worry too much about the impact of that footer size
> increase. Some extra footer prefetch options should compensate, and
> once apps move to a parquet v3 reader they've got a faster parse time.
> Of course, ironically, read time may then dominate even more there, so
> it'll be important to do that read as efficiently as possible (use a
> readFully() into a buffer, not lots of single-byte read() calls).
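Steve's point about doing the tail read as one bulk call can be sketched with plain JDK I/O. This is a minimal illustration, not parquet-java code: the class name, helper name, and the 64-byte tail size are my own, and it parses the classic Parquet trailer (4-byte little-endian footer length followed by the "PAR1" magic) rather than the proposed flatbuf trailer.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class FooterProbe {
    /**
     * Fetch the last {@code tail} bytes of the file with a single
     * readFully() call, then parse the classic Parquet trailer:
     * [4-byte little-endian footer length]["PAR1"].
     */
    public static int footerLength(RandomAccessFile f, int tail) throws IOException {
        long fileLen = f.length();
        int n = (int) Math.min(tail, fileLen);
        byte[] buf = new byte[n];
        f.seek(fileLen - n);
        f.readFully(buf); // one bulk read, not a loop of single-byte read() calls
        ByteBuffer bb = ByteBuffer.wrap(buf, n - 8, 8).order(ByteOrder.LITTLE_ENDIAN);
        int footerLen = bb.getInt();
        byte[] magic = new byte[4];
        bb.get(magic);
        if (!new String(magic, StandardCharsets.US_ASCII).equals("PAR1")) {
            throw new IOException("not a Parquet file trailer");
        }
        return footerLen;
    }
}
```

If the footer length plus trailer fits inside the prefetched tail, no second round trip is needed; otherwise one more bulk read of exactly `footerLen` bytes finishes the job.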
Re: [DISCUSS] flatbuf footer: offsets
Assuming LZ4 decompression at 2 GB/s (per core) and network bandwidth at
1 GB/s, and taking as an example the 367 MB thrift footer in the
proposal, the tradeoff is as follows (T = thrift, F32 = flatbuf with
32-bit offsets, F64 = flatbuf with 64-bit offsets):

T   (367 MB):                 50 ms latency + 370 ms transfer                        --> 420 ms (ignoring parse time)
F32 (113 MB raw / 50 MB lz4): 50 ms latency +  50 ms transfer + 56 ms decompression --> 156 ms
F64 (155 MB raw / 52 MB lz4): 50 ms latency +  52 ms transfer + 78 ms decompression --> 180 ms

Going with 64-bit offsets leaves some performance on the table, and it
will make LZ4 compression pretty much required for most footers above
256 KB. That said, 64-bit offsets are still much faster to transfer than
thrift, even ignoring thrift's horrendous parse times. For simplicity I
am still slightly in favor of 64-bit offsets, but I am open to arguments
for 32-bit relative offsets plus alignment to bring the maximum row
group size to 64 GB. Thoughts?

On Tue, Oct 28, 2025 at 10:57 AM Antoine Pitrou wrote:
> I expect LZ4 to be optional, but enabled by default by most writers.
> LZ4 decompression is extremely fast, typically several GB/s on a
> modern CPU.
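The arithmetic above can be captured in a toy cost model. The class and method names are my own, and it hard-codes the thread's stated assumptions (50 ms request latency, 1 GB/s network, 2 GB/s per-core LZ4 decompression); it is a back-of-the-envelope sketch, not a benchmark.

```java
public class FooterCost {
    static final double LATENCY_MS = 50.0;
    static final double NET_GBPS = 1.0; // network bandwidth, GB/s
    static final double LZ4_GBPS = 2.0; // LZ4 decompression, GB/s per core

    /**
     * Estimated time to fetch (and, if compressed, decompress) a footer.
     *
     * @param wireMb bytes actually transferred, i.e. the compressed size when LZ4 is used
     * @param rawMb  uncompressed footer size, i.e. what the decompressor must emit
     * @param lz4    whether the footer is LZ4-compressed
     */
    static double fetchMillis(double wireMb, double rawMb, boolean lz4) {
        double transferMs = wireMb / NET_GBPS;              // 1 GB/s => 1 ms per MB
        double decompressMs = lz4 ? rawMb / LZ4_GBPS : 0.0; // 2 GB/s => 0.5 ms per MB
        return LATENCY_MS + transferMs + decompressMs;
    }
}
```

Plugging in the three cases gives roughly 417 ms, 156 ms and 180 ms, matching the figures above once the 367 ms transfer for thrift is rounded to 370 ms.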
Re: [DISCUSS] flatbuf footer: offsets
Hi,

I expect LZ4 to be optional, but enabled by default by most writers.
LZ4 decompression is extremely fast, typically several GB/s on a modern
CPU.

Regards

Antoine.
Re: [DISCUSS] flatbuf footer: offsets
You are right that even without LZ4, we would still need I/O for the
whole footer. And I guess LZ4 is way faster than thrift, so flatbuf+LZ4
would be an improvement over thrift. If you want superb partial
decoding, we would indeed need to somehow support reading only part of
the footer from storage. In the end, it's a trade-off: the more
flexibility we want w.r.t. partial reads, the more complexity we have to
introduce. Maybe flatbuf alone is already the sweet spot here and we
shouldn't introduce additional complexity. LZ4 compression would after
all still be optional, right?

Someone mentioned that they have footers with millions of columns.
Maybe they should comment on how much partial reading would be required
for their use case. I guess the answer will be "the more support for
partial reading/decoding the better".

You could argue that if you have such a wide file, just don't use LZ4
then, and that's probably a valid argument.

Cheers,
Jan
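To make the "compress the footer" idea concrete: footer metadata is highly repetitive (per-column records with similar paths, codecs and stats), which is exactly what byte-oriented compressors exploit. The sketch below uses `java.util.zip.Deflater` purely as a stand-in, since the JDK ships no LZ4 codec; real LZ4 trades some ratio for much higher speed, so treat the sizes as illustrative only. All names here are my own.

```java
import java.util.zip.Deflater;

public class FooterCompressDemo {
    /**
     * Compress buf and return only the compressed length.
     * Deflater (zlib) stands in for LZ4, which is not in the JDK.
     */
    static int compressedSize(byte[] buf) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        d.setInput(buf);
        d.finish();
        byte[] out = new byte[64 * 1024];
        int total = 0;
        while (!d.finished()) {
            total += d.deflate(out); // we reuse `out`; only the size matters here
        }
        d.end();
        return total;
    }
}
```

Feeding this a synthetic footer made of repeated column-chunk-like records shrinks it dramatically, which is the effect the thread relies on when arguing that 64-bit offsets "compress away".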
Re: [DISCUSS] flatbuf footer: offsets
Hmmm... does it?

I may be mistaken, but I had the impression that what you call "read
only the parts of the footer I'm interested in" is actually "*decode*
only the parts of the footer I'm interested in".

That is, you still read the entire footer, which is a larger IO than
doing smaller reads, but it's also a single IO rather than several
smaller ones.

Of course, if we want to make things more flexible, we can have
individual Flatbuffers metadata pieces for each column, each
LZ4-compressed. And embed two sizes at the end of the file: the size of
the "core footer" metadata (without columns) and the size of the "full
footer" metadata (with columns), so that readers can choose their
preferred strategy.

Regards

Antoine.
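The multiplier mechanism under discussion (store offsets divided by a power-of-two alignment, pad row groups up to that alignment) can be sketched as a toy encoding. `ALIGN = 16` follows Julien's example; the class and method names are my own, not part of any proposal.

```java
public class ScaledOffsets {
    // Writer-chosen power of two; the thread suggests it could also be
    // recorded in the footer so future writers can pick 32, 64 or 128.
    static final int ALIGN = 16;

    /** Round a file offset up to the next multiple of ALIGN (the writer pads to here). */
    static long alignUp(long offset) {
        return (offset + ALIGN - 1) & ~(long) (ALIGN - 1);
    }

    /** Store the aligned byte offset divided by ALIGN in a 32-bit field. */
    static int encode(long alignedOffset) {
        return Math.toIntExact(alignedOffset / ALIGN);
    }

    /** Recover the byte offset when reading. */
    static long decode(int stored) {
        return (long) stored * ALIGN;
    }
}
```

With a signed 32-bit field this addresses up to about 2^31 x 16 bytes = 32 GiB, matching Julien's figure; a multiplier of 32 would give the 64 GB Alkis mentions, at the cost of up to ALIGN-1 padding bytes per row group.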
Re: [DISCUSS] flatbuf footer: offsets
Note that LZ4 compression destroys the whole "I can read only the parts of the footer I'm interested in", so I wouldn't say that LZ4 can be the solution to everything. Cheers, Jan On Sat, Oct 25, 2025, 12:33 Antoine Pitrou wrote: > On Fri, 24 Oct 2025 12:12:02 -0700 > Julien Le Dem wrote: > > I had an idea about this topic. > > What if we say the offset is always a multiple of 16? (I'm saying 16, but > > it works with 8 or 32 or any other power of 2). > > Then we store in the footer the offset divided by 16. > > That means you need to pad each row group by up to 16 bytes. > > But now the max size of the file is 32GB. > > > > Personally, I still don't like having arbitrary limits but 32GB seems a > lot > > less like a restricting limit than 2GB. > > If we get crazy, we add this to the footer as metadata and the writer > gets > > to pick whether you multiply offsets by 32, 64 or 128 if ten years from > now > > we start having much bigger files. > > The size of the padding becomes negligible over the size of the file. > > > > Thoughts? > > That's an interesting suggestion. I would be fine with it personally, > provided the multiplier is either large enough (say, 64) or embedded in > the footer. > > That said, I would first wait for the outcome of the experiment with > LZ4 compression. If it negates the additional cost of 64-bit offsets, > then we should not bother with this multiplier mechanism. > > Regards > > Antoine. > > > > > > > > On Tue, Oct 21, 2025 at 6:19 AM Alkis Evlogimenos > > wrote: > > > > > We've analyzed a large footer from our production environment to > understand > > > byte distribution across its fields. The detailed analysis is > available in > > > the proposal document here: > > > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.o2lsuuyi8rw6#heading=h.26i914tjp4fk > > > . 
> > > > > > To illustrate the impact of 64-bit fields, we conducted an experiment > where > > > all proposed 32-bit fields in the Flatbuf footer were changed to > 64-bit. > > > This resulted in a *40% increase* in footer size. > > > > > > That said, LZ4 manages to compress this away. We will do some more > testing > > > with 64 bit offsets/numvals/sizes and revert back. If it all goes well > we > > > can resolve this by going 64 bit. > > > > > > > > > On Wed, Oct 15, 2025 at 12:49 PM Jan Finis < > [email protected]> wrote: > > > > > > > Hi Alkis, > > > > > > > > one more very simple argument why you want these offsets to be i64: > > > > What if you want to store a single value larger than 4GB? I know this > > > > sounds absurd at first, but some use cases might want to store data > that > > > > can sometimes be very large (e.g. blob data, or insanely complex > geo > > > data). > > > > And it would be a shame if that would mean that they cannot use > Parquet > > > at > > > > all. > > > > > > > > Thus, my opinion here is that we can limit to i32 all fields that > the > > > file > > > > writer has under control, e.g., the number of rows within a row > group, > > > but > > > > we shouldn't limit any values that a file writer doesn't have under > > > > control, as they fully depend on the input data. > > > > > > > > Note though that this means that the number of values in a column > chunk > > > > could also exceed i32, if a user has nested data with more than 4 > billion > > > > entries. With such data, the file writer again couldn't do anything > to > > > > avoid writing a row group with more > > > > than i32 values, as a single row may not span multiple row groups. > That > > > > being said, I think that nested data with more than 4 billion > entries is > > > > less likely than a single large blob of 4 billion bytes. 
> > > > > > > > I know that smaller row groups is what most / all engines prefer, > but we > > > > have to make sure the format also works for edge cases. > > > > > > > > Cheers, > > > > Jan > > > > > > > > Am Mi., 15. Okt. 2025 um 05:05 Uhr schrieb Adam Reeve > > > >: > > > > > > > > > Hi Alkis > > > > > > > > > > Thanks for all your work on this proposal. > > > > > > > > > > I'd be in favour of keeping the offsets as i64 and not reducing > the > > > > maximum > > > > > row group size, even if this results in slightly larger footers. > I've > > > > heard > > > > > from some of our users within G-Research that they do have files > with > > > row > > > > > groups > 2 GiB. This is often when they use lower-level APIs to > write > > > > > Parquet that don't automatically split data into row groups, and > they > > > > > either write a single row group for simplicity or have some logical > > > > > partitioning of data into row groups. They might also have wide > tables > > > > with > > > > > many columns, or wide array/tensor valued columns that lead to > large > > > row > > > > > groups. > > > > > > > > > > In many workflows we don't read Parquet with a query engine that > > > supports > > > > > filters and skipping row groups,
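[Editorial illustration] Jan's quoted point that a single value can exceed what an i32 offset or size field represents is easy to check concretely. A minimal sketch of the arithmetic only, not code from any Parquet implementation:

```python
import struct

# A single 4 GiB blob already exceeds what a signed 32-bit size/offset
# field can hold; a 64-bit field holds it with room to spare.
blob_size = 4 * 1024**3            # one logical value of 4 GiB
assert blob_size > 2**31 - 1       # larger than the maximum i32
try:
    struct.pack("<i", blob_size)   # an i32 field cannot encode it
    raise AssertionError("unexpectedly fit in an i32")
except struct.error:
    pass
assert struct.unpack("<q", struct.pack("<q", blob_size))[0] == blob_size
```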
Re: [DISCUSS] flatbuf footer: offsets
On Fri, 24 Oct 2025 12:12:02 -0700 Julien Le Dem wrote: > I had an idea about this topic. > What if we say the offset is always a multiple of 16? (I'm saying 16, but > it works with 8 or 32 or any other power of 2). > Then we store in the footer the offset divided by 16. > That means you need to pad each row group by up to 16 bytes. > But now the max size of the file is 32GB. > > Personally, I still don't like having arbitrary limits but 32GB seems a lot > less like a restricting limit than 2GB. > If we get crazy, we add this to the footer as metadata and the writer gets > to pick whether you multiply offsets by 32, 64 or 128 if ten years from now > we start having much bigger files. > The size of the padding becomes negligible over the size of the file. > > Thoughts? That's an interesting suggestion. I would be fine with it personally, provided the multiplier is either large enough (say, 64) or embedded in the footer. That said, I would first wait for the outcome of the experiment with LZ4 compression. If it negates the additional cost of 64-bit offsets, then we should not bother with this multiplier mechanism. Regards Antoine. > > > On Tue, Oct 21, 2025 at 6:19 AM Alkis Evlogimenos > wrote: > > > We've analyzed a large footer from our production environment to understand > > byte distribution across its fields. The detailed analysis is available in > > the proposal document here: > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.o2lsuuyi8rw6#heading=h.26i914tjp4fk > > . > > > > To illustrate the impact of 64-bit fields, we conducted an experiment where > > all proposed 32-bit fields in the Flatbuf footer were changed to 64-bit. > > This resulted in a *40% increase* in footer size. > > > > That said, LZ4 manages to compress this away. We will do some more testing > > with 64 bit offsets/numvals/sizes and revert back. If it all goes well we > > can resolve this by going 64 bit. 
> > > > > > On Wed, Oct 15, 2025 at 12:49 PM Jan Finis > > wrote: > > > > > Hi Alkis, > > > > > > one more very simple argument why you want these offsets to be i64: > > > What if you want to store a single value larger than 4GB? I know this > > > sounds absurd at first, but some use cases might want to store data that > > > can sometimes be very large (e.g. blob data, or insanely complex geo > > data). > > > And it would be a shame if that would mean that they cannot use Parquet > > at > > > all. > > > > > > Thus, my opinion here is that we can limit to i32 all fields that the > > file > > > writer has under control, e.g., the number of rows within a row group, > > but > > > we shouldn't limit any values that a file writer doesn't have under > > > control, as they fully depend on the input data. > > > > > > Note though that this means that the number of values in a column chunk > > > could also exceed i32, if a user has nested data with more than 4 billion > > > entries. With such data, the file writer again couldn't do anything to > > > avoid writing a row group with more > > > than i32 values, as a single row may not span multiple row groups. That > > > being said, I think that nested data with more than 4 billion entries is > > > less likely than a single large blob of 4 billion bytes. > > > > > > I know that smaller row groups is what most / all engines prefer, but we > > > have to make sure the format also works for edge cases. > > > > > > Cheers, > > > Jan > > > > > > Am Mi., 15. Okt. 2025 um 05:05 Uhr schrieb Adam Reeve > > > > >: > > > > > > > Hi Alkis > > > > > > > > Thanks for all your work on this proposal. > > > > > > > > I'd be in favour of keeping the offsets as i64 and not reducing the > > > maximum > > > > row group size, even if this results in slightly larger footers. I've > > > heard > > > > from some of our users within G-Research that they do have files with > > row > > > > groups > 2 GiB. 
This is often when they use lower-level APIs to write > > > > Parquet that don't automatically split data into row groups, and they > > > > either write a single row group for simplicity or have some logical > > > > partitioning of data into row groups. They might also have wide tables > > > with > > > > many columns, or wide array/tensor valued columns that lead to large > > row > > > > groups. > > > > > > > > In many workflows we don't read Parquet with a query engine that > > supports > > > > filters and skipping row groups, but just read all rows, or directly > > > > specify the row groups to read if there is some known logical > > > partitioning > > > > into row groups. I'm sure we could work around a 2 or 4 GiB row group > > > size > > > > limitation if we had to, but it's a new constraint that reduces the > > > > flexibility of the format and makes more work for users who now need to > > > > ensure they don't hit this limit. > > > > > > > > Do you have any measurements of how much of a difference 4 byte
Re: [DISCUSS] flatbuf footer: offsets
I had an idea about this topic. What if we say the offset is always a multiple of 16? (I'm saying 16, but it works with 8 or 32 or any other power of 2). Then we store in the footer the offset divided by 16. That means you need to pad each row group by up to 16 bytes. But now the max size of the file is 32GB. Personally, I still don't like having arbitrary limits but 32GB seems a lot less like a restricting limit than 2GB. If we get crazy, we add this to the footer as metadata and the writer gets to pick whether you multiply offsets by 32, 64 or 128 if ten years from now we start having much bigger files. The size of the padding becomes negligible over the size of the file. Thoughts? On Tue, Oct 21, 2025 at 6:19 AM Alkis Evlogimenos wrote: > We've analyzed a large footer from our production environment to understand > byte distribution across its fields. The detailed analysis is available in > the proposal document here: > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.o2lsuuyi8rw6#heading=h.26i914tjp4fk > . > > To illustrate the impact of 64-bit fields, we conducted an experiment where > all proposed 32-bit fields in the Flatbuf footer were changed to 64-bit. > This resulted in a *40% increase* in footer size. > > That said, LZ4 manages to compress this away. We will do some more testing > with 64 bit offsets/numvals/sizes and revert back. If it all goes well we > can resolve this by going 64 bit. > > > On Wed, Oct 15, 2025 at 12:49 PM Jan Finis wrote: > > > Hi Alkis, > > > > one more very simple argument why you want these offsets to be i64: > > What if you want to store a single value larger than 4GB? I know this > > sounds absurd at first, but some use cases might want to store data that > > can sometimes be very large (e.g. blob data, or insanely complex geo > data). > > And it would be a shame if that would mean that they cannot use Parquet > at > > all. 
> > > > Thus, my opinion here is that we can limit to i32 all fields that the > file > > writer has under control, e.g., the number of rows within a row group, > but > > we shouldn't limit any values that a file writer doesn't have under > > control, as they fully depend on the input data. > > > > Note though that this means that the number of values in a column chunk > > could also exceed i32, if a user has nested data with more than 4 billion > > entries. With such data, the file writer again couldn't do anything to > > avoid writing a row group with more > > than i32 values, as a single row may not span multiple row groups. That > > being said, I think that nested data with more than 4 billion entries is > > less likely than a single large blob of 4 billion bytes. > > > > I know that smaller row groups is what most / all engines prefer, but we > > have to make sure the format also works for edge cases. > > > > Cheers, > > Jan > > > > Am Mi., 15. Okt. 2025 um 05:05 Uhr schrieb Adam Reeve >: > > > > > Hi Alkis > > > > > > Thanks for all your work on this proposal. > > > > > > I'd be in favour of keeping the offsets as i64 and not reducing the > > maximum > > > row group size, even if this results in slightly larger footers. I've > > heard > > > from some of our users within G-Research that they do have files with > row > > > groups > 2 GiB. This is often when they use lower-level APIs to write > > > Parquet that don't automatically split data into row groups, and they > > > either write a single row group for simplicity or have some logical > > > partitioning of data into row groups. They might also have wide tables > > with > > > many columns, or wide array/tensor valued columns that lead to large > row > > > groups. 
> > > > > > In many workflows we don't read Parquet with a query engine that > supports > > > filters and skipping row groups, but just read all rows, or directly > > > specify the row groups to read if there is some known logical > > partitioning > > > into row groups. I'm sure we could work around a 2 or 4 GiB row group > > size > > > limitation if we had to, but it's a new constraint that reduces the > > > flexibility of the format and makes more work for users who now need to > > > ensure they don't hit this limit. > > > > > > Do you have any measurements of how much of a difference 4 byte offsets > > > make to footer sizes in your data, with and without the optional LZ4 > > > compression? > > > > > > Thanks, > > > Adam > > > > > > On Tue, 14 Oct 2025 at 21:02, Alkis Evlogimenos > > > wrote: > > > > > > > Hi all, > > > > > > > > From the comments on the [EXTERNAL] Parquet metadata > > > > < > > > > > > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0 > > > > > > > > > document, > > > > it appears there's a general consensus on most aspects, with the > > > exception > > > > of the relative 32-bit offsets for column chunks. > > > > > > > > I'm starting this thread to discuss this topic further and work > > towards a > > > > reso
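[Editorial illustration] Julien's multiply-and-pad scheme quoted above can be sketched in a few lines. This follows the assumptions in his message (a fixed power-of-two multiplier of 16 and a signed 32-bit stored field); it is not from any Parquet codebase:

```python
MULT = 16  # writer-chosen power of two; could itself be stored in the footer

def pad_to_multiple(offset: int) -> int:
    """Round a row-group start offset up to the next multiple of MULT."""
    return (offset + MULT - 1) // MULT * MULT

def encode_offset(offset: int) -> int:
    """Store offset // MULT in a signed 32-bit footer field."""
    assert offset % MULT == 0, "writer must pad row groups to MULT bytes"
    scaled = offset // MULT
    assert scaled <= 2**31 - 1, "file exceeds the (2**31 - 1) * MULT limit"
    return scaled

def decode_offset(scaled: int) -> int:
    """Recover the absolute byte offset from the scaled field."""
    return scaled * MULT

start = pad_to_multiple(5_000_000_001)            # next 16-byte boundary
assert decode_offset(encode_offset(start)) == start
# Maximum addressable file size: just under 32 GiB for MULT = 16.
assert (2**31 - 1) * MULT == 34_359_738_352
```

The per-row-group cost is at most MULT - 1 padding bytes, which is why the padding becomes negligible relative to file size.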
Re: [DISCUSS] flatbuf footer
On Mon, 20 Oct 2025 at 18:24, Ed Seidl wrote: > IIUC a flatbuffer aware decoder would read the last 36 bytes or so of the > file and look for a known UUID along with size information. With this it > could then read only the flatbuffer bytes. I think this would work as well > as current systems that prefetch some number of bytes in an attempt to get > the whole footer in a single get. > > Old readers, however, will have to fetch both footers, but won't have any > additional decoding work because the new footer is a binary field that can > be easily skipped. > really depends what the readers do with footer prefetching. For the java clients 1. s3a classic stream: the backwards seek() switches it to random IO mode, next read() from base of thrift will pull in fs.s3a.readahead.range of data No penalty 2. google gs://. There's a footer cache option which will need to be set to a larger value 3. azure abfs:// there's a footer cache option which will need to be set to a larger value 4. s3a + amazon analytics stream. This stream is *parquet aware* and actually parses the footer to know what to predictively prefetch. The AWS developers do know of this work -moving to support the new footer would be the ideal strategy here. 5. Iceberg classic input. no idea. 6. iceberg + amazon analytics. same as S3A though without some of the tuning we've been doing for vector reads. I wouldn't worry too much about the impact of that footer size increase. Some extra footer prefetch options should compensate, and once apps move to a parquet v3 reader they've got a faster parse time. Of course, ironically, read time then may dominate even more there -it'll be important to do that read as efficiently as possible (use a readFully() into a buffer, not lots of single byte read() calls)
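[Editorial illustration] As a concrete picture of the trailer dance described above, here is a toy reader that fetches a footer with at most two ranged reads. It uses the layout of the existing Parquet trailer (footer bytes, a 4-byte little-endian footer length, then the "PAR1" magic) as a stand-in; a flatbuf-aware reader locating a UUID marker in the last ~36 bytes would follow the same pattern:

```python
import io
import struct

MAGIC = b"PAR1"

def read_footer(f, file_size, prefetch=64 * 1024):
    """Fetch the serialized footer with at most two bulk reads,
    i.e. readFully-style buffers rather than byte-at-a-time reads."""
    tail_len = min(prefetch, file_size)
    f.seek(file_size - tail_len)            # one backwards seek
    tail = f.read(tail_len)                 # first ranged read
    if tail[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if footer_len + 8 <= tail_len:          # footer already in the buffer
        return tail[-(footer_len + 8):-8]
    f.seek(file_size - footer_len - 8)      # second ranged read, exact size
    return f.read(footer_len)

# Toy "file": payload, footer, footer length, magic.
footer = b"\x15\x00" * 100
blob = b"data" * 10 + footer + struct.pack("<I", len(footer)) + MAGIC
assert read_footer(io.BytesIO(blob), len(blob)) == footer
# A too-small prefetch window just costs one extra ranged read.
assert read_footer(io.BytesIO(blob), len(blob), prefetch=16) == footer
```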
Re: [DISCUSS] flatbuf footer: offsets
We've analyzed a large footer from our production environment to understand byte distribution across its fields. The detailed analysis is available in the proposal document here: https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.o2lsuuyi8rw6#heading=h.26i914tjp4fk . To illustrate the impact of 64-bit fields, we conducted an experiment where all proposed 32-bit fields in the Flatbuf footer were changed to 64-bit. This resulted in a *40% increase* in footer size. That said, LZ4 manages to compress this away. We will do some more testing with 64 bit offsets/numvals/sizes and revert back. If it all goes well we can resolve this by going 64 bit. On Wed, Oct 15, 2025 at 12:49 PM Jan Finis wrote: > Hi Alkis, > > one more very simple argument why you want these offsets to be i64: > What if you want to store a single value larger than 4GB? I know this > sounds absurd at first, but some use cases might want to store data that > can sometimes be very large (e.g. blob data, or insanely complex geo data). > And it would be a shame if that would mean that they cannot use Parquet at > all. > > Thus, my opinion here is that we can limit to i32 all fields that the file > writer has under control, e.g., the number of rows within a row group, but > we shouldn't limit any values that a file writer doesn't have under > control, as they fully depend on the input data. > > Note though that this means that the number of values in a column chunk > could also exceed i32, if a user has nested data with more than 4 billion > entries. With such data, the file writer again couldn't do anything to > avoid writing a row group with more > than i32 values, as a single row may not span multiple row groups. That > being said, I think that nested data with more than 4 billion entries is > less likely than a single large blob of 4 billion bytes. 
> > I know that smaller row groups is what most / all engines prefer, but we > have to make sure the format also works for edge cases. > > Cheers, > Jan > > Am Mi., 15. Okt. 2025 um 05:05 Uhr schrieb Adam Reeve : > > > Hi Alkis > > > > Thanks for all your work on this proposal. > > > > I'd be in favour of keeping the offsets as i64 and not reducing the > maximum > > row group size, even if this results in slightly larger footers. I've > heard > > from some of our users within G-Research that they do have files with row > > groups > 2 GiB. This is often when they use lower-level APIs to write > > Parquet that don't automatically split data into row groups, and they > > either write a single row group for simplicity or have some logical > > partitioning of data into row groups. They might also have wide tables > with > > many columns, or wide array/tensor valued columns that lead to large row > > groups. > > > > In many workflows we don't read Parquet with a query engine that supports > > filters and skipping row groups, but just read all rows, or directly > > specify the row groups to read if there is some known logical > partitioning > > into row groups. I'm sure we could work around a 2 or 4 GiB row group > size > > limitation if we had to, but it's a new constraint that reduces the > > flexibility of the format and makes more work for users who now need to > > ensure they don't hit this limit. > > > > Do you have any measurements of how much of a difference 4 byte offsets > > make to footer sizes in your data, with and without the optional LZ4 > > compression? 
> > > > Thanks, > > Adam > > > > On Tue, 14 Oct 2025 at 21:02, Alkis Evlogimenos > > wrote: > > > > > Hi all, > > > > > > From the comments on the [EXTERNAL] Parquet metadata > > > < > > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0 > > > > > > > document, > > > it appears there's a general consensus on most aspects, with the > > exception > > > of the relative 32-bit offsets for column chunks. > > > > > > I'm starting this thread to discuss this topic further and work > towards a > > > resolution. Adam Reeve suggested raising the limitation to 2^32, and he > > > confirmed that Java does not have any issues with this. I am open to > this > > > change as it increases the limit without introducing any drawbacks. > > > > > > However, some still feel that a 2^32-byte limit for a row group is too > > > restrictive. I'd like to understand these specific use cases better. > From > > > my perspective, for most engines, the row group is the primary unit of > > > skipping, making very large row groups less desirable. In our fleet's > > > workloads, it's rare to see row groups larger than 100MB, as anything > > > larger tends to make statistics-based skipping ineffective. > > > > > > Cheers, > > > > > >
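[Editorial illustration] The width-versus-compression trade-off measured above can be mimicked in miniature. This is a hedged sketch: zlib stands in for LZ4 (LZ4 is not in the Python stdlib) and the offsets are invented, but it shows why widening fields from 32 to 64 bits doubles raw size while compression recovers most of it, since the extra high bytes are almost always zero:

```python
import struct
import zlib

# 1000 invented, monotonically growing column-chunk offsets.
offsets = list(range(1_000, 4_001_000, 4_000))
raw32 = b"".join(struct.pack("<I", o) for o in offsets)
raw64 = b"".join(struct.pack("<Q", o) for o in offsets)
assert len(raw64) == 2 * len(raw32)   # 100% larger uncompressed

c32 = zlib.compress(raw32)
c64 = zlib.compress(raw64)
# Compressed, the widening costs only a small fraction of its raw cost,
# because the extra four bytes per offset are all zeros here.
assert len(c64) - len(c32) < 0.5 * (len(raw64) - len(raw32))
```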
Re: [DISCUSS] flatbuf footer
IIUC a flatbuffer aware decoder would read the last 36 bytes or so of the file and look for a known UUID along with size information. With this it could then read only the flatbuffer bytes. I think this would work as well as current systems that prefetch some number of bytes in an attempt to get the whole footer in a single get. Old readers, however, will have to fetch both footers, but won't have any additional decoding work because the new footer is a binary field that can be easily skipped. On 2025/10/20 15:59:14 Adrian Garcia Badaracco wrote: > If we embed both a flat buffer footer and a thrift footer, will readers be > able to completely skip the thrift footer to read the flat buffer footer? Or > will they have to download / read both? Especially if they have to download > the bytes for both I’m not sure how big the win will be, on object storage > slow IO can be what dominates. > > > On Oct 20, 2025, at 9:49 AM, Raphael Taylor-Davies > > wrote: > > > > I don't disagree that two files is much harder than one file, but is that > > the use-case that the flatbuffer format is solving for, or is that > > adequately serviced by the existing thrift-based footer? I had interpreted > > the flatbuffer more as a way to accelerate larger datasets consisting of > > many files, and of less utility for the single-file use-case. > > > > That being said I misread the proposal, I thought it was proposing > > replacing the thrift based footer with a flatbuffer, which would be very > > disruptive. However, it looks like instead the (new?) proposal is to just > > create a duplicate flatbuffer footer embedded within the thrift footer, > > which can just be ignored by readers. The proposal is a bit vague when it > > comes to whether all information would be duplicated, or whether some > > information would only be embedded in the flatbuffer payload, but presuming > > it is a true duplicate, many of my points don't apply. 
> > > > Kind Regards, > > > > Raphael > > > > On 20/10/2025 15:28, Antoine Pitrou wrote: > >> I don't think it's a "small price to pay". Parquet files are widely > >> used to share or transfer data of all kinds (in a way, they replace CSV > >> with much better characteristics). Sharing a single file is easy, > >> sharing two related files while keeping their relationship intact is an > >> order of magnitude more difficult. > >> > >> Regards > >> > >> Antoine. > >> > >> > >> On Mon, 20 Oct 2025 12:23:20 +0100 > >> Personal > >> > >> wrote: > >>> Apologies if this has already been discussed, but have we considered > >>> simply storing these flatbuffers as separate files alongside existing > >>> parquet files. I think this would have a number of quite compelling > >>> advantages: > >>> > >>> - no breaking format changes, all readers can continue to still read all > >>> parquet files > >>> - people can generate these "index" files for existing datasets without > >>> having to rewrite all their files > >>> - older and newer readers can coexist - no stop the world migrations > >>> - can potentially combine multiple flatbuffers into a single file for > >>> better IO when scanning collections of files - potentially very valuable > >>> for object stores, and would also help for people on HDFS and other > >>> systems that struggle with small files > >>> - could potentially even inline these flatbuffers into catalogs like > >>> iceberg > >>> - can continue to iterate at a faster rate, without the constraints of > >>> needing to move in lockstep with parquet versioning > >>> - potentially less confusing for users, parquet files are still the same, > >>> they just can be accelerated by these new index files > >>> > >>> This would mean some data duplication, but that seems a small price to > >>> pay, and would be strictly opt-in for users with use-cases that justify > >>> it? 
> >>> > >>> Kind Regards, > >>> > >>> Raphael > >>> > >>> On 20 October 2025 11:08:59 BST, Alkis Evlogimenos > >>> wrote: > > Thank you, these are interesting. Can you share instructions on how to > > reproduce the reported numbers? I am interested to review the code used > > to > > generate these results (esp the C++ thrift code) > > The numbers are based on internal code (Photon). They are not very far > off > >>> from https://github.com/apache/arrow/pull/43793. I will update that PR in > the coming weeks so that we can repro the same benchmarks with open > source > code too. > > On Fri, Oct 17, 2025 at 5:52 PM Andrew Lamb > wrote: > > > Thanks Alkis, that is interesting data. > > > >> We found that the reported numbers were not reproducible on AWS > >> instances > > I just updated the benchmark results[1] to include results from > > AWS m6id.8xlarge instance (and they are indeed about 2x slower than when > > run on my 2023 Mac laptop) > > > >> You can find the summary of our f
Re: [DISCUSS] flatbuf footer
If we embed both a flat buffer footer and a thrift footer, will readers be able to completely skip the thrift footer to read the flat buffer footer? Or will they have to download / read both? Especially if they have to download the bytes for both I’m not sure how big the win will be, on object storage slow IO can be what dominates. > On Oct 20, 2025, at 9:49 AM, Raphael Taylor-Davies > wrote: > > I don't disagree that two files is much harder than one file, but is that the > use-case that the flatbuffer format is solving for, or is that adequately > serviced by the existing thrift-based footer? I had interpreted the > flatbuffer more as a way to accelerate larger datasets consisting of many > files, and of less utility for the single-file use-case. > > That being said I misread the proposal, I thought it was proposing replacing > the thrift based footer with a flatbuffer, which would be very disruptive. > However, it looks like instead the (new?) proposal is to just create a > duplicate flatbuffer footer embedded within the thrift footer, which can just > be ignored by readers. The proposal is a bit vague when it comes to whether > all information would be duplicated, or whether some information would only > be embedded in the flatbuffer payload, but presuming it is a true duplicate, > many of my points don't apply. > > Kind Regards, > > Raphael > > On 20/10/2025 15:28, Antoine Pitrou wrote: >> I don't think it's a "small price to pay". Parquet files are widely >> used to share or transfer data of all kinds (in a way, they replace CSV >> with much better characteristics). Sharing a single file is easy, >> sharing two related files while keeping their relationship intact is an >> order of magnitude more difficult. >> >> Regards >> >> Antoine. 
>> >> >> On Mon, 20 Oct 2025 12:23:20 +0100 >> Personal >> >> wrote: >>> Apologies if this has already been discussed, but have we considered simply >>> storing these flatbuffers as separate files alongside existing parquet >>> files. I think this would have a number of quite compelling advantages: >>> >>> - no breaking format changes, all readers can continue to still read all >>> parquet files >>> - people can generate these "index" files for existing datasets without >>> having to rewrite all their files >>> - older and newer readers can coexist - no stop the world migrations >>> - can potentially combine multiple flatbuffers into a single file for >>> better IO when scanning collections of files - potentially very valuable >>> for object stores, and would also help for people on HDFS and other systems >>> that struggle with small files >>> - could potentially even inline these flatbuffers into catalogs like iceberg >>> - can continue to iterate at a faster rate, without the constraints of >>> needing to move in lockstep with parquet versioning >>> - potentially less confusing for users, parquet files are still the same, >>> they just can be accelerated by these new index files >>> >>> This would mean some data duplication, but that seems a small price to pay, >>> and would be strictly opt-in for users with use-cases that justify it? >>> >>> Kind Regards, >>> >>> Raphael >>> >>> On 20 October 2025 11:08:59 BST, Alkis Evlogimenos >>> wrote: > Thank you, these are interesting. Can you share instructions on how to > reproduce the reported numbers? I am interested to review the code used to > generate these results (esp the C++ thrift code) The numbers are based on internal code (Photon). They are not very far off from https://github.com/apache/arrow/pull/43793. I will update that PR in the coming weeks so that we can repro the same benchmarks with open source code too. On Fri, Oct 17, 2025 at 5:52 PM Andrew Lamb wrote: > Thanks Alkis, that is interesting data. 
> >> We found that the reported numbers were not reproducible on AWS instances > I just updated the benchmark results[1] to include results from > AWS m6id.8xlarge instance (and they are indeed about 2x slower than when > run on my 2023 Mac laptop) > >> You can find the summary of our findings in a separate tab in the > proposal document: > > Thank you, these are interesting. Can you share instructions on how to > reproduce the reported numbers? I am interested to review the code used to > generate these results (esp the C++ thrift code) > > Thanks > Andrew > > > [1]: > > https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux > > > On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos > wrote: > >> Thank you Andrew for putting the code in open source so that we can repro >> it. >> >> We have run the rust benchmarks and also run the flatbuf proposal with > our >> C++ thrift parser, the flatbuf footer with Thrift conversion, the >> flatbuf foote
Re: [DISCUSS] flatbuf footer
I don't disagree that two files is much harder than one file, but is that the use-case that the flatbuffer format is solving for, or is that adequately serviced by the existing thrift-based footer? I had interpreted the flatbuffer more as a way to accelerate larger datasets consisting of many files, and of less utility for the single-file use-case. That being said I misread the proposal, I thought it was proposing replacing the thrift based footer with a flatbuffer, which would be very disruptive. However, it looks like instead the (new?) proposal is to just create a duplicate flatbuffer footer embedded within the thrift footer, which can just be ignored by readers. The proposal is a bit vague when it comes to whether all information would be duplicated, or whether some information would only be embedded in the flatbuffer payload, but presuming it is a true duplicate, many of my points don't apply. Kind Regards, Raphael On 20/10/2025 15:28, Antoine Pitrou wrote: I don't think it's a "small price to pay". Parquet files are widely used to share or transfer data of all kinds (in a way, they replace CSV with much better characteristics). Sharing a single file is easy, sharing two related files while keeping their relationship intact is an order of magnitude more difficult. Regards Antoine. On Mon, 20 Oct 2025 12:23:20 +0100 Personal wrote: Apologies if this has already been discussed, but have we considered simply storing these flatbuffers as separate files alongside existing parquet files. 
I think this would have a number of quite compelling advantages: - no breaking format changes, all readers can continue to still read all parquet files - people can generate these "index" files for existing datasets without having to rewrite all their files - older and newer readers can coexist - no stop the world migrations - can potentially combine multiple flatbuffers into a single file for better IO when scanning collections of files - potentially very valuable for object stores, and would also help for people on HDFS and other systems that struggle with small files - could potentially even inline these flatbuffers into catalogs like iceberg - can continue to iterate at a faster rate, without the constraints of needing to move in lockstep with parquet versioning - potentially less confusing for users, parquet files are still the same, they just can be accelerated by these new index files This would mean some data duplication, but that seems a small price to pay, and would be strictly opt-in for users with use-cases that justify it? Kind Regards, Raphael On 20 October 2025 11:08:59 BST, Alkis Evlogimenos wrote: Thank you, these are interesting. Can you share instructions on how to reproduce the reported numbers? I am interested to review the code used to generate these results (esp the C++ thrift code) The numbers are based on internal code (Photon). They are not very far off from https://github.com/apache/arrow/pull/43793. I will update that PR in the coming weeks so that we can repro the same benchmarks with open source code too. On Fri, Oct 17, 2025 at 5:52 PM Andrew Lamb wrote: Thanks Alkis, that is interesting data. 
We found that the reported numbers were not reproducible on AWS instances I just updated the benchmark results[1] to include results from AWS m6id.8xlarge instance (and they are indeed about 2x slower than when run on my 2023 Mac laptop) You can find the summary of our findings in a separate tab in the proposal document: Thank you, these are interesting. Can you share instructions on how to reproduce the reported numbers? I am interested to review the code used to generate these results (esp the C++ thrift code) Thanks Andrew [1]: https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos wrote: Thank you Andrew for putting the code in open source so that we can repro it. We have run the rust benchmarks and also run the flatbuf proposal with our C++ thrift parser, the flatbuf footer with Thrift conversion, the flatbuf footer without Thrift conversion, and the flatbuf footer without Thrift conversion and without verification. You can find the summary of our findings in a separate tab in the proposal document: https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the optimized Thrift parsing. It also remains faster than the Thrift parser even if the Thrift parser skips statistics. Furthermore if Thrift conversion is skipped, the speedup is 50x, and if verification is skipped it goes beyond 150x. On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb wrote: Hello, I did some benchmarking for the new parser[2] we are working on in arrow-rs. This benchmark achieves nearly an order of magnitude improvement (7x) parsing Parquet metadata with no changes to the Parquet format, by simply writing a more efficient thrift decoder (which can also skip statistics).
Re: [DISCUSS] flatbuf footer
I don't think it's a "small price to pay". Parquet files are widely used to share or transfer data of all kinds (in a way, they replace CSV with much better characteristics). Sharing a single file is easy, sharing two related files while keeping their relationship intact is an order of magnitude more difficult. Regards Antoine. On Mon, 20 Oct 2025 12:23:20 +0100 Personal wrote: > Apologies if this has already been discussed, but have we considered simply > storing these flatbuffers as separate files alongside existing parquet files. > I think this would have a number of quite compelling advantages: > > - no breaking format changes, all readers can continue to still read all > parquet files > - people can generate these "index" files for existing datasets without > having to rewrite all their files > - older and newer readers can coexist - no stop the world migrations > - can potentially combine multiple flatbuffers into a single file for better > IO when scanning collections of files - potentially very valuable for object > stores, and would also help for people on HDFS and other systems that > struggle with small files > - could potentially even inline these flatbuffers into catalogs like iceberg > - can continue to iterate at a faster rate, without the constraints of > needing to move in lockstep with parquet versioning > - potentially less confusing for users, parquet files are still the same, > they just can be accelerated by these new index files > > This would mean some data duplication, but that seems a small price to pay, > and would be strictly opt-in for users with use-cases that justify it? > > Kind Regards, > > Raphael > > On 20 October 2025 11:08:59 BST, Alkis Evlogimenos > wrote: > >> > >> Thank you, these are interesting. Can you share instructions on how to > >> reproduce the reported numbers? I am interested to review the code used to > >> generate these results (esp the C++ thrift code) > > > > > >The numbers are based on internal code (Photon). 
They are not very far off > >from https://github.com/apache/arrow/pull/43793. I will update that PR in > >the coming weeks so that we can repro the same benchmarks with open source > >code too. > > > >On Fri, Oct 17, 2025 at 5:52 PM Andrew Lamb wrote: > > > >> Thanks Alkis, that is interesting data. > >> > >> > We found that the reported numbers were not reproducible on AWS > >> > instances > >> > >> I just updated the benchmark results[1] to include results from > >> AWS m6id.8xlarge instance (and they are indeed about 2x slower than when > >> run on my 2023 Mac laptop) > >> > >> > You can find the summary of our findings in a separate tab in the > >> proposal document: > >> > >> Thank you, these are interesting. Can you share instructions on how to > >> reproduce the reported numbers? I am interested to review the code used to > >> generate these results (esp the C++ thrift code) > >> > >> Thanks > >> Andrew > >> > >> > >> [1]: > >> > >> https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux > >> > >> > >> On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos > >> wrote: > >> > >> > Thank you Andrew for putting the code in open source so that we can repro > >> > it. > >> > > >> > We have run the rust benchmarks and also run the flatbuf proposal with > >> our > >> > C++ thrift parser, the flatbuf footer with Thrift conversion, the > >> > flatbuf footer without Thrift conversion, and the flatbuf footer > >> > without Thrift conversion and without verification. You can find the > >> > summary of our findings in a separate tab in the proposal document: > >> > > >> > > >> https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s > >> > >> > > >> > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the > >> > optimized Thrift parsing. It also remains faster than the Thrift parser > >> > even if the Thrift parser skips statistics. 
Furthermore if Thrift > >> > conversion is skipped, the speedup is 50x, and if verification is skipped > >> > it goes beyond 150x. > >> > > >> > > >> > On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb > >> > wrote: > >> > > >> > > Hello, > >> > > > >> > > I did some benchmarking for the new parser[2] we are working on in > >> > > arrow-rs. > >> > > > >> > > This benchmark achieves nearly an order of magnitude improvement (7x) > >> > > parsing Parquet metadata with no changes to the Parquet format, by > >> simply > >> > > writing a more efficient thrift decoder (which can also skip > >> statistics). > >> > > > >> > > While we have not implemented a similar decoder in other languages > >> > > such > >> > as > >> > > C/C++ or Java, given the similarities in the existing thrift libraries > >> > > > >> > and > >> > > usage, we expect similar improvements are possible in those languages > >> as > >> > > well. > >> > > > >> > > Here are some inline images:
Re: [DISCUSS] flatbuf footer
> I don't see any issue here: https://github.com/apache/parquet-format/issues That is a good call -- I filed https://github.com/apache/parquet-format/issues/530 to track On Mon, Oct 20, 2025 at 8:17 AM Andrew Bell wrote: > On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos > wrote: > > > Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of > > the serialized form and picks fields out of it one by one. Flatbuf > instead > > takes the serialized form and uses offsets already embedded in it to > > extract fields from the serialized form directly. In other words there is > > no parsing done. We have 3 ways to use the flatbuf each of which adds > more > > overhead > > > ... > > Maybe I was confused by this: > > This benchmark achieves nearly an order of magnitude improvement (7x) > > > > > parsing Parquet metadata with no changes to the Parquet format, by > > > simply > > > > > writing a more efficient thrift decoder (which can also skip > > > statistics). > > It was unclear to me if this was still about flatbuf or about writing > better thrift decoder. Is there a write-up describing exactly what's being > proposed? I don't see any issue here: > https://github.com/apache/parquet-format/issues > > -- > Andrew Bell > [email protected] >
Re: [DISCUSS] flatbuf footer
> Maybe I was confused by this: There are (at least) two parallel things going on: 1. Work in the Rust Parquet implementation to speed up the parsing of thrift footers (no change to Parquet format)[1][2] 2. A proposal to change the Parquet format to add an optional FlatBuffers-based footer [3] [1]: https://github.com/apache/arrow-rs/issues/5854 [2]: https://github.com/alamb/parquet_footer_parsing [3]: https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0#heading=h.ccu4zzsy0tm5 Andrew On Mon, Oct 20, 2025 at 8:17 AM Andrew Bell wrote: > On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos > wrote: > > > Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of > > the serialized form and picks fields out of it one by one. Flatbuf > instead > > takes the serialized form and uses offsets already embedded in it to > > extract fields from the serialized form directly. In other words there is > > no parsing done. We have 3 ways to use the flatbuf each of which adds > more > > overhead > > > ... > > Maybe I was confused by this: > > This benchmark achieves nearly an order of magnitude improvement (7x) > > > > > parsing Parquet metadata with no changes to the Parquet format, by > > > simply > > > > > writing a more efficient thrift decoder (which can also skip > > > statistics). > > It was unclear to me if this was still about flatbuf or about writing > better thrift decoder. Is there a write-up describing exactly what's being > proposed? I don't see any issue here: > https://github.com/apache/parquet-format/issues > > -- > Andrew Bell > [email protected] >
Re: [DISCUSS] flatbuf footer
On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos wrote: > Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of > the serialized form and picks fields out of it one by one. Flatbuf instead > takes the serialized form and uses offsets already embedded in it to > extract fields from the serialized form directly. In other words there is > no parsing done. We have 3 ways to use the flatbuf each of which adds more > overhead > ... Maybe I was confused by this: This benchmark achieves nearly an order of magnitude improvement (7x) > > > > parsing Parquet metadata with no changes to the Parquet format, by > > simply > > > > writing a more efficient thrift decoder (which can also skip > > statistics). It was unclear to me if this was still about flatbuf or about writing a better thrift decoder. Is there a write-up describing exactly what's being proposed? I don't see any issue here: https://github.com/apache/parquet-format/issues -- Andrew Bell [email protected]
Re: [DISCUSS] flatbuf footer
Apologies if this has already been discussed, but have we considered simply storing these flatbuffers as separate files alongside existing parquet files. I think this would have a number of quite compelling advantages: - no breaking format changes, all readers can continue to still read all parquet files - people can generate these "index" files for existing datasets without having to rewrite all their files - older and newer readers can coexist - no stop the world migrations - can potentially combine multiple flatbuffers into a single file for better IO when scanning collections of files - potentially very valuable for object stores, and would also help for people on HDFS and other systems that struggle with small files - could potentially even inline these flatbuffers into catalogs like iceberg - can continue to iterate at a faster rate, without the constraints of needing to move in lockstep with parquet versioning - potentially less confusing for users, parquet files are still the same, they just can be accelerated by these new index files This would mean some data duplication, but that seems a small price to pay, and would be strictly opt-in for users with use-cases that justify it? Kind Regards, Raphael On 20 October 2025 11:08:59 BST, Alkis Evlogimenos wrote: >> >> Thank you, these are interesting. Can you share instructions on how to >> reproduce the reported numbers? I am interested to review the code used to >> generate these results (esp the C++ thrift code) > > >The numbers are based on internal code (Photon). They are not very far off >from https://github.com/apache/arrow/pull/43793. I will update that PR in >the coming weeks so that we can repro the same benchmarks with open source >code too. > >On Fri, Oct 17, 2025 at 5:52 PM Andrew Lamb wrote: > >> Thanks Alkis, that is interesting data. 
>> >> > We found that the reported numbers were not reproducible on AWS instances >> >> I just updated the benchmark results[1] to include results from >> AWS m6id.8xlarge instance (and they are indeed about 2x slower than when >> run on my 2023 Mac laptop) >> >> > You can find the summary of our findings in a separate tab in the >> proposal document: >> >> Thank you, these are interesting. Can you share instructions on how to >> reproduce the reported numbers? I am interested to review the code used to >> generate these results (esp the C++ thrift code) >> >> Thanks >> Andrew >> >> >> [1]: >> >> https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux >> >> >> On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos >> wrote: >> >> > Thank you Andrew for putting the code in open source so that we can repro >> > it. >> > >> > We have run the rust benchmarks and also run the flatbuf proposal with >> our >> > C++ thrift parser, the flatbuf footer with Thrift conversion, the >> > flatbuf footer without Thrift conversion, and the flatbuf footer >> > without Thrift conversion and without verification. You can find the >> > summary of our findings in a separate tab in the proposal document: >> > >> > >> https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s >> > >> > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the >> > optimized Thrift parsing. It also remains faster than the Thrift parser >> > even if the Thrift parser skips statistics. Furthermore if Thrift >> > conversion is skipped, the speedup is 50x, and if verification is skipped >> > it goes beyond 150x. >> > >> > >> > On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb >> > wrote: >> > >> > > Hello, >> > > >> > > I did some benchmarking for the new parser[2] we are working on in >> > > arrow-rs. 
>> > > >> > > This benchmark achieves nearly an order of magnitude improvement (7x) >> > > parsing Parquet metadata with no changes to the Parquet format, by >> simply >> > > writing a more efficient thrift decoder (which can also skip >> statistics). >> > > >> > > While we have not implemented a similar decoder in other languages such >> > as >> > > C/C++ or Java, given the similarities in the existing thrift libraries >> > and >> > > usage, we expect similar improvements are possible in those languages >> as >> > > well. >> > > >> > > Here are some inline images: >> > > [image: image.png] >> > > [image: image.png] >> > > >> > > >> > > You can find full details here [1] >> > > >> > > Andrew >> > > >> > > >> > > [1]: https://github.com/alamb/parquet_footer_parsing >> > > [2]: https://github.com/apache/arrow-rs/issues/5854 >> > > >> > > >> > > On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl wrote: >> > > >> > >> > Concerning Thrift optimization, while a 2-3x improvement might be >> > >> > achievable, Flatbuffers are currently demonstrating a 10x >> improvement. >> > >> > Andrew, do you have a more precise estimate for the speedup we could >> > >> expect >> > >> > in C++? >> > >> >> > >> Given my past experience on cuDF, I'd estimate about 2X there as well.
Re: [DISCUSS] flatbuf footer
> > Thank you, these are interesting. Can you share instructions on how to > reproduce the reported numbers? I am interested to review the code used to > generate these results (esp the C++ thrift code) The numbers are based on internal code (Photon). They are not very far off from https://github.com/apache/arrow/pull/43793. I will update that PR in the coming weeks so that we can repro the same benchmarks with open source code too. On Fri, Oct 17, 2025 at 5:52 PM Andrew Lamb wrote: > Thanks Alkis, that is interesting data. > > > We found that the reported numbers were not reproducible on AWS instances > > I just updated the benchmark results[1] to include results from > AWS m6id.8xlarge instance (and they are indeed about 2x slower than when > run on my 2023 Mac laptop) > > > You can find the summary of our findings in a separate tab in the > proposal document: > > Thank you, these are interesting. Can you share instructions on how to > reproduce the reported numbers? I am interested to review the code used to > generate these results (esp the C++ thrift code) > > Thanks > Andrew > > > [1]: > > https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux > > > On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos > wrote: > > > Thank you Andrew for putting the code in open source so that we can repro > > it. > > > > We have run the rust benchmarks and also run the flatbuf proposal with > our > > C++ thrift parser, the flatbuf footer with Thrift conversion, the > > flatbuf footer without Thrift conversion, and the flatbuf footer > > without Thrift conversion and without verification. You can find the > > summary of our findings in a separate tab in the proposal document: > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s > > > > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the > > optimized Thrift parsing. 
It also remains faster than the Thrift parser > > even if the Thrift parser skips statistics. Furthermore if Thrift > > conversion is skipped, the speedup is 50x, and if verification is skipped > > it goes beyond 150x. > > > > > > On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb > > wrote: > > > > > Hello, > > > > > > I did some benchmarking for the new parser[2] we are working on in > > > arrow-rs. > > > > > > This benchmark achieves nearly an order of magnitude improvement (7x) > > > parsing Parquet metadata with no changes to the Parquet format, by > simply > > > writing a more efficient thrift decoder (which can also skip > statistics). > > > > > > While we have not implemented a similar decoder in other languages such > > as > > > C/C++ or Java, given the similarities in the existing thrift libraries > > and > > > usage, we expect similar improvements are possible in those languages > as > > > well. > > > > > > Here are some inline images: > > > [image: image.png] > > > [image: image.png] > > > > > > > > > You can find full details here [1] > > > > > > Andrew > > > > > > > > > [1]: https://github.com/alamb/parquet_footer_parsing > > > [2]: https://github.com/apache/arrow-rs/issues/5854 > > > > > > > > > On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl wrote: > > > > > >> > Concerning Thrift optimization, while a 2-3x improvement might be > > >> > achievable, Flatbuffers are currently demonstrating a 10x > improvement. > > >> > Andrew, do you have a more precise estimate for the speedup we could > > >> expect > > >> > in C++? > > >> > > >> Given my past experience on cuDF, I'd estimate about 2X there as well. > > >> cuDF has it's own metadata parser that I once benchmarked against the > > >> thrift generated parser. > > >> > > >> And I'd point out that beyond the initial 2X improvement, rolling your > > >> own parser frees you of having to parse out every structure in the > > metadata. > > >> > > > > > >
Re: [DISCUSS] flatbuf footer
Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of the serialized form and picks fields out of it one by one. Flatbuf instead takes the serialized form and uses offsets already embedded in it to extract fields from the serialized form directly. In other words, there is no parsing done. We have 3 ways to use the flatbuf, each of which adds more overhead: 1. raw flatbuf without postprocessing: +150x speedup 2. verified flatbuf: +50x speedup. Verified flatbuf means that before any flatbuf access, all offsets are bounds-checked so they cannot cause memory accesses outside the flatbuf-encoded blob 3. verified flatbuf + conversion to FileMetadata struct generated by thrift compiler: +5x speedup. This is the easy migration path for most engines where we take flatbuf, verify it and then put together the same FileMetadata struct that would come out of the thrift parser had we parsed the thrift representation. On Fri, Oct 17, 2025 at 5:19 PM Andrew Bell wrote: > On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos > wrote: > > > Thank you Andrew for putting the code in open source so that we can repro > > it. > > > > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the > > optimized Thrift parsing. It also remains faster than the Thrift parser > > even if the Thrift parser skips statistics. Furthermore if Thrift > > conversion is skipped, the speedup is 50x, and if verification is skipped > > it goes beyond 150x. > > > Can you explain a bit the differences/changes in the parser that provides > such a speedup? > > -- > Andrew Bell > [email protected] >
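[Editor's note] Alkis's three modes can be made concrete with a toy offset-table codec. This is NOT the real FlatBuffers wire format (and build/verify_and_get are invented names for this sketch); it only illustrates the point that field access is offset arithmetic plus an optional bounds check, rather than a sequential parse.

```python
import struct

def build(fields: list[bytes]) -> bytes:
    """Writer side: a u32 field count, then one (u32 offset, u32 length)
    pair per field, then the raw payloads. The embedded offsets are what
    let a reader jump straight to any field."""
    header_size = 4 + 8 * len(fields)
    table, body, pos = b"", b"", header_size
    for f in fields:
        table += struct.pack("<II", pos, len(f))
        body += f
        pos += len(f)
    return struct.pack("<I", len(fields)) + table + body

def verify_and_get(buf: bytes, i: int) -> bytes:
    """Reader side, 'verified' mode: bounds-check the embedded offset
    before touching memory, then slice the field out directly. Skipping
    the check is the analogue of the unverified (+150x) mode."""
    (n,) = struct.unpack_from("<I", buf, 0)
    if i >= n:
        raise IndexError("no such field")
    off, ln = struct.unpack_from("<II", buf, 4 + 8 * i)
    if off + ln > len(buf):
        raise ValueError("offset table points outside the buffer")
    return buf[off:off + ln]
```

Accessing one field never walks the others, which is why the cost stays flat as the metadata grows; a hostile or corrupt buffer is caught by the bounds check rather than by a parser.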
Re: [DISCUSS] flatbuf footer
We have to deal with very wide files (up to a million columns). The approach we took is very similar to flatbuffer metadata + a skiplist. Seeing this happening in parquet opens up interesting possibilities. This is explained in this blog: Husky: Efficient compaction at Datadog scale | Datadog https://www.datadoghq.com/blog/engineering/husky-storage-compaction/ https://imgix.datadoghq.com/img/blog/engineering/husky-storage-compaction/compaction_static_diagram_2_rev.png On Sat, Oct 18, 2025, 9:58 PM Ed Seidl wrote: > Of course there's nothing to preclude adding just such an index to the > current format. > > On 2025/10/17 22:10:36 Corwin Joy wrote: > > For us, the exciting thing about the flatbuf footer approach is the > > potential for fast random access. For wide tables, the metadata becomes > > huge, and there is a lot of overhead to access a particular rowgroup. (See > > previous discussions, e.g., https://github.com/apache/arrow/issues/38149 ). > > Even if we can get a faster thrift parser, this is still limited, because > > you have to parse the entire metadata, which is inherently slow. Pulling > > information for a selected rowgroup is a lot faster. > > Right now, we have a workaround: we create an external index to get fast > > random access. (https://github.com/G-Research/PalletJack). But, having a > > fast internal random access index like the proposed flatbuf footer would be > > a big step forward. > > > > On Fri, Oct 17, 2025 at 8:50 AM Andrew Lamb > wrote: > > > > > Thanks Alkis, that is interesting data. > > > > > > > We found that the reported numbers were not reproducible on AWS instances > > > > > > I just updated the benchmark results[1] to include results from > > > AWS m6id.8xlarge instance (and they are indeed about 2x slower than when > > > run on my 2023 Mac laptop) > > > > > > > You can find the summary of our findings in a separate tab in the > > > proposal document: > > > > > > Thank you, these are interesting. 
Can you share instructions on how to > > > reproduce the reported numbers? I am interested to review the code > used to > > > generate these results (esp the C++ thrift code) > > > > > > Thanks > > > Andrew > > > > > > > > > [1]: > > > > > > > https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux > > > > > > > > > On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos > > > wrote: > > > > > > > Thank you Andrew for putting the code in open source so that we can > repro > > > > it. > > > > > > > > We have run the rust benchmarks and also run the flatbuf proposal > with > > > our > > > > C++ thrift parser, the flatbuf footer with Thrift conversion, the > > > > flatbuf footer without Thrift conversion, and the flatbuf footer > > > > without Thrift conversion and without verification. You can find the > > > > summary of our findings in a separate tab in the proposal document: > > > > > > > > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s > > > > > > > > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs > the > > > > optimized Thrift parsing. It also remains faster than the Thrift > parser > > > > even if the Thrift parser skips statistics. Furthermore if Thrift > > > > conversion is skipped, the speedup is 50x, and if verification is > skipped > > > > it goes beyond 150x. > > > > > > > > > > > > On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb > > > > wrote: > > > > > > > > > Hello, > > > > > > > > > > I did some benchmarking for the new parser[2] we are working on in > > > > > arrow-rs. > > > > > > > > > > This benchmark achieves nearly an order of magnitude improvement > (7x) > > > > > parsing Parquet metadata with no changes to the Parquet format, by > > > simply > > > > > writing a more efficient thrift decoder (which can also skip > > > statistics). 
> > > > > > > > > > While we have not implemented a similar decoder in other languages > such > > > > as > > > > > C/C++ or Java, given the similarities in the existing thrift > libraries > > > > and > > > > > usage, we expect similar improvements are possible in those > languages > > > as > > > > > well. > > > > > > > > > > Here are some inline images: > > > > > [image: image.png] > > > > > [image: image.png] > > > > > > > > > > > > > > > You can find full details here [1] > > > > > > > > > > Andrew > > > > > > > > > > > > > > > [1]: https://github.com/alamb/parquet_footer_parsing > > > > > [2]: https://github.com/apache/arrow-rs/issues/5854 > > > > > > > > > > > > > > > On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl > wrote: > > > > > > > > > >> > Concerning Thrift optimization, while a 2-3x improvement might > be > > > > >> > achievable, Flatbuffers are currently demonstrating a 10x > > > improvement. > > > > >> > Andrew, do you have a more precise estimate for the speedup we > could > > > > >> expect > > > > >> > in C++? > > > > >> > > > > >> Given my past experience on cuDF, I'd estimate about 2X there as well.
Re: [DISCUSS] flatbuf footer
Of course there's nothing to preclude adding just such an index to the current format. On 2025/10/17 22:10:36 Corwin Joy wrote: > For us, the exciting thing about the flatbuf footer approach is the > potential for fast random access. For wide tables, the metadata becomes > huge, and there is a lot of overhead to access a particular rowgroup. (See > previous discussions, e.g., https://github.com/apache/arrow/issues/38149). > Even if we can get a faster thrift parser, this is still limited, because > you have to parse the entire metadata, which is inherently slow. Pulling > information for a selected rowgroup is a lot faster. > Right now, we have a workaround: we create an external index to get fast > random access. (https://github.com/G-Research/PalletJack). But, having a > fast internal random access index like the proposed flatbuf footer would be > a big step forward. > > On Fri, Oct 17, 2025 at 8:50 AM Andrew Lamb wrote: > > > Thanks Alkis, that is interesting data. > > > > > We found that the reported numbers were not reproducible on AWS instances > > > > I just updated the benchmark results[1] to include results from > > AWS m6id.8xlarge instance (and they are indeed about 2x slower than when > > run on my 2023 Mac laptop) > > > > > You can find the summary of our findings in a separate tab in the > > proposal document: > > > > Thank you, these are interesting. Can you share instructions on how to > > reproduce the reported numbers? I am interested to review the code used to > > generate these results (esp the C++ thrift code) > > > > Thanks > > Andrew > > > > > > [1]: > > > > https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux > > > > > > On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos > > wrote: > > > > > Thank you Andrew for putting the code in open source so that we can repro > > > it. 
> > > > > > We have run the rust benchmarks and also run the flatbuf proposal with > > our > > > C++ thrift parser, the flatbuf footer with Thrift conversion, the > > > flatbuf footer without Thrift conversion, and the flatbuf footer > > > without Thrift conversion and without verification. You can find the > > > summary of our findings in a separate tab in the proposal document: > > > > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s > > > > > > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the > > > optimized Thrift parsing. It also remains faster than the Thrift parser > > > even if the Thrift parser skips statistics. Furthermore if Thrift > > > conversion is skipped, the speedup is 50x, and if verification is skipped > > > it goes beyond 150x. > > > > > > > > > On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb > > > wrote: > > > > > > > Hello, > > > > > > > > I did some benchmarking for the new parser[2] we are working on in > > > > arrow-rs. > > > > > > > > This benchmark achieves nearly an order of magnitude improvement (7x) > > > > parsing Parquet metadata with no changes to the Parquet format, by > > simply > > > > writing a more efficient thrift decoder (which can also skip > > statistics). > > > > > > > > While we have not implemented a similar decoder in other languages such > > > as > > > > C/C++ or Java, given the similarities in the existing thrift libraries > > > and > > > > usage, we expect similar improvements are possible in those languages > > as > > > > well. 
> > > > > > > > Here are some inline images: > > > > [image: image.png] > > > > [image: image.png] > > > > > > > > > > > > You can find full details here [1] > > > > > > > > Andrew > > > > > > > > > > > > [1]: https://github.com/alamb/parquet_footer_parsing > > > > [2]: https://github.com/apache/arrow-rs/issues/5854 > > > > > > > > > > > > On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl wrote: > > > > > > > >> > Concerning Thrift optimization, while a 2-3x improvement might be > > > >> > achievable, Flatbuffers are currently demonstrating a 10x > > improvement. > > > >> > Andrew, do you have a more precise estimate for the speedup we could > > > >> expect > > > >> > in C++? > > > >> > > > >> Given my past experience on cuDF, I'd estimate about 2X there as well. > > > >> cuDF has it's own metadata parser that I once benchmarked against the > > > >> thrift generated parser. > > > >> > > > >> And I'd point out that beyond the initial 2X improvement, rolling your > > > >> own parser frees you of having to parse out every structure in the > > > metadata. > > > >> > > > > > > > > > >
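[Editor's note] Corwin's wide-table point can be illustrated with a toy length-prefixed stream (not real Thrift, just the same sequential access pattern; the names here are invented for the sketch): reaching the n-th record means decoding every record before it, so per-row-group access cost grows with the total amount of metadata.

```python
import struct

def pack_records(records: list[bytes]) -> bytes:
    """A u32 length prefix per record, records laid out back to back,
    with no index of where each record starts."""
    return b"".join(struct.pack("<I", len(r)) + r for r in records)

def nth_record(buf: bytes, n: int) -> tuple[bytes, int]:
    """Return record n and how many records had to be stepped over.
    With no offset index, every earlier length prefix must be decoded
    first -- the overhead Corwin describes for huge wide-table metadata,
    and what PalletJack's external index works around."""
    pos, stepped = 0, 0
    for _ in range(n):
        (ln,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + ln       # must touch every earlier record
        stepped += 1
    (ln,) = struct.unpack_from("<I", buf, pos)
    return buf[pos + 4 : pos + 4 + ln], stepped
```

An internal random-access index, as in the flatbuf footer proposal, turns this walk into a single offset lookup.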
Re: [DISCUSS] flatbuf footer
Hello, I did some benchmarking for the new parser[2] we are working on in arrow-rs. This benchmark achieves nearly an order of magnitude improvement (7x) parsing Parquet metadata with no changes to the Parquet format, by simply writing a more efficient thrift decoder (which can also skip statistics). While we have not implemented a similar decoder in other languages such as C/C++ or Java, given the similarities in the existing thrift libraries and usage, we expect similar improvements are possible in those languages as well. Here are some inline images: [image: image.png] [image: image.png] You can find full details here [1] Andrew [1]: https://github.com/alamb/parquet_footer_parsing [2]: https://github.com/apache/arrow-rs/issues/5854 On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl wrote: > > Concerning Thrift optimization, while a 2-3x improvement might be > > achievable, Flatbuffers are currently demonstrating a 10x improvement. > > Andrew, do you have a more precise estimate for the speedup we could > expect > > in C++? > > Given my past experience on cuDF, I'd estimate about 2X there as well. > cuDF has its own metadata parser that I once benchmarked against the > thrift-generated parser. > > And I'd point out that beyond the initial 2X improvement, rolling your own > parser frees you of having to parse out every structure in the metadata. >
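[Editor's note] For readers curious what "a more efficient thrift decoder" has to chew through: Thrift's compact protocol encodes integers as ULEB128 varints, with zigzag for signed values. Below is a minimal decoder for those two primitives only (a deliberately tiny slice of the protocol, not a full parser).

```python
def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """ULEB128: 7 payload bits per byte, high bit set means another
    byte follows. Field headers and lengths in compact-protocol
    metadata go through a byte-at-a-time loop like this, which is a
    big part of why thrift footer parsing is inherently sequential."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def zigzag_decode(u: int) -> int:
    """Zigzag maps 0, 1, 2, 3, ... back to 0, -1, 1, -2, ... so small
    negative ints stay short on the wire."""
    return (u >> 1) ^ -(u & 1)
```

A "skipping" decoder of the kind the arrow-rs work describes reads just enough of each field header to learn the value's type, then advances past values it does not need (such as statistics) instead of materializing them.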
Re: [DISCUSS] flatbuf footer: offsets
Hi Alkis Thanks for all your work on this proposal. I'd be in favour of keeping the offsets as i64 and not reducing the maximum row group size, even if this results in slightly larger footers. I've heard from some of our users within G-Research that they do have files with row groups > 2 GiB. This is often when they use lower-level APIs to write Parquet that don't automatically split data into row groups, and they either write a single row group for simplicity or have some logical partitioning of data into row groups. They might also have wide tables with many columns, or wide array/tensor valued columns that lead to large row groups. In many workflows we don't read Parquet with a query engine that supports filters and skipping row groups, but just read all rows, or directly specify the row groups to read if there is some known logical partitioning into row groups. I'm sure we could work around a 2 or 4 GiB row group size limitation if we had to, but it's a new constraint that reduces the flexibility of the format and makes more work for users who now need to ensure they don't hit this limit. Do you have any measurements of how much of a difference 4 byte offsets make to footer sizes in your data, with and without the optional LZ4 compression? Thanks, Adam On Tue, 14 Oct 2025 at 21:02, Alkis Evlogimenos wrote: > Hi all, > > From the comments on the [EXTERNAL] Parquet metadata > < > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0 > > > document, > it appears there's a general consensus on most aspects, with the exception > of the relative 32-bit offsets for column chunks. > > I'm starting this thread to discuss this topic further and work towards a > resolution. Adam Reeve suggested raising the limitation to 2^32, and he > confirmed that Java does not have any issues with this. I am open to this > change as it increases the limit without introducing any drawbacks. 
> > However, some still feel that a 2^32-byte limit for a row group is too > restrictive. I'd like to understand these specific use cases better. From > my perspective, for most engines, the row group is the primary unit of > skipping, making very large row groups less desirable. In our fleet's > workloads, it's rare to see row groups larger than 100MB, as anything > larger tends to make statistics-based skipping ineffective. > > Cheers, >
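Adam's question about the footer-size impact of 4-byte offsets can be framed with a back-of-envelope calculation. The sketch below is purely illustrative (the file shape, the assumption of roughly three offset-like fields per column chunk, and the fixed-width scalar storage are assumptions, not measurements from the proposal), but it shows why narrowing matters most for wide tables:

```python
# Back-of-envelope: bytes spent storing offsets in a flat footer.
# Illustrative assumptions (not the proposal's measured numbers):
# each column chunk carries ~3 offset-like fields (data page start,
# dictionary page start, column index, say), stored fixed-width.

def footer_offset_bytes(row_groups, columns, offset_fields, width):
    """Total bytes spent on offsets for a file with the given shape."""
    return row_groups * columns * offset_fields * width

# A wide table: 100 row groups x 1000 columns.
i64_bytes = footer_offset_bytes(100, 1000, 3, 8)  # absolute i64 offsets
u32_bytes = footer_offset_bytes(100, 1000, 3, 4)  # row-group-relative u32
print(i64_bytes - u32_bytes)  # 1200000 bytes saved by narrowing
```

Whether LZ4 recovers most of that difference on the i64 encoding is exactly the measurement Adam asks for above.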
Re: [DISCUSS] flatbuf footer: offsets
Hi Alkis, one more very simple argument why you want these offsets to be i64: What if you want to store a single value larger than 4GB? I know this sounds absurd at first, but some use cases might want to store data that can sometimes be very large (e.g. blob data, or insanely complex geo data). And it would be a shame if that would mean that they cannot use Parquet at all. Thus, my opinion here is that we can limit to i32 all fields that the file writer has under control, e.g., the number of rows within a row group, but we shouldn't limit any values that a file writer doesn't have under control, as they fully depend on the input data. Note though that this means that the number of values in a column chunk could also exceed i32, if a user has nested data with more than 4 billion entries. With such data, the file writer again couldn't do anything to avoid writing a row group with more than i32 values, as a single row may not span multiple row groups. That being said, I think that nested data with more than 4 billion entries is less likely than a single large blob of 4 billion bytes. I know that smaller row groups is what most / all engines prefer, but we have to make sure the format also works for edge cases. Cheers, Jan Am Mi., 15. Okt. 2025 um 05:05 Uhr schrieb Adam Reeve : > Hi Alkis > > Thanks for all your work on this proposal. > > I'd be in favour of keeping the offsets as i64 and not reducing the maximum > row group size, even if this results in slightly larger footers. I've heard > from some of our users within G-Research that they do have files with row > groups > 2 GiB. This is often when they use lower-level APIs to write > Parquet that don't automatically split data into row groups, and they > either write a single row group for simplicity or have some logical > partitioning of data into row groups. They might also have wide tables with > many columns, or wide array/tensor valued columns that lead to large row > groups. 
> > In many workflows we don't read Parquet with a query engine that supports > filters and skipping row groups, but just read all rows, or directly > specify the row groups to read if there is some known logical partitioning > into row groups. I'm sure we could work around a 2 or 4 GiB row group size > limitation if we had to, but it's a new constraint that reduces the > flexibility of the format and makes more work for users who now need to > ensure they don't hit this limit. > > Do you have any measurements of how much of a difference 4 byte offsets > make to footer sizes in your data, with and without the optional LZ4 > compression? > > Thanks, > Adam > > On Tue, 14 Oct 2025 at 21:02, Alkis Evlogimenos > wrote: > > > Hi all, > > > > From the comments on the [EXTERNAL] Parquet metadata > > < > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0 > > > > > document, > > it appears there's a general consensus on most aspects, with the > exception > > of the relative 32-bit offsets for column chunks. > > > > I'm starting this thread to discuss this topic further and work towards a > > resolution. Adam Reeve suggested raising the limitation to 2^32, and he > > confirmed that Java does not have any issues with this. I am open to this > > change as it increases the limit without introducing any drawbacks. > > > > However, some still feel that a 2^32-byte limit for a row group is too > > restrictive. I'd like to understand these specific use cases better. From > > my perspective, for most engines, the row group is the primary unit of > > skipping, making very large row groups less desirable. In our fleet's > > workloads, it's rare to see row groups larger than 100MB, as anything > > larger tends to make statistics-based skipping ineffective. > > > > Cheers, > > >
Re: [DISCUSS] flatbuf footer
For us, the exciting thing about the flatbuf footer approach is the potential for fast random access. For wide tables, the metadata becomes huge, and there is a lot of overhead to access a particular rowgroup. (See previous discussions, e.g., https://github.com/apache/arrow/issues/38149). Even if we can get a faster thrift parser, this is still limited, because you have to parse the entire metadata, which is inherently slow. Pulling information for a selected rowgroup is a lot faster. Right now, we have a workaround: we create an external index to get fast random access. (https://github.com/G-Research/PalletJack). But, having a fast internal random access index like the proposed flatbuf footer would be a big step forward. On Fri, Oct 17, 2025 at 8:50 AM Andrew Lamb wrote: > Thanks Alkis, that is interesting data. > > > We found that the reported numbers were not reproducible on AWS instances > > I just updated the benchmark results[1] to include results from > AWS m6id.8xlarge instance (and they are indeed about 2x slower than when > run on my 2023 Mac laptop) > > > You can find the summary of our findings in a separate tab in the > proposal document: > > Thank you, these are interesting. Can you share instructions on how to > reproduce the reported numbers? I am interested to review the code used to > generate these results (esp the C++ thrift code) > > Thanks > Andrew > > > [1]: > > https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux > > > On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos > wrote: > > > Thank you Andrew for putting the code in open source so that we can repro > > it. > > > > We have run the rust benchmarks and also run the flatbuf proposal with > our > > C++ thrift parser, the flatbuf footer with Thrift conversion, the > > flatbuf footer without Thrift conversion, and the flatbuf footer > > without Thrift conversion and without verification. 
You can find the > > summary of our findings in a separate tab in the proposal document: > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s > > > > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the > > optimized Thrift parsing. It also remains faster than the Thrift parser > > even if the Thrift parser skips statistics. Furthermore if Thrift > > conversion is skipped, the speedup is 50x, and if verification is skipped > > it goes beyond 150x. > > > > > > On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb > > wrote: > > > > > Hello, > > > > > > I did some benchmarking for the new parser[2] we are working on in > > > arrow-rs. > > > > > > This benchmark achieves nearly an order of magnitude improvement (7x) > > > parsing Parquet metadata with no changes to the Parquet format, by > simply > > > writing a more efficient thrift decoder (which can also skip > statistics). > > > > > > While we have not implemented a similar decoder in other languages such > > as > > > C/C++ or Java, given the similarities in the existing thrift libraries > > and > > > usage, we expect similar improvements are possible in those languages > as > > > well. > > > > > > Here are some inline images: > > > [image: image.png] > > > [image: image.png] > > > > > > > > > You can find full details here [1] > > > > > > Andrew > > > > > > > > > [1]: https://github.com/alamb/parquet_footer_parsing > > > [2]: https://github.com/apache/arrow-rs/issues/5854 > > > > > > > > > On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl wrote: > > > > > >> > Concerning Thrift optimization, while a 2-3x improvement might be > > >> > achievable, Flatbuffers are currently demonstrating a 10x > improvement. > > >> > Andrew, do you have a more precise estimate for the speedup we could > > >> expect > > >> > in C++? > > >> > > >> Given my past experience on cuDF, I'd estimate about 2X there as well. 
> > >> cuDF has its own metadata parser that I once benchmarked against the > > >> thrift-generated parser. > > >> > > >> And I'd point out that beyond the initial 2X improvement, rolling your > > >> own parser frees you of having to parse out every structure in the > > metadata. > > >> > > > > > >
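The random-access advantage described above (PalletJack-style, or the proposed flatbuf footer) comes down to an index of byte ranges: a reader can jump straight to one row group's metadata instead of parsing the whole footer. This is a minimal sketch of that idea with an invented layout, not the actual PalletJack or flatbuf-footer format:

```python
# Sketch of index-based random access to per-row-group metadata.
# Layout is hypothetical: metadata blobs are concatenated, and a
# side index records each blob's (offset, length), so reading row
# group i never touches the bytes of any other row group.

def build(blobs):
    body = b"".join(blobs)
    index, pos = [], 0
    for b in blobs:
        index.append((pos, len(b)))
        pos += len(b)
    return body, index

def read_row_group(body, index, i):
    off, length = index[i]          # O(1) lookup, no full parse
    return body[off:off + length]

body, index = build([b"rg0-meta", b"rg1-metadata", b"rg2"])
print(read_row_group(body, index, 1))  # b'rg1-metadata'
```

Thrift's compact encoding cannot support this because field boundaries are only discoverable by decoding everything that precedes them; a format with a materialized offset table can.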
Re: [DISCUSS] flatbuf footer
Thanks Alkis, that is interesting data. > We found that the reported numbers were not reproducible on AWS instances I just updated the benchmark results[1] to include results from AWS m6id.8xlarge instance (and they are indeed about 2x slower than when run on my 2023 Mac laptop) > You can find the summary of our findings in a separate tab in the proposal document: Thank you, these are interesting. Can you share instructions on how to reproduce the reported numbers? I am interested to review the code used to generate these results (esp the C++ thrift code) Thanks Andrew [1]: https://github.com/alamb/parquet_footer_parsing?tab=readme-ov-file#results-on-linux On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos wrote: > Thank you Andrew for putting the code in open source so that we can repro > it. > > We have run the rust benchmarks and also run the flatbuf proposal with our > C++ thrift parser, the flatbuf footer with Thrift conversion, the > flatbuf footer without Thrift conversion, and the flatbuf footer > without Thrift conversion and without verification. You can find the > summary of our findings in a separate tab in the proposal document: > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s > > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the > optimized Thrift parsing. It also remains faster than the Thrift parser > even if the Thrift parser skips statistics. Furthermore if Thrift > conversion is skipped, the speedup is 50x, and if verification is skipped > it goes beyond 150x. > > > On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb > wrote: > > > Hello, > > > > I did some benchmarking for the new parser[2] we are working on in > > arrow-rs. 
> > > > This benchmark achieves nearly an order of magnitude improvement (7x) > > parsing Parquet metadata with no changes to the Parquet format, by simply > > writing a more efficient thrift decoder (which can also skip statistics). > > > > While we have not implemented a similar decoder in other languages such > as > > C/C++ or Java, given the similarities in the existing thrift libraries > and > > usage, we expect similar improvements are possible in those languages as > > well. > > > > Here are some inline images: > > [image: image.png] > > [image: image.png] > > > > > > You can find full details here [1] > > > > Andrew > > > > > > [1]: https://github.com/alamb/parquet_footer_parsing > > [2]: https://github.com/apache/arrow-rs/issues/5854 > > > > > > On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl wrote: > > > >> > Concerning Thrift optimization, while a 2-3x improvement might be > >> > achievable, Flatbuffers are currently demonstrating a 10x improvement. > >> > Andrew, do you have a more precise estimate for the speedup we could > >> expect > >> > in C++? > >> > >> Given my past experience on cuDF, I'd estimate about 2X there as well. > >> cuDF has it's own metadata parser that I once benchmarked against the > >> thrift generated parser. > >> > >> And I'd point out that beyond the initial 2X improvement, rolling your > >> own parser frees you of having to parse out every structure in the > metadata. > >> > > >
Re: [DISCUSS] flatbuf footer
On Fri, Oct 17, 2025 at 10:20 AM Alkis Evlogimenos wrote: > Thank you Andrew for putting the code in open source so that we can repro > it. > > The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the > optimized Thrift parsing. It also remains faster than the Thrift parser > even if the Thrift parser skips statistics. Furthermore if Thrift > conversion is skipped, the speedup is 50x, and if verification is skipped > it goes beyond 150x. Can you explain a bit the differences/changes in the parser that provides such a speedup? -- Andrew Bell [email protected]
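To Andrew Bell's question: part of the speedup in the hand-rolled decoders discussed in this thread comes from skipping unwanted fields (e.g. statistics) without materializing them. The sketch below shows the flavor of that technique on a simplified subset of the Thrift compact protocol (varint integers and length-prefixed binary only; the real protocol has more types, and this is not the arrow-rs implementation):

```python
# Sketch: skip over Thrift compact-protocol field values without
# decoding them into objects. Simplified subset of the protocol.

def read_varint(buf, pos):
    """Read an unsigned LEB128-style varint; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]; pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def skip_field(buf, pos, ctype):
    """Advance past one field value of the given compact type."""
    if ctype in (5, 6):            # I32 / I64: zigzag varint; just consume it
        _, pos = read_varint(buf, pos)
        return pos
    if ctype == 8:                 # BINARY: varint length + payload
        n, pos = read_varint(buf, pos)
        return pos + n
    raise NotImplementedError(ctype)

# 300 encodes as the two varint bytes 0xAC 0x02; then a 3-byte binary.
buf = bytes([0xAC, 0x02, 0x03]) + b"abc"
pos = skip_field(buf, 0, 6)        # skip the I64 field
pos = skip_field(buf, pos, 8)      # skip the binary field
print(pos)  # 6
```

Skipping avoids allocation and object construction for every field a reader doesn't need, which is where much of the generated-code overhead lives.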
Re: [DISCUSS] flatbuf footer
Thank you Andrew for putting the code in open source so that we can repro it. We have run the rust benchmarks and also run the flatbuf proposal with our C++ thrift parser, the flatbuf footer with Thrift conversion, the flatbuf footer without Thrift conversion, and the flatbuf footer without Thrift conversion and without verification. You can find the summary of our findings in a separate tab in the proposal document: https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.ve65qknb3sq1#heading=h.3uwb5liauf1s The TLDR is that flatbuf is 5x faster with the Thrift conversion vs the optimized Thrift parsing. It also remains faster than the Thrift parser even if the Thrift parser skips statistics. Furthermore if Thrift conversion is skipped, the speedup is 50x, and if verification is skipped it goes beyond 150x. On Tue, Sep 30, 2025 at 5:56 PM Andrew Lamb wrote: > Hello, > > I did some benchmarking for the new parser[2] we are working on in > arrow-rs. > > This benchmark achieves nearly an order of magnitude improvement (7x) > parsing Parquet metadata with no changes to the Parquet format, by simply > writing a more efficient thrift decoder (which can also skip statistics). > > While we have not implemented a similar decoder in other languages such as > C/C++ or Java, given the similarities in the existing thrift libraries and > usage, we expect similar improvements are possible in those languages as > well. > > Here are some inline images: > [image: image.png] > [image: image.png] > > > You can find full details here [1] > > Andrew > > > [1]: https://github.com/alamb/parquet_footer_parsing > [2]: https://github.com/apache/arrow-rs/issues/5854 > > > On Wed, Sep 24, 2025 at 5:59 PM Ed Seidl wrote: > >> > Concerning Thrift optimization, while a 2-3x improvement might be >> > achievable, Flatbuffers are currently demonstrating a 10x improvement. 
>> > Andrew, do you have a more precise estimate for the speedup we could >> expect >> > in C++? >> >> Given my past experience on cuDF, I'd estimate about 2X there as well. >> cuDF has its own metadata parser that I once benchmarked against the >> thrift-generated parser. >> >> And I'd point out that beyond the initial 2X improvement, rolling your >> own parser frees you of having to parse out every structure in the metadata. >> >
Re: [DISCUSS] flatbuf footer
> Andrew, do you have a more precise estimate for the speedup we could expect in C++? I do not yet, but I will try and find out. I have filed an issue[1] to track the question / will try and enlist some help. It will be fun to benchmaxx our new parser Andrew [1]: https://github.com/apache/arrow-rs/issues/8441 On Wed, Sep 24, 2025 at 6:38 AM Alkis Evlogimenos wrote: > Thank you all for taking the time to go through the doc and your feedback. > I'd like to address some of the key points raised: > > Regarding nested Flatbuffers, there's no parsing benefit to using them. In > the current prototype, approximately two-thirds of the decoding cost comes > from converting the Flatbuffer to `FileMetadata` (the Thrift object) to > simplify the rollout process. Even with this conversion, we're observing a > greater than 10x improvement in footer decoding time for footers that > perform poorly with Thrift (at the p999 percentile). Removing the > `FileMetadata` translation should easily provide another 2x speedup. > > Concerning Thrift optimization, while a 2-3x improvement might be > achievable, Flatbuffers are currently demonstrating a 10x improvement. > Andrew, do you have a more precise estimate for the speedup we could expect > in C++? It's also important to note that Thrift's format does not allow for > random access, meaning we will always have to parse the entire footer, > regardless of which columns are requested. > > I will work on getting numbers for LZ4 compressed versus raw footers, but > please be aware that this will take some time. > > Finally, the 32-bit narrowing of row group sizes appears to be the most > contentious aspect of the design. I suggest we discuss this live during our > next Parquet sync. For the record, shrinking the offsets is the second most > significant optimization for Flatbuffer footer size, with statistics being > the first. > > See you all in the next sync. 
> > > On Wed, Sep 17, 2025 at 10:02 AM Antoine Pitrou > wrote: > > > > > Hi Andrew, > > > > I haven't heard of anything like this for C++, but it is an intriguing > > idea. > > > > Regards > > > > Antoine. > > > > > > On Tue, 16 Sep 2025 16:44:14 -0400 > > Andrew Lamb > > wrote: > > > Has anyone spent time optimizing the thrift decoder (e.g. not just use > > > whatever a general purpose thrift compiler generates, but custom code a > > > parser just for Parquet metadata)? > > > > > > Ed is in the process of implementing just such a decoder in arrow-rs[1] > > and > > > has seen a 2-3x performance improvement (with no change to the format) > in > > > early benchmark results. This is inline with our earlier work on the > > > topic[2] where we estimated there is a 2-4x performance improvement > with > > > implementation improvements alone. > > > > > > Andrew > > > > > > [1]: https://github.com/apache/arrow-rs/issues/5854 > > > [2]: https://www.influxdata.com/blog/how-good-parquet-wide-tables/ > > > > > > On Tue, Sep 16, 2025 at 4:12 AM Antoine Pitrou < > > [email protected]> wrote: > > > > > > > > > > > Hi again, > > > > > > > > Ok, a quick summary of my current feedback on this: > > > > > > > > - decoding speed measurements are given, but not footer size > > > > measurements; it would be interesting to have both > > > > > > > > - it's not obvious whether the stated numbers are for reading all > > > > columns or a subset of them > > > > > > > > - optional LZ4 compression is mentioned, but no numbers are given for > > > > it; it would be nice if numbers were available for both > uncompressed > > > > and compressed footers > > > > > > > > - the numbers seem quite underwhelming currently, I think most of us > > > > were expecting massive speed improvements given past discussions > > > > > > > > - I'm firmly against narrowing sizes to 32 bits; making the footer > more > > > > compact is useful, but not to the point of reducing usefulness or > > > > generality > > > > > > > 
> > > > > A more general proposal: given the slightly underwhelming perf > > > > numbers, has nested Flatbuffers been considered as an alternative? > > > > > > > > For example, the RowGroup table could become: > > > > ``` > > > > table ColumnChunk { > > > > file_path: string; > > > > meta_data: ColumnMetadata; > > > > // etc. > > > > } > > > > > > > > struct EncodedColumnChunk { > > > > // Flatbuffers-encoded ColumnChunk, to be decoded/validated > > indidually > > > > column: [ubyte]; > > > > } > > > > > > > > table RowGroup { > > > > columns: [EncodedColumnChunk]; > > > > total_byte_size: int; > > > > num_rows: int; > > > > sorting_columns: [SortingColumn]; > > > > file_offset: long; > > > > total_compressed_size: int; > > > > ordinal: short = null; > > > > } > > > > ``` > > > > > > > > Regards > > > > > > > > Antoine. > > > > > > > > > > > > > > > > On Thu, 11 Sep 2025 08:41:34 +0200 > > > > Alkis Evlogimenos > > > > > > > > wrote: > > > > > Hi all. I am sharing as a separate thread the pr
Re: [DISCUSS] flatbuf footer
> Concerning Thrift optimization, while a 2-3x improvement might be > achievable, Flatbuffers are currently demonstrating a 10x improvement. > Andrew, do you have a more precise estimate for the speedup we could expect > in C++? Given my past experience on cuDF, I'd estimate about 2X there as well. cuDF has its own metadata parser that I once benchmarked against the thrift-generated parser. And I'd point out that beyond the initial 2X improvement, rolling your own parser frees you of having to parse out every structure in the metadata.
Re: [DISCUSS] flatbuf footer
On Wed, 24 Sep 2025 12:37:13 +0200 Alkis Evlogimenos wrote: > Thank you all for taking the time to go through the doc and your feedback. > I'd like to address some of the key points raised: > > Regarding nested Flatbuffers, there's no parsing benefit to using them. In > the current prototype, approximately two-thirds of the decoding cost comes > from converting the Flatbuffer to `FileMetadata` (the Thrift object) to > simplify the rollout process. Even with this conversion, we're observing a > greater than 10x improvement in footer decoding time for footers that > perform poorly with Thrift (at the p999 percentile). Removing the > `FileMetadata` translation should easily provide another 2x speedup. 1. Your own numbers show p50 percentile performance at around 1x, not 10x. It's nice that p999 (!!) percentile performance is so good, but that probably doesn't paint a representative picture of overall performance. 2. It would be useful to have p05 and p01 performance results, by the way. For now we know only about the best results, not the worst, which is a bit surprising. 3. As you said in one of the comments: "even without Thrift, we still have to verify the flatbuf which means we still have to walk all the bytes". Nested Flatbuffers would avoid verifying the flatbuf data for unused columns or indices, for example. > Finally, the 32-bit narrowing of row group sizes appears to be the most > contentious aspect of the design. I suggest we discuss this live during our > next Parquet sync. Well, not everyone can often make it to the Parquet syncs. Important spec discussions should be accessible to anyone regardless of their personal/professional schedules. > For the record, shrinking the offsets is the second most > significant optimization for Flatbuffer footer size, with statistics being > the first. I'm curious whether LZ4 would make the optimization less significant. Regards Antoine.
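Antoine's third point above (verification must walk all the bytes unless chunks are nested) can be illustrated with a toy model. This is not real Flatbuffers API; it is a hypothetical layout where each column chunk is an opaque byte vector carrying its own checksum, so a reader verifies only the chunks it actually touches:

```python
import zlib

# Toy model of the nested-buffer idea: per-chunk verification instead
# of whole-footer verification. The CRC stands in for Flatbuffers
# verifier work; layout and API are illustrative assumptions.

def encode_chunks(chunks):
    """Attach a checksum to each opaque metadata blob."""
    return [(zlib.crc32(c), c) for c in chunks]

def get_chunk(encoded, i):
    """Verify and return only chunk i; other chunks stay untouched."""
    crc, payload = encoded[i]
    if zlib.crc32(payload) != crc:
        raise ValueError(f"chunk {i} failed verification")
    return payload

encoded = encode_chunks([b"col-a-meta", b"col-b-meta", b"col-c-meta"])
print(get_chunk(encoded, 2))  # b'col-c-meta'
```

With a single flat buffer, verification cost scales with total footer size regardless of how few columns are read; with per-chunk nesting it scales with the columns actually accessed.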
Re: [DISCUSS] flatbuf footer
Thank you all for taking the time to go through the doc and your feedback. I'd like to address some of the key points raised: Regarding nested Flatbuffers, there's no parsing benefit to using them. In the current prototype, approximately two-thirds of the decoding cost comes from converting the Flatbuffer to `FileMetadata` (the Thrift object) to simplify the rollout process. Even with this conversion, we're observing a greater than 10x improvement in footer decoding time for footers that perform poorly with Thrift (at the p999 percentile). Removing the `FileMetadata` translation should easily provide another 2x speedup. Concerning Thrift optimization, while a 2-3x improvement might be achievable, Flatbuffers are currently demonstrating a 10x improvement. Andrew, do you have a more precise estimate for the speedup we could expect in C++? It's also important to note that Thrift's format does not allow for random access, meaning we will always have to parse the entire footer, regardless of which columns are requested. I will work on getting numbers for LZ4 compressed versus raw footers, but please be aware that this will take some time. Finally, the 32-bit narrowing of row group sizes appears to be the most contentious aspect of the design. I suggest we discuss this live during our next Parquet sync. For the record, shrinking the offsets is the second most significant optimization for Flatbuffer footer size, with statistics being the first. See you all in the next sync. On Wed, Sep 17, 2025 at 10:02 AM Antoine Pitrou wrote: > > Hi Andrew, > > I haven't heard of anything like this for C++, but it is an intriguing > idea. > > Regards > > Antoine. > > > On Tue, 16 Sep 2025 16:44:14 -0400 > Andrew Lamb > wrote: > > Has anyone spent time optimizing the thrift decoder (e.g. not just use > > whatever a general purpose thrift compiler generates, but custom code a > > parser just for Parquet metadata)? 
> > > > Ed is in the process of implementing just such a decoder in arrow-rs[1] > and > > has seen a 2-3x performance improvement (with no change to the format) in > > early benchmark results. This is inline with our earlier work on the > > topic[2] where we estimated there is a 2-4x performance improvement with > > implementation improvements alone. > > > > Andrew > > > > [1]: https://github.com/apache/arrow-rs/issues/5854 > > [2]: https://www.influxdata.com/blog/how-good-parquet-wide-tables/ > > > > On Tue, Sep 16, 2025 at 4:12 AM Antoine Pitrou < > [email protected]> wrote: > > > > > > > > Hi again, > > > > > > Ok, a quick summary of my current feedback on this: > > > > > > - decoding speed measurements are given, but not footer size > > > measurements; it would be interesting to have both > > > > > > - it's not obvious whether the stated numbers are for reading all > > > columns or a subset of them > > > > > > - optional LZ4 compression is mentioned, but no numbers are given for > > > it; it would be nice if numbers were available for both uncompressed > > > and compressed footers > > > > > > - the numbers seem quite underwhelming currently, I think most of us > > > were expecting massive speed improvements given past discussions > > > > > > - I'm firmly against narrowing sizes to 32 bits; making the footer more > > > compact is useful, but not to the point of reducing usefulness or > > > generality > > > > > > > > > A more general proposal: given the slightly underwhelming perf > > > numbers, has nested Flatbuffers been considered as an alternative? > > > > > > For example, the RowGroup table could become: > > > ``` > > > table ColumnChunk { > > > file_path: string; > > > meta_data: ColumnMetadata; > > > // etc. 
> > > } > > > > > > struct EncodedColumnChunk { > > > // Flatbuffers-encoded ColumnChunk, to be decoded/validated > indidually > > > column: [ubyte]; > > > } > > > > > > table RowGroup { > > > columns: [EncodedColumnChunk]; > > > total_byte_size: int; > > > num_rows: int; > > > sorting_columns: [SortingColumn]; > > > file_offset: long; > > > total_compressed_size: int; > > > ordinal: short = null; > > > } > > > ``` > > > > > > Regards > > > > > > Antoine. > > > > > > > > > > > > On Thu, 11 Sep 2025 08:41:34 +0200 > > > Alkis Evlogimenos > > > > > > wrote: > > > > Hi all. I am sharing as a separate thread the proposal for the footer > > > > change we have been working on: > > > > > > > > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit > > > > > . > > > > > > > > The proposal outlines the technical aspects of the design and the > > > > experimental results of shadow testing this in production workloads. > I > > > > would like to discuss the proposal's most salient points in the > next > > > sync: > > > > 1. the use of flatbuffers as footer serialization format > > > > 2. the additional limitations imposed on parquet files (row group > size > > > > limit, row group max num row limit) > > > > > >
Re: [DISCUSS] flatbuf footer
Hi Andrew,
I haven't heard of anything like this for C++, but it is an intriguing
idea.
Regards
Antoine.
On Tue, 16 Sep 2025 16:44:14 -0400
Andrew Lamb
wrote:
> Has anyone spent time optimizing the thrift decoder (e.g. not just use
> whatever a general purpose thrift compiler generates, but custom code a
> parser just for Parquet metadata)?
>
> Ed is in the process of implementing just such a decoder in arrow-rs[1] and
> has seen a 2-3x performance improvement (with no change to the format) in
> early benchmark results. This is in line with our earlier work on the
> topic[2] where we estimated there is a 2-4x performance improvement with
> implementation improvements alone.
>
> Andrew
>
> [1]: https://github.com/apache/arrow-rs/issues/5854
> [2]: https://www.influxdata.com/blog/how-good-parquet-wide-tables/
>
> On Tue, Sep 16, 2025 at 4:12 AM Antoine Pitrou
> wrote:
>
> >
> > Hi again,
> >
> > Ok, a quick summary of my current feedback on this:
> >
> > - decoding speed measurements are given, but not footer size
> > measurements; it would be interesting to have both
> >
> > - it's not obvious whether the stated numbers are for reading all
> > columns or a subset of them
> >
> > - optional LZ4 compression is mentioned, but no numbers are given for
> > it; it would be nice if numbers were available for both uncompressed
> > and compressed footers
> >
> > - the numbers seem quite underwhelming currently, I think most of us
> > were expecting massive speed improvements given past discussions
> >
> > - I'm firmly against narrowing sizes to 32 bits; making the footer more
> > compact is useful, but not to the point of reducing usefulness or
> > generality
> >
> >
> > A more general proposal: given the slightly underwhelming perf
> > numbers, has nested Flatbuffers been considered as an alternative?
> >
> > For example, the RowGroup table could become:
> > ```
> > table ColumnChunk {
> > file_path: string;
> > meta_data: ColumnMetadata;
> > // etc.
> > }
> >
> > struct EncodedColumnChunk {
> > // Flatbuffers-encoded ColumnChunk, to be decoded/validated individually
> > column: [ubyte];
> > }
> >
> > table RowGroup {
> > columns: [EncodedColumnChunk];
> > total_byte_size: int;
> > num_rows: int;
> > sorting_columns: [SortingColumn];
> > file_offset: long;
> > total_compressed_size: int;
> > ordinal: short = null;
> > }
> > ```
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > On Thu, 11 Sep 2025 08:41:34 +0200
> > Alkis Evlogimenos
> >
> > wrote:
> > > Hi all. I am sharing as a separate thread the proposal for the footer
> > > change we have been working on:
> > >
> > https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit
> >
> > > .
> > >
> > > The proposal outlines the technical aspects of the design and the
> > > experimental results of shadow testing this in production workloads. I
> > > would like to discuss the proposal's most salient points in the next
> > sync:
> > > 1. the use of flatbuffers as footer serialization format
> > > 2. the additional limitations imposed on parquet files (row group size
> > > limit, row group max num row limit)
> > >
> > > I would prefer comments on the google doc to facilitate async discussion.
> > >
> > > Thank you,
> > >
> >
> >
> >
> >
>
Re: [DISCUSS] flatbuf footer
I just found this thread went to my spam folder, so I just want to bump it up before reading the details.
On Thu, Sep 11, 2025 at 2:42 PM Alkis Evlogimenos wrote:
Re: [DISCUSS] flatbuf footer
Has anyone spent time optimizing the thrift decoder (e.g. not just using
whatever a general-purpose thrift compiler generates, but custom-coding a
parser just for Parquet metadata)?
Ed is in the process of implementing just such a decoder in arrow-rs[1] and
has seen a 2-3x performance improvement (with no change to the format) in
early benchmark results. This is in line with our earlier work on the
topic[2] where we estimated there is a 2-4x performance improvement with
implementation improvements alone.
Andrew
[1]: https://github.com/apache/arrow-rs/issues/5854
[2]: https://www.influxdata.com/blog/how-good-parquet-wide-tables/
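To give a feel for what a hand-rolled metadata decoder involves (this is an illustrative sketch, not the arrow-rs code), the hot path of a custom Thrift compact-protocol parser boils down to varint and zigzag decoding, which a specialized implementation can inline and specialize per field:

```python
def read_uvarint(buf, pos):
    """Decode a ULEB128 varint, the basic unit of Thrift's compact protocol.
    Returns (value, new_position)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def zigzag_decode(n):
    """Compact protocol stores signed integers zigzag-encoded inside varints."""
    return (n >> 1) ^ -(n & 1)

# Example: -3 is zigzag-encoded as 5, which fits in the single varint byte 0x05.
value, pos = read_uvarint(bytes([0x05]), 0)
print(zigzag_decode(value))  # -3
```

A generated decoder pays dispatch and allocation costs on every field; a parser written only for Parquet's known field IDs can skip unneeded fields without materializing them, which is where estimates like the 2-4x above come from.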
On Tue, Sep 16, 2025 at 4:12 AM Antoine Pitrou wrote:
Re: [DISCUSS] flatbuf footer
Hi again,
Ok, a quick summary of my current feedback on this:
- decoding speed measurements are given, but not footer size
measurements; it would be interesting to have both
- it's not obvious whether the stated numbers are for reading all
columns or a subset of them
- optional LZ4 compression is mentioned, but no numbers are given for
it; it would be nice if numbers were available for both uncompressed
and compressed footers
- the numbers seem quite underwhelming currently; I think most of us
were expecting massive speed improvements, given past discussions
- I'm firmly against narrowing sizes to 32 bits; making the footer more
compact is useful, but not to the point of reducing usefulness or
generality
A more general proposal: given the slightly underwhelming perf
numbers, have nested Flatbuffers been considered as an alternative?
For example, the RowGroup table could become:
```
table ColumnChunk {
  file_path: string;
  meta_data: ColumnMetadata;
  // etc.
}

// Must be a table rather than a struct: Flatbuffers structs cannot hold vectors.
table EncodedColumnChunk {
  // Flatbuffers-encoded ColumnChunk, to be decoded/validated individually
  column: [ubyte];
}

table RowGroup {
  columns: [EncodedColumnChunk];
  total_byte_size: long;
  num_rows: long;
  sorting_columns: [SortingColumn];
  file_offset: long;
  total_compressed_size: long;
  ordinal: short = null;
}
```
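The point of the nesting is that a reader can leave the per-column blobs untouched until they are actually needed. A toy sketch of that access pattern, using pickled dicts as a stand-in for the Flatbuffers-encoded ColumnChunk bytes (real code would use flatc-generated accessors):

```python
import pickle

def encode_row_group(columns):
    """Each column's metadata is serialized independently, so the outer
    footer only holds opaque byte blobs (the EncodedColumnChunk idea)."""
    return {"columns": [pickle.dumps(c) for c in columns], "num_rows": 1000}

def read_column(row_group, index):
    # Only the requested column's blob is decoded; the rest stay raw bytes.
    return pickle.loads(row_group["columns"][index])

rg = encode_row_group([{"path": "a", "offset": 0},
                       {"path": "b", "offset": 4096}])
print(read_column(rg, 1)["path"])  # b
```

For wide tables where a query touches a handful of columns, this shifts decoding cost from "all columns" to "selected columns", which is where a nested layout could beat a flat one even if whole-footer decode speed is similar.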
Regards
Antoine.
On Thu, 11 Sep 2025 08:41:34 +0200
Alkis Evlogimenos
wrote:
> Hi all. I am sharing as a separate thread the proposal for the footer
> change we have been working on:
> https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit
> .
>
> The proposal outlines the technical aspects of the design and the
> experimental results of shadow testing this in production workloads. I
> would like to discuss the proposal's most salient points in the next sync:
> 1. the use of flatbuffers as footer serialization format
> 2. the additional limitations imposed on parquet files (row group size
> limit, row group max num row limit)
>
> I would prefer comments on the google doc to facilitate async discussion.
>
> Thank you,
>
Re: [DISCUSS] flatbuf footer
Commented, mostly on the must/may/shall section; it's as important to call out those MUST NOT requirements. I'm worried about the "should Not substantially degrade performance of old readers" requirement: I'd put that in the MUST NOT group and define "substantially". If this slows down existing readers by more than a slightly larger end-of-file range to read before parsing, it won't be welcome and so is less likely to be adopted.

I also added a security requirement; maybe it should have its own section, primarily as one of due diligence, in which illegal/invalid values are discussed, such as different columns referring to overlapping file ranges, but noting that clients are NOT required to check this where the check is expensive. It would be good for all readers to add an option to validate the thrift and flatbuf footers to make sure they are consistent, to stop somebody trying to sneak something malicious deeper into the pipeline where they know that the front end only checks the thrift values. A full scan of the whole footer for consistency of offsets again has to be an option. What does matter is that if my code reads a file from an untrusted source which does have an inconsistent footer (columns declared as overlapping), this is not going to generate any exploit. You'd make full-footer validation part of the process for ingress of external sources, and from then on consider it well-formed and consistent across all runtimes.

Steve (why yes, I am getting more into cybersecurity :)

On Thu, 11 Sept 2025 at 07:43, Alkis Evlogimenos wrote:
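One cheap form of the consistency check Steve describes is verifying that the column chunk ranges a footer declares stay inside the file and never overlap. A hypothetical sketch (the `(offset, length)` pair representation is invented for illustration; a real validator would pull these from the decoded footer):

```python
def check_chunk_ranges(chunks, file_size):
    """Reject a footer whose column chunk byte ranges overlap each other
    or fall outside the file. `chunks` is a list of (offset, length) pairs."""
    prev_end = 0
    for offset, length in sorted(chunks):
        if length < 0 or offset < prev_end or offset + length > file_size:
            return False
        prev_end = offset + length
    return True

print(check_chunk_ranges([(0, 100), (100, 50)], 200))  # True
print(check_chunk_ranges([(0, 100), (50, 50)], 200))   # False: ranges overlap
```

Cross-checking that the thrift and flatbuf footers produce the same set of ranges would be a second, equally cheap pass; the expensive full-footer scan can then remain an opt-in for ingress of untrusted files.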
Re: [DISCUSS] flatbuf footer
Hello,

I haven't read everything in detail yet, but I'm going to say upfront that I'm -1 on limiting sizes to 32 bits rather than the current 64 bits, unless it brings really sizable benefits (which I doubt, given the affected fields).

Regards
Antoine.

On Thu, 11 Sep 2025 08:41:34 +0200 Alkis Evlogimenos wrote:
