Re: BitWeaving in Parquet?

2018-10-23 Thread Jim Apple
> For Vertical Bit-Parallel (VBP), I think the reason why I didn't think it would be useful for Parquet is that it is really expensive to produce and really expensive to reconstruct values that aren't filtered out.
Julien, this would be a thing I think the list would love to hear from Jignesh

Re: BitWeaving in Parquet?

2018-10-23 Thread Jim Apple
> For Vertical Bit-Parallel (VBP), I think the reason why I didn't think it would be useful for Parquet is that it is really expensive to produce and really expensive to reconstruct values that aren't filtered out.
Yes - you can see in Figure 12(a) that the aggregation time went up for the

Re: BitWeaving in Parquet?

2018-10-22 Thread Ryan Blue
I looked into this a while ago. Assuming that I remember correctly, the conclusion I came to was that Horizontal Bit-Parallel (HBP) might be helpful, but the vertical option was probably not appropriate. HBP would allow Parquet readers to run predicates on multiple values at once without needing
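As a rough illustration of what "running predicates on multiple values at once" can look like with a horizontal bit-parallel layout, here is a minimal Python sketch. It is not Parquet code and not taken from this thread: the function names (pack_hbp, less_than) are hypothetical, and the layout assumed here (each k-bit code stored in a (k+1)-bit field whose top bit is a zero delimiter) is one reading of the HBP scheme in the BitWeaving paper. A Python integer stands in for a machine word.

```python
def pack_hbp(values, k):
    """Pack k-bit codes into one integer, one (k+1)-bit field per code.

    The top bit of every field is a delimiter bit that is left at 0.
    (Hypothetical helper; the layout is an assumption, see lead-in.)
    """
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << k):
            raise ValueError(f"value {v} does not fit in {k} bits")
        word |= v << (i * (k + 1))
    return word


def less_than(word, n, k, c):
    """Evaluate `x < c` for all n packed codes with one add, one xor, one and.

    Per field: c + (2**k - 1 - x) = 2**k + (c - x - 1), so the delimiter
    bit (bit k of the field) ends up set exactly when x < c. The sum never
    exceeds 2**(k+1) - 2, so no carry leaks into the neighbouring field.
    """
    field = k + 1
    low_mask = pack_hbp([(1 << k) - 1] * n, k)      # k low bits set in every field
    delim_mask = sum(1 << (i * field + k) for i in range(n))
    consts = pack_hbp([c] * n, k)                   # the constant c in every field
    result = (consts + (word ^ low_mask)) & delim_mask
    # Unpack the delimiter bits into a list of booleans, one per code.
    return [bool((result >> (i * field + k)) & 1) for i in range(n)]


packed = pack_hbp([1, 5, 3, 7], k=3)
matches = less_than(packed, n=4, k=3, c=4)
print(matches)  # [True, False, True, False]
```

The point of the sketch is only that the comparison touches every packed code in the same few word-level operations, rather than unpacking each value first; a real reader would work on fixed-width machine words and keep the result as a bit vector.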

Re: BitWeaving in Parquet?

2018-10-14 Thread Jim Apple
On 2018/10/08 22:08:16, Julien Le Dem wrote:
> it's a variation of bit packing. right?
I looked into it on https://github.com/apache/parquet-format/blob/master/Encodings.md and I believe that the Horizontal Bit-Parallel encoding in the paper is a variant on bit packing. There are three

Re: BitWeaving in Parquet?

2018-10-08 Thread Julien Le Dem
If you want (and if you don't already know him) I'm happy to ask Jignesh if he wants an intro. I think he would be happy to tell you about it.
On Mon, Oct 8, 2018 at 4:04 PM Jim Apple wrote:
> > That sounds like an interesting possibility. It's not that fresh in my mind but I'd say from

Re: BitWeaving in Parquet?

2018-10-08 Thread Jim Apple
> That sounds like an interesting possibility. It's not that fresh in my mind but I'd say from the storage perspective it's a variation of bit packing. right?
I'm not familiar with bit packing, so I'd have to look into that. I found the paper readable enough at the time that I didn't end up

Re: BitWeaving in Parquet?

2018-10-08 Thread Julien Le Dem
Hi Jim, I remember chatting with Jignesh Patel about it at the time. Since his company Locomatix was acquired by Twitter we had him as an adviser for some time. That sounds like an interesting possibility. It's not that fresh in my mind but I'd say from the storage perspective it's a variation of

BitWeaving in Parquet?

2018-10-08 Thread Jim Apple
The BitWeaving paper from a few years ago demonstrates some large performance wins in predicate evaluation based partially on reconfiguring the storage layout: http://pages.cs.wisc.edu/~jignesh/publ/BitWeaving.pdf Is it technically possible for Parquet to support "Vertical Bit-Parallel"