Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-12 Thread Tomas Vondra
On 11.10.2013 13:42, Huchev wrote: > > gettimeofday(&start, NULL); > for (i = 0; i < VALUES; i++) { > state = XXH32_init(result); > XXH32_update(state, &i, 4); > XXH32_digest(state); > } > gettimeofday(&end, NULL); > > > This code is using the "update" variant, wh

[HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-11 Thread Huchev
gettimeofday(&start, NULL); for (i = 0; i < VALUES; i++) { state = XXH32_init(result); XXH32_update(state, &i, 4); XXH32_digest(state); } gettimeofday(&end, NULL); This code is using the "update" variant, which is only useful when dealing with very large amount of

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-08 Thread Tomas Vondra
On 8 Říjen 2013, 13:52, Atri Sharma wrote: > On Tue, Oct 8, 2013 at 4:15 PM, Tomas Vondra wrote: >> On 8 Říjen 2013, 11:42, Atri Sharma wrote: I've made some significant improvements in the chaining version (in the master branch), now getting about the memory consumption I've >

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-08 Thread Atri Sharma
On Tue, Oct 8, 2013 at 4:15 PM, Tomas Vondra wrote: > On 8 Říjen 2013, 11:42, Atri Sharma wrote: >>> >>> I've made some significant improvements in the chaining version (in the >>> master branch), now getting about the memory consumption I've estimated. >>> >> I agree, we can hope to reduce the me

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-08 Thread Tomas Vondra
On 8 Říjen 2013, 11:42, Atri Sharma wrote: >> >> I've made some significant improvements in the chaining version (in the >> master branch), now getting about the memory consumption I've estimated. >> > I agree, we can hope to reduce the memory consumption by making changes in > the current chaining

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-08 Thread Atri Sharma
Sent from my iPad > > However I'm unable to make it at least comparable to chaining. The trick > is that the hash table performs reasonably only until it's ~ 65-70% full, > it gets really slow after that. So to maintain performance comparable to > chaining, I'd have to keep the table below this

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-08 Thread Tomas Vondra
On 8 Říjen 2013, 6:23, Atri Sharma wrote: > Hi Tomas, > > >>> Consider the aspects associated with open addressing.Open addressing >>> can quickly lead to growth in the main table.Also, chaining is a much >>> cleaner way of collision resolution,IMHO. >> >> What do you mean by "growth in the main ta

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-08 Thread Atri Sharma
Sent from my iPad > On 08-Oct-2013, at 10:41, Claudio Freire wrote: > > On Tue, Oct 8, 2013 at 1:23 AM, Atri Sharma wrote: Consider the aspects associated with open addressing.Open addressing can quickly lead to growth in the main table.Also, chaining is a much cleaner way of c

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-07 Thread Claudio Freire
On Tue, Oct 8, 2013 at 1:23 AM, Atri Sharma wrote: >>> Consider the aspects associated with open addressing.Open addressing >>> can quickly lead to growth in the main table.Also, chaining is a much >>> cleaner way of collision resolution,IMHO. >> >> What do you mean by "growth in the main table"?

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-07 Thread Atri Sharma
Hi Tomas, >> Consider the aspects associated with open addressing.Open addressing >> can quickly lead to growth in the main table.Also, chaining is a much >> cleaner way of collision resolution,IMHO. > > What do you mean by "growth in the main table"? Sorry, I should have been more verbose. AFA

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-07 Thread Tomas Vondra
Hi Atri! On 7.10.2013 16:56, Atri Sharma wrote: >>> 3. Consider dropping buckets in favor of open addressing (linear >>> probing, quadratic, whatever). This avoids another level of >>> pointer indirection. >> >> OK, this sounds really interesting. It should be fairly >> straightforward for fixed

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-07 Thread Tomas Vondra
On 7.10.2013 14:50, k...@rice.edu wrote: > On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote: >>> 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2]. >>>For fun, try not hashing those ints at all and see how that performs >>> (that, >>>I think, is what y

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-07 Thread Atri Sharma
Sent from my iPad > On 07-Oct-2013, at 4:11, Tomas Vondra wrote: > >> On 6.10.2013 20:37, Tomáš Janoušek wrote: >> Hi, >> >>> On Sat, Oct 05, 2013 at 08:22:54PM +0200, Tomas Vondra wrote: >>> I'm on 64-bit architecture and the example works with int32, which means >>> the sizes should be abou

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-07 Thread k...@rice.edu
On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote: > > 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2]. > >For fun, try not hashing those ints at all and see how that performs > > (that, > >I think, is what you get from HashSet in Java/C#). > > I've

Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-06 Thread Tomas Vondra
On 6.10.2013 20:37, Tomáš Janoušek wrote: > Hi, > > On Sat, Oct 05, 2013 at 08:22:54PM +0200, Tomas Vondra wrote: >> I'm on 64-bit architecture and the example works with int32, which means >> the sizes should be about this: >> >> hash_element_t => 20B >> hash_bucket_t => 4B + (20B * item

[HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption

2013-10-06 Thread Tomáš Janoušek
Hi, On Sat, Oct 05, 2013 at 08:22:54PM +0200, Tomas Vondra wrote: > I'm on 64-bit architecture and the example works with int32, which means > the sizes should be about this: > > hash_element_t => 20B > hash_bucket_t => 4B + (20B * items in the bucket [in steps of 5]) > hash_table_t