[GitHub] flink pull request: [FLINK-2545] add bucket member count verificat...

2015-09-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1067


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2545] add bucket member count verificat...

2015-09-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1067#issuecomment-136667661
  
Looks good, I'll merge this!




[GitHub] flink pull request: [FLINK-2545] add bucket member count verificat...

2015-08-30 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/1067#issuecomment-136243437
  
Nice job, @greghogan, you pointed out both the root cause and the 
solution. I added the logic to skip the last buckets as @StephanEwen suggested, 
and added a related unit test for this issue.




[GitHub] flink pull request: [FLINK-2545] add bucket member count verificat...

2015-08-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1067#issuecomment-135983916
  
Ah, that makes perfect sense. The last memory segment is not fully used 
(only until the hash index has initialized enough buckets). The bloom filter 
initialization loops should also skip those last buckets.

Since the memory for these buckets is not initialized, their contents (like 
count) are undefined.
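
As a minimal sketch of the suggestion above (not the actual MutableHashTable 
code), the loop below walks the bucket headers segment by segment and stops at 
numBuckets, so the unused tail of the last segment is never read. The class 
name, the method name, and the short[][] layout standing in for the per-bucket 
count field are hypothetical and chosen only for illustration.

```java
// Hypothetical illustration; this is NOT Flink's MutableHashTable.
final class SkipTailBucketsSketch {

    /**
     * Counts buckets with a negative count field among the first numBuckets
     * bucket slots. counts[s][k] stands in for the count field of bucket k
     * in memory segment s (an assumed layout for this sketch).
     */
    static int negativeCountsSeen(short[][] counts, int numBuckets) {
        int negative = 0;
        int bucket = 0; // global bucket index across all segments
        for (short[] segment : counts) {
            // Same guard as the init loop: stop once all real buckets are visited,
            // so the uninitialized tail of the last segment is skipped.
            for (int k = 0; k < segment.length && bucket < numBuckets; k++, bucket++) {
                if (segment[k] < 0) {
                    negative++;
                }
            }
        }
        return negative;
    }

    public static void main(String[] args) {
        // Two tiny segments of four slots; the last two slots hold garbage.
        short[][] counts = {{9, 9, 9, 9}, {9, 9, -32697, -15328}};
        System.out.println(negativeCountsSeen(counts, 6)); // tail skipped
        System.out.println(negativeCountsSeen(counts, 8)); // tail read
    }
}
```

With the guard in place (numBuckets = 6) no negative count is seen; reading 
all 8 slots picks up the two garbage values.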




[GitHub] flink pull request: [FLINK-2545] add bucket member count verificat...

2015-08-28 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1067#issuecomment-135885866
  
I am currently running release-0.10.0-milestone-1.

Debugging with Eclipse and looking at MutableHashTable.initTable, 
numBuckets is computed as 16086. There are 63 memory segments with 256 buckets 
each = 16128 total buckets. The last 16128 - 16086 = 42 buckets are not 
initialized by initTable which terminates the inner loop when bucket == 
numBuckets. Here is an example header dump from the last memory segment showing 
the crossover from initialized to uninitialized data.

offset  partition  status  count  next-pointer
26880   10         0       0      -72340172838076673
27008   11         0       0      -72340172838076673
27136   12         0       0      -72340172838076673
27264   13         0       0      -72340172838076673
27392   0          -56     9      844425030795264
27520   0          -56     9      -9191846839379296256
27648   0          -56     9      10133099245469696
27776   0          -56     9      12103424082444288
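
The arithmetic above can be checked with a short sketch. The segment and 
bucket counts are the ones reported in this comment; the 128-byte bucket size 
is an assumption inferred from the spacing of the offsets in the dump, and the 
class and method names are hypothetical rather than taken from Flink.

```java
// Hypothetical helper for checking the numbers in this comment; not Flink code.
final class BucketLayoutSketch {

    /** Bucket slots physically present across all segments. */
    static int totalSlots(int numSegments, int bucketsPerSegment) {
        return numSegments * bucketsPerSegment;
    }

    /** Tail slots never touched by an init loop that stops at numBuckets. */
    static int uninitializedSlots(int numSegments, int bucketsPerSegment, int numBuckets) {
        return totalSlots(numSegments, bucketsPerSegment) - numBuckets;
    }

    /** Byte offset, within the last segment, of the first uninitialized bucket header. */
    static int firstUninitializedOffset(int bucketsPerSegment, int numBuckets, int bucketSizeBytes) {
        int slotInLastSegment = numBuckets % bucketsPerSegment;
        return slotInLastSegment * bucketSizeBytes;
    }

    public static void main(String[] args) {
        int numSegments = 63;
        int bucketsPerSegment = 256;
        int numBuckets = 16086;
        int bucketSizeBytes = 128; // assumption: inferred from the 128-byte offset steps

        System.out.println(totalSlots(numSegments, bucketsPerSegment));
        System.out.println(uninitializedSlots(numSegments, bucketsPerSegment, numBuckets));
        System.out.println(firstUninitializedOffset(bucketsPerSegment, numBuckets, bucketSizeBytes));
    }
}
```

Under these assumptions the first uninitialized header sits at offset 27392, 
which matches the row in the dump where the values change.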

With a breakpoint set in MutableHashTable.buildBloomFilterForBucket for 
count < 0, the last memory segment looked as follows (this is from a different 
execution, operation, and thread).

offset  partition  status  count   next-pointer
26880   10         0       9       27584547767975936
27008   11         0       9       -9208735337998712832
27136   12         0       9       4503599694479360
27264   13         0       9       -9219994337067139072
27392   0          0       -32697  1161165883580435
27520   0          3       -15328  18016855230957176
27648   0          5       1388    -33740636012148672
27776   0          6       25494   -17363350186618861

MutableHashTable.buildBloomFilterForBucketsInPartition processed offset 
27392 which happened to match the partition number and bucket status even 
though it looks to be uninitialized.

After changing MutableHashTable.initTable to initialize all buckets in all 
segments I have not seen the bug reoccur.

{code}
for (int k = 0; k < bucketsPerSegment /* && bucket < numBuckets */; k++, bucket++) {
}
{code}

I see at least three potential resolutions: 1) have 
MutableHashTable.initTable initialize all buckets, 2) have 
MutableHashTable.buildBloomFilterForBucket skip uninitialized buckets, or 3) 
completely fill the last segment with usable buckets, if that is possible (I 
have not looked closely enough at MutableHashTable.getInitialTableSize to know).
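
For the third option, the arithmetic would amount to rounding the bucket count 
up to a whole number of segments. The sketch below shows only that rounding 
step; whether MutableHashTable.getInitialTableSize could safely adopt it is an 
open question in this thread, and the class and method names are hypothetical.

```java
// Hypothetical illustration of option 3; not taken from Flink.
final class FillLastSegmentSketch {

    /** Rounds numBuckets up to the next multiple of bucketsPerSegment. */
    static int roundUpToFullSegments(int numBuckets, int bucketsPerSegment) {
        int segments = (numBuckets + bucketsPerSegment - 1) / bucketsPerSegment;
        return segments * bucketsPerSegment;
    }

    public static void main(String[] args) {
        // Values from the debugging session described above.
        System.out.println(roundUpToFullSegments(16086, 256));
    }
}
```

With the reported values this yields 16128, so all 63 segments would be fully 
used and no bucket header would be left uninitialized.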




[GitHub] flink pull request: [FLINK-2545] add bucket member count verificat...

2015-08-27 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/1067#issuecomment-135612599
  
Thanks for the reminder, @zentol and @StephanEwen. I was too hasty in 
opening this PR. I meant to fix the bloom filter exception in this PR and to 
investigate separately the other potential hash table issues behind the 
negative count, but obviously there is no need to do it that way. So let's 
wait for Greg's response now.




[GitHub] flink pull request: [FLINK-2545] add bucket member count verificat...

2015-08-27 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1067#issuecomment-135354418
  
@ChengXiangLi Do you know what caused the problem initially? I was puzzled, 
because the count in the bucket should never be negative, and a zero-sized 
bucket should work with your original code.

It would be great to capture that error, to see whether the root bug is 
actually not in the bloom filters but somewhere else in the hash table 
structure.

Hopefully Greg can help us to reproduce this...

