On 2005-11-21, at 19:32, Martijn van Oosterhout wrote:
On Mon, Nov 21, 2005 at 04:58:25PM +0100, Grzegorz Jaskiewicz wrote:
my conquers with Gist index for custom type are nearly finished. It
is working as it is now, but there are few problems here and there.
One of em, being amount of disc space index it self takes. The type
stucture it self takes 160bytes. Adding 100.000 rows into table -
CREATE TABLE blah (a serial, b customType);
Let's see, 160bytes means you'll get aboud 50 keys per page. So you
would expect 2000 leaf page, 40 level 1 pages. This should be less
than
20-30MB
yep;
You mean you sometimes put the same elements in the two halves? You
shouldn't do that. The whole point is that the search will descend any
node that matches consistant, but any single key should only appear
once in each index.
picksplit should *split* the set, not return two sets about the same
size as you started...
Nope, I mean that 'masks' created to match either 'half' sometimes
match elements in the other one.
This shouldn't be a big deal, just one level to go down on query to
much more specific result set.
I have fixed that with, somewhat hack.
Here's some pseudo code
sort_data(input);
find_split_point(input);
mask1 = generate_two_masks(input[0]);
mask2 = generate_two_masks(input[1]);
foreach(input) {
bool a = matches1(input);
bool b = matches2(input);
if ( a && b ) {
if ( left_index == 0 ) {
left[left_index++] = input;
}
else {
if ( right_index == 0 ) {
right[right_index++] = input;
continue;
}
/* this part is new code, and helped a lot, now gist index takes much
less space
and is much faster, because of lower I/O consumption*/
if ( loop_index % 2 ) {
right[right_index++] = input;
}
else {
left[left_index++] = input;
}
}
}
else {
if ( a) left[left_index++] = input;
if ( b) right[right_index++] = input;
}
}
mask1 = generate(left );
mask2 = generate(right);
return (left, right, blah, blih, others);
Ok, so the part with i%2 helped a lot, it distributes elements
matching both masks evenly.
Thanks guys.
I will play with k-means, and see if they will work better with no
hacks. Either way, I have to have some code that will handle "matches
both" case.
Thanks again.
--
GJ
"If we knew what we were doing, it wouldn't be called Research, would
it?" - AE
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend