Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-08 Thread Nejc Saje



On 09/08/2014 06:22 AM, Robert Collins wrote:

On 8 September 2014 05:57, Nejc Saje ns...@redhat.com wrote:

That generator API is pretty bad IMO - because it means you're very
heavily dependent on gc and refcount behaviour to keep things clean -
and there isn't (IMO) a use case for walking the entire ring from the
perspective of an item. What's the concern with having replicas as part
of the API?



Because they don't really make sense conceptually. The hash ring itself doesn't
actually 'make' any replicas. The replicas parameter in the current Ironic
implementation is used solely to limit the number of buckets returned.
Conceptually, that seems to me the same as take(replicas,
iterate_nodes()). I don't know Python internals well enough to know what
problems this would cause though, so can you please clarify?


I could see replicas being a parameter to a function call, but take(N,
generator) has the same poor behaviour - generators in general that
won't be fully consumed rely on reference counting to be freed.
Sometimes that's absolutely the right tradeoff.


Ok, I can agree with it being a function call.
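
(To make the two API shapes being discussed concrete, here is a minimal
sketch in Python. The ring methods iterate_buckets() and get_buckets() are
hypothetical names for illustration, not the actual Ironic or Ceilometer
API.)

import itertools

def buckets_via_generator(ring, item, n):
    # Generator style, as in the pypi hash_ring lib: the caller walks the
    # ring and slices off as many buckets as it wants. The generator left
    # behind is only cleaned up via reference counting, which is the
    # concern raised above.
    return list(itertools.islice(ring.iterate_buckets(item), n))

def buckets_via_call(ring, item, n):
    # Function-call style: the ring takes the count as a parameter and
    # returns a plain list, so nothing is left half-consumed.
    return ring.get_buckets(item, replicas=n)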





it's absolutely a partition of the hash space - each spot we hash a
bucket onto is one; that's how consistent hashing works at all :)



Yes, but you don't assign the number of partitions beforehand; it depends on
the number of buckets. What you do assign is the number of times you hash a
single bucket onto the ring, which is currently named 'replicas' in the
Ceilometer code, but I suggested 'distribution_quality' or something
similarly descriptive in an earlier e-mail.


I think you misunderstand the code. We do assign the number of
partitions beforehand - it's approximately fixed and independent of the
number of buckets. More buckets == less times we hash each bucket.


Ah, your first sentence tipped me off that we're not actually speaking 
about the same code :). I'm talking about the current Ceilometer code and 
the way the algorithm is described in the original paper. There, the 
number of times we hash a bucket doesn't depend on the number of buckets 
at all. The array-based implementation that Ironic used to have 
definitely needed to define the number of partitions, but I don't see the 
need for it with the new approach either. Why would you want to limit 
yourself to a fixed number of partitions if you're limited only by the 
output range of the hash function?
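
(To illustrate, here is a toy sketch of what I mean - not the actual
Ceilometer code: the ring positions come from hashing each bucket a fixed
number of times, so their total simply grows with the number of buckets
and no up-front partition count is needed.)

import hashlib

def ring_positions(buckets, times_hashed=100):
    # 'times_hashed' is what the Ceilometer code currently calls
    # 'replicas': each bucket is hashed onto the ring this many times to
    # even out the distribution. More buckets just means more positions.
    positions = {}
    for bucket in buckets:
        for i in range(times_hashed):
            h = int(hashlib.md5(('%s-%d' % (bucket, i)).encode('utf-8')).hexdigest(), 16)
            positions[h] = bucket
    return positions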


Cheers,
Nejc



-Rob





Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-07 Thread Nejc Saje



On 09/04/2014 11:24 PM, Robert Collins wrote:

On 4 September 2014 23:42, Nejc Saje ns...@redhat.com wrote:



On 09/04/2014 11:51 AM, Robert Collins wrote:



It doesn't contain that term precisely, but it does talk about replicating
the buckets. What about using a descriptive name for this parameter, like
'distribution_quality', where the higher the value, the higher the
distribution evenness (and the higher the memory usage)?




I've no objection talking about keys, but 'node' is an API object in
Ironic, so I'd rather we talk about hosts - or make it something
clearly not node like 'bucket' (which the 1997 paper talks about in
describing consistent hash functions).

So proposal:
   - key - a stringifyable thing to be mapped to buckets


What about using the term 'item' from the original paper as well?


Sure. Item it is.




   - bucket - a worker/store that wants keys mapped to it
   - replicas - number of buckets a single key wants to be mapped to


Can we keep this as an Ironic-internal parameter? Because it doesn't really
affect the hash ring. If you want multiple buckets for your item, you just
continue your journey along the ring and keep returning new buckets. Check
out how the pypi lib does it:
https://github.com/Doist/hash_ring/blob/master/hash_ring/ring.py#L119


That generator API is pretty bad IMO - because it means you're very
heavily dependent on gc and refcount behaviour to keep things clean -
and there isn't (IMO) a use case for walking the entire ring from the
perspective of an item. What's the concern with having replicas as part
of the API?


Because they don't really make sense conceptually. The hash ring itself 
doesn't actually 'make' any replicas. The replicas parameter in the 
current Ironic implementation is used solely to limit the number of 
buckets returned. Conceptually, that seems to me the same as 
take(replicas, iterate_nodes()). I don't know Python internals well enough 
to know what problems this would cause though, so can you please clarify?





   - partitions - number of total divisions of the hash space (power of
2 required)


I don't think there are any divisions of the hash space in the correct
implementation, are there? I think that in the current Ironic implementation
this tweaks the distribution quality, just like 'replicas' parameter in
Ceilo implementation.


it's absolutely a partition of the hash space - each spot we hash a
bucket onto is one; that's how consistent hashing works at all :)


Yes, but you don't assign the number of partitions beforehand; it 
depends on the number of buckets. What you do assign is the number of 
times you hash a single bucket onto the ring, which is currently named 
'replicas' in the Ceilometer code, but I suggested 'distribution_quality' or 
something similarly descriptive in an earlier e-mail.


Cheers,
Nejc



-Rob





Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-07 Thread Robert Collins
On 8 September 2014 05:57, Nejc Saje ns...@redhat.com wrote:
 That generator API is pretty bad IMO - because it means you're very
 heavily dependent on gc and refcount behaviour to keep things clean -
 and there isn't (IMO) a use case for walking the entire ring from the
 perspective of an item. What's the concern with having replicas as part
 of the API?


 Because they don't really make sense conceptually. The hash ring itself doesn't
 actually 'make' any replicas. The replicas parameter in the current Ironic
 implementation is used solely to limit the number of buckets returned.
 Conceptually, that seems to me the same as take(replicas,
 iterate_nodes()). I don't know Python internals well enough to know what
 problems this would cause though, so can you please clarify?

I could see replicas being a parameter to a function call, but take(N,
generator) has the same poor behaviour - generators in general that
won't be fully consumed rely on reference counting to be freed.
Sometimes that's absolutely the right tradeoff.


 it's absolutely a partition of the hash space - each spot we hash a
 bucket onto is one; that's how consistent hashing works at all :)


 Yes, but you don't assign the number of partitions beforehand; it depends on
 the number of buckets. What you do assign is the number of times you hash a
 single bucket onto the ring, which is currently named 'replicas' in the
 Ceilometer code, but I suggested 'distribution_quality' or something
 similarly descriptive in an earlier e-mail.

I think you misunderstand the code. We do assign the number of
partitions beforehand - it's approximately fixed and independent of the
number of buckets. More buckets == less times we hash each bucket.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-04 Thread Nejc Saje



On 09/04/2014 01:37 AM, Robert Collins wrote:

On 4 September 2014 00:13, Eoghan Glynn egl...@redhat.com wrote:




On 09/02/2014 11:33 PM, Robert Collins wrote:

The implementation in ceilometer is very different to the Ironic one -
are you saying the test you linked fails with Ironic, or that it fails
with the ceilometer code today?


Disclaimer: in Ironic terms, node = conductor, key = host

The test I linked fails with Ironic hash ring code (specifically the
part that tests consistency). With 1000 keys being mapped to 10 nodes,
when you add a node:
- current ceilometer code remaps around 7% of the keys (< 1/#nodes)
- Ironic code remaps > 90% of the keys


So just to underscore what Nejc is saying here ...

The key point is the proportion of such baremetal-nodes that would
end up being re-assigned when a new conductor is fired up.


That was 100% clear, but thanks for making sure.

The question was getting a proper understanding of why it was
happening in Ironic.

The ceilometer hashring implementation is good, but it uses the same
terms very differently (e.g. replicas for partitions) - I'm adapting
the key fix back into Ironic - I'd like to see us converge on a single
implementation, and making sure the Ironic one is suitable for
ceilometer seems applicable here (since ceilometer seems to need less
from the API),


I used the terms that are used in the original caching use-case, as 
described in [1] and are used in the pypi lib as well[2]. With the 
correct approach, there aren't actually any partitions, 'replicas' 
actually denotes the number of times you hash a node onto the ring. As 
for nodes/keys, what's your suggestion?


I've opened a bug[3], so you can add a Closes-Bug to your patch.

[1] http://www.martinbroadhurst.com/Consistent-Hash-Ring.html
[2] https://pypi.python.org/pypi/hash_ring
[3] https://bugs.launchpad.net/ironic/+bug/1365334



If reassigning was cheap Ironic wouldn't have bothered having a hash ring :)

-Rob







Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-04 Thread Robert Collins
On 4 September 2014 19:53, Nejc Saje ns...@redhat.com wrote:

 I used the terms that are used in the original caching use-case, as
 described in [1] and are used in the pypi lib as well[2]. With the correct
 approach, there aren't actually any partitions, 'replicas' actually denotes
 the number of times you hash a node onto the ring. As for nodes/keys, what's
 your suggestion?

So - we should change the Ironic terms then, I suspect (but let's check
with Deva, who wrote the original code, where he got them from).

The parameters we need to create a ring are:
 - how many fallback positions we use for data (currently referred to
as replicas)
 - how many times we hash the servers hosting data into the ring
(currently inferred via the hash_partition_exponent / server count)
 - the servers

and then we probe data items as we go.

The original paper isn't
http://www.martinbroadhurst.com/Consistent-Hash-Ring.html -
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1879 is
referenced by it, and that paper doesn't include the term replica
count at all. In other systems like Cassandra, replicas generally
refers to how many servers end up holding a copy of the data: Martin
Broadhurst's paper uses replica there in quite a different sense - I
much prefer the Ironic use, which says how many servers will be
operating on the data: it's externally relevant.

I've no objection talking about keys, but 'node' is an API object in
Ironic, so I'd rather we talk about hosts - or make it something
clearly not node like 'bucket' (which the 1997 paper talks about in
describing consistent hash functions).

So proposal:
 - key - a stringifyable thing to be mapped to buckets
 - bucket - a worker/store that wants keys mapped to it
 - replicas - number of buckets a single key wants to be mapped to
 - partitions - number of total divisions of the hash space (power of
2 required)
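
(Purely to make the proposal concrete - a hypothetical Python sketch of
what such an API could look like, not code from either project:)

class HashRing(object):
    def __init__(self, buckets, partitions=2 ** 10):
        # buckets: workers/stores that want keys mapped to them.
        # partitions: total divisions of the hash space (power of two).
        self.buckets = list(buckets)
        self.partitions = partitions

    def get_buckets(self, key, replicas=1):
        # Map a stringifyable key to 'replicas' distinct buckets.
        raise NotImplementedError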

 I've opened a bug[3], so you can add a Closes-Bug to your patch.

Thanks!

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-04 Thread Eoghan Glynn


   The implementation in ceilometer is very different to the Ironic one -
   are you saying the test you linked fails with Ironic, or that it fails
   with the ceilometer code today?
 
  Disclaimer: in Ironic terms, node = conductor, key = host
 
  The test I linked fails with Ironic hash ring code (specifically the
  part that tests consistency). With 1000 keys being mapped to 10 nodes,
  when you add a node:
  - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
  - Ironic code remaps > 90% of the keys
 
  So just to underscore what Nejc is saying here ...
 
  The key point is the proportion of such baremetal-nodes that would
  end up being re-assigned when a new conductor is fired up.
 
 That was 100% clear, but thanks for making sure.
 
 The question was getting a proper understanding of why it was
 happening in Ironic.
 
 The ceilometer hashring implementation is good, but it uses the same
 terms very differently (e.g. replicas for partitions) - I'm adapting
 the key fix back into Ironic - I'd like to see us converge on a single
 implementation, and making sure the Ironic one is suitable for
 ceilometer seems applicable here (since ceilometer seems to need less
 from the API),

Absolutely +1 on converging on a single implementation. That was
our intent on the ceilometer side from the get-go, to promote a
single implementation to oslo that both projects could share.

This turned out not to be possible in the short term when the
non-consistent aspect of the Ironic implementation was discovered
by Nejc, with the juno-3 deadline looming. However for kilo, we
would definitely be interested in leveraging a best-of-breed
implementation from oslo.

 If reassigning was cheap Ironic wouldn't have bothered having a hash ring :)

Fair enough, I was just allowing for the possibility that avoidance
of needless re-mapping doesn't have as high a priority on the ironic side
as it does for ceilometer.

Cheers,
Eoghan



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-04 Thread Nejc Saje



On 09/04/2014 11:51 AM, Robert Collins wrote:

On 4 September 2014 19:53, Nejc Saje ns...@redhat.com wrote:


I used the terms that are used in the original caching use-case, as
described in [1] and are used in the pypi lib as well[2]. With the correct
approach, there aren't actually any partitions, 'replicas' actually denotes
the number of times you hash a node onto the ring. As for nodes/keys, what's
your suggestion?


So - we should change the Ironic terms then, I suspect (but let's check
with Deva, who wrote the original code, where he got them from).

The parameters we need to create a ring are:
  - how many fallback positions we use for data (currently referred to
as replicas)
  - how many times we hash the servers hosting data into the ring
(currently inferred via the hash_partition_exponent / server count)
  - the servers

and then we probe data items as we go.

The original paper isn't
http://www.martinbroadhurst.com/Consistent-Hash-Ring.html -
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1879 is
referenced by it, and that paper doesn't include the term replica
count at all. In other systems like Cassandra, replicas generally
refers to how many servers end up holding a copy of the data: Martin
Broadhurst's paper uses replica there in quite a different sense - I
much prefer the Ironic use, which says how many servers will be
operating on the data: it's externally relevant.


It doesn't contain that term precisely, but it does talk about 
replicating the buckets. What about using a descriptive name for this 
parameter, like 'distribution_quality', where the higher the value, 
the higher the distribution evenness (and the higher the memory usage)?




I've no objection talking about keys, but 'node' is an API object in
Ironic, so I'd rather we talk about hosts - or make it something
clearly not node like 'bucket' (which the 1997 paper talks about in
describing consistent hash functions).

So proposal:
  - key - a stringifyable thing to be mapped to buckets

What about using the term 'item' from the original paper as well?


  - bucket - a worker/store that wants keys mapped to it
  - replicas - number of buckets a single key wants to be mapped to
Can we keep this as an Ironic-internal parameter? Because it doesn't 
really affect the hash ring. If you want multiple buckets for your item, 
you just continue your journey along the ring and keep returning new 
buckets. Check out how the pypi lib does it: 
https://github.com/Doist/hash_ring/blob/master/hash_ring/ring.py#L119



  - partitions - number of total divisions of the hash space (power of
2 required)
I don't think there are any divisions of the hash space in the correct 
implementation, are there? I think that in the current Ironic 
implementation this tweaks the distribution quality, just like 
'replicas' parameter in Ceilo implementation.


Cheers,
Nejc




I've opened a bug[3], so you can add a Closes-Bug to your patch.


Thanks!

-Rob






Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-04 Thread Robert Collins
On 4 September 2014 23:42, Nejc Saje ns...@redhat.com wrote:


 On 09/04/2014 11:51 AM, Robert Collins wrote:

 It doesn't contain that term precisely, but it does talk about replicating
 the buckets. What about using a descriptive name for this parameter, like
 'distribution_quality', where the higher the value, the higher the
 distribution evenness (and the higher the memory usage)?



 I've no objection talking about keys, but 'node' is an API object in
 Ironic, so I'd rather we talk about hosts - or make it something
 clearly not node like 'bucket' (which the 1997 paper talks about in
 describing consistent hash functions).

 So proposal:
   - key - a stringifyable thing to be mapped to buckets

 What about using the term 'item' from the original paper as well?

Sure. Item it is.


   - bucket - a worker/store that wants keys mapped to it
   - replicas - number of buckets a single key wants to be mapped to

 Can we keep this as an Ironic-internal parameter? Because it doesn't really
 affect the hash ring. If you want multiple buckets for your item, you just
 continue your journey along the ring and keep returning new buckets. Check
 out how the pypi lib does it:
 https://github.com/Doist/hash_ring/blob/master/hash_ring/ring.py#L119

That generator API is pretty bad IMO - because it means you're very
heavily dependent on gc and refcount behaviour to keep things clean -
and there isn't (IMO) a use case for walking the entire ring from the
perspective of an item. What's the concern with having replicas as part
of the API?

   - partitions - number of total divisions of the hash space (power of
 2 required)

 I don't think there are any divisions of the hash space in the correct
 implementation, are there? I think that in the current Ironic implementation
 this tweaks the distribution quality, just like 'replicas' parameter in
 Ceilo implementation.

it's absolutely a partition of the hash space - each spot we hash a
bucket onto is one; that's how consistent hashing works at all :)

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Nejc Saje



On 09/02/2014 11:19 PM, Gregory Haynes wrote:

Excerpts from Nejc Saje's message of 2014-09-01 07:48:46 +:

Hey guys,

in Ceilometer we're using consistent hash rings to do workload
partitioning[1]. We've considered generalizing your hash ring
implementation and moving it up to oslo, but unfortunately your
implementation is not actually consistent, which is our requirement.

Since you divide your ring into a number of equal sized partitions,
instead of hashing hosts onto the ring, when you add a new host,
an unbounded number of keys gets re-mapped to different hosts (instead of
the 1/#nodes remapping guaranteed by a hash ring). I've confirmed this
with the test in aforementioned patch[2].


I am just getting started with the ironic hash ring code, but this seems
surprising to me. AIUI we do require some rebalancing when a conductor
is removed or added (which is normal use of a CHT) but not for every
host added. This is supported by the fact that we currently don't have a
rebalancing routine, so I would be surprised if ironic worked at all if
we required it for each host that is added.


Sorry, I used terms that are used in the distributed caching use-case 
for hash-rings (which is why this algorithm was developed), where you 
have hosts and keys, and the hash ring tells you which host the key is on.


To translate the original e-mail into the Ironic use-case, where you have 
hosts being mapped to conductors:


 Since you divide your ring into a number of equal-sized partitions
 instead of hashing *conductors* onto the ring, when you add a new
 *conductor*, an unbounded number of *hosts* gets re-mapped to different
 *conductors* (instead of the 1/#*conductors* of *hosts* being re-mapped,
 as guaranteed by a hash ring). I've confirmed this

 with the test in aforementioned patch[2].

Cheers,
Nejc



Can anyone in Ironic with a bit more experience confirm/deny this?



If this is good enough for your use-case, great, otherwise we can get a
generalized hash ring implementation into oslo for use in both projects
or we can both use an external library[3].

Cheers,
Nejc

[1] https://review.openstack.org/#/c/113549/
[2]
https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
[3] https://pypi.python.org/pypi/hash_ring



Thanks,
Greg






Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Nejc Saje



On 09/02/2014 11:33 PM, Robert Collins wrote:

The implementation in ceilometer is very different to the Ironic one -
are you saying the test you linked fails with Ironic, or that it fails
with the ceilometer code today?


Disclaimer: in Ironic terms, node = conductor, key = host

The test I linked fails with Ironic hash ring code (specifically the 
part that tests consistency). With 1000 keys being mapped to 10 nodes, 
when you add a node:

- current ceilometer code remaps around 7% of the keys (< 1/#nodes)
- Ironic code remaps > 90% of the keys

The problem lies in the way you build your hash ring[1]. You initialize 
a statically-sized array and divide its fields among nodes. When you do 
a lookup, you check which field in the array the key hashes to and then 
return the node that that field belongs to. This is the wrong approach 
because when you add a new node, you will resize the array and assign 
the fields differently, but the same key will still hash to the same 
field and will therefore hash to a different node.


Nodes must be hashed onto the ring as well, statically chopping up the 
ring and dividing it among nodes isn't enough for consistency.
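
For what it's worth, the textbook construction looks roughly like this - 
a toy sketch for illustration, not the Ceilometer implementation:

import bisect
import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)

class ToyRing(object):
    def __init__(self, nodes, hashes_per_node=100):
        # Hash every node onto the ring several times and keep the
        # positions sorted.
        self._ring = sorted((_hash('%s-%d' % (node, i)), node)
                            for node in nodes
                            for i in range(hashes_per_node))
        self._positions = [pos for pos, _ in self._ring]

    def get_node(self, key):
        # Walk clockwise from the key's hash to the first node position;
        # adding a node only steals the keys that now fall just before
        # its positions, roughly 1/#nodes of them.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]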


Cheers,
Nejc



The Ironic hash_ring implementation uses a hash:
 def _get_partition(self, data):
     try:
         return (struct.unpack_from('>I', hashlib.md5(data).digest())[0]
                 >> self.partition_shift)
     except TypeError:
         raise exception.Invalid(
             _("Invalid data supplied to HashRing.get_hosts."))


so I don't see the fixed size thing you're referring to. Could you
point a little more specifically? Thanks!

-Rob

On 1 September 2014 19:48, Nejc Saje ns...@redhat.com wrote:

Hey guys,

in Ceilometer we're using consistent hash rings to do workload
partitioning[1]. We've considered generalizing your hash ring implementation
and moving it up to oslo, but unfortunately your implementation is not
actually consistent, which is our requirement.

Since you divide your ring into a number of equal sized partitions, instead
of hashing hosts onto the ring, when you add a new host,
an unbounded number of keys gets re-mapped to different hosts (instead of the
1/#nodes remapping guaranteed by a hash ring). I've confirmed this with the
test in aforementioned patch[2].

If this is good enough for your use-case, great, otherwise we can get a
generalized hash ring implementation into oslo for use in both projects or
we can both use an external library[3].

Cheers,
Nejc

[1] https://review.openstack.org/#/c/113549/
[2]
https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
[3] https://pypi.python.org/pypi/hash_ring









Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Nejc Saje

Sorry, forgot to link the reference:

[1] 
https://github.com/openstack/ironic/blob/b56db42aa39e855e558a52eb71e656ea14380f8a/ironic/common/hash_ring.py#L72


On 09/03/2014 01:50 PM, Nejc Saje wrote:



On 09/02/2014 11:33 PM, Robert Collins wrote:

The implementation in ceilometer is very different to the Ironic one -
are you saying the test you linked fails with Ironic, or that it fails
with the ceilometer code today?


Disclaimer: in Ironic terms, node = conductor, key = host

The test I linked fails with Ironic hash ring code (specifically the
part that tests consistency). With 1000 keys being mapped to 10 nodes,
when you add a node:
- current ceilometer code remaps around 7% of the keys (< 1/#nodes)
- Ironic code remaps > 90% of the keys

The problem lies in the way you build your hash ring[1]. You initialize
a statically-sized array and divide its fields among nodes. When you do
a lookup, you check which field in the array the key hashes to and then
return the node that that field belongs to. This is the wrong approach
because when you add a new node, you will resize the array and assign
the fields differently, but the same key will still hash to the same
field and will therefore hash to a different node.

Nodes must be hashed onto the ring as well, statically chopping up the
ring and dividing it among nodes isn't enough for consistency.

Cheers,
Nejc



The Ironic hash_ring implementation uses a hash:
 def _get_partition(self, data):
     try:
         return (struct.unpack_from('>I', hashlib.md5(data).digest())[0]
                 >> self.partition_shift)
     except TypeError:
         raise exception.Invalid(
             _("Invalid data supplied to HashRing.get_hosts."))


so I don't see the fixed size thing you're referring to. Could you
point a little more specifically? Thanks!

-Rob

On 1 September 2014 19:48, Nejc Saje ns...@redhat.com wrote:

Hey guys,

in Ceilometer we're using consistent hash rings to do workload
partitioning[1]. We've considered generalizing your hash ring
implementation
and moving it up to oslo, but unfortunately your implementation is not
actually consistent, which is our requirement.

Since you divide your ring into a number of equal sized partitions,
instead
of hashing hosts onto the ring, when you add a new host,
 an unbounded number of keys gets re-mapped to different hosts (instead
 of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed this with
the
test in aforementioned patch[2].

If this is good enough for your use-case, great, otherwise we can get a
generalized hash ring implementation into oslo for use in both
projects or
we can both use an external library[3].

Cheers,
Nejc

[1] https://review.openstack.org/#/c/113549/
[2]
https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py

[3] https://pypi.python.org/pypi/hash_ring











Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Eoghan Glynn


 On 09/02/2014 11:33 PM, Robert Collins wrote:
  The implementation in ceilometer is very different to the Ironic one -
  are you saying the test you linked fails with Ironic, or that it fails
  with the ceilometer code today?
 
 Disclaimer: in Ironic terms, node = conductor, key = host
 
 The test I linked fails with Ironic hash ring code (specifically the
 part that tests consistency). With 1000 keys being mapped to 10 nodes,
 when you add a node:
 - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
 - Ironic code remaps > 90% of the keys

So just to underscore what Nejc is saying here ... 

The key point is the proportion of such baremetal-nodes that would
end up being re-assigned when a new conductor is fired up.

The defining property of a consistent hash-ring is that it
significantly reduces the number of re-mappings that occur when
the number of buckets changes, compared to a plain ol' hash.

This desire for stickiness would often be motivated by some form
of statefulness or initialization cost. In the ceilometer case,
we want to minimize unnecessary re-assignment so as to keep the
cadence of meter collection and alarm evaluation as even as
possible (given that each agent will be running off
non-synchronized periodic tasks).

In the case of the Ironic baremetal-node to conductor mapping,
perhaps such stickiness isn't really of any benefit?

If so, that's fine, but Nejc's main point IIUC is that an
approach that doesn't minimize the number of re-mappings isn't
really a form of *consistent* hashing.
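
(To put numbers on that contrast, a tiny illustrative snippet - not code
from either project - counting how many of 1000 keys move when an 11th
bucket is added under a naive hash-mod-N assignment:)

import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)

keys = ['resource-%d' % i for i in range(1000)]

def mod_map(keys, n_buckets):
    # Naive assignment: key -> hash(key) mod number-of-buckets.
    return dict((k, _hash(k) % n_buckets) for k in keys)

before, after = mod_map(keys, 10), mod_map(keys, 11)
moved = sum(1 for k in keys if before[k] != after[k])
# 'moved' comes out around 900 of the 1000 keys: mod-N reassigns almost
# everything when a bucket is added, whereas a consistent ring reassigns
# roughly 1/#buckets of the keys (about 90 here).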

Cheers,
Eoghan
 
 The problem lies in the way you build your hash ring[1]. You initialize
 a statically-sized array and divide its fields among nodes. When you do
 a lookup, you check which field in the array the key hashes to and then
 return the node that that field belongs to. This is the wrong approach
 because when you add a new node, you will resize the array and assign
 the fields differently, but the same key will still hash to the same
 field and will therefore hash to a different node.
 
 Nodes must be hashed onto the ring as well, statically chopping up the
 ring and dividing it among nodes isn't enough for consistency.
 
 Cheers,
 Nejc
 
 
  The Ironic hash_ring implementation uses a hash:
   def _get_partition(self, data):
       try:
           return (struct.unpack_from('>I', hashlib.md5(data).digest())[0]
                   >> self.partition_shift)
       except TypeError:
           raise exception.Invalid(
               _("Invalid data supplied to HashRing.get_hosts."))
 
 
  so I don't see the fixed size thing you're referring to. Could you
  point a little more specifically? Thanks!
 
  -Rob
 
  On 1 September 2014 19:48, Nejc Saje ns...@redhat.com wrote:
  Hey guys,
 
  in Ceilometer we're using consistent hash rings to do workload
  partitioning[1]. We've considered generalizing your hash ring
  implementation
  and moving it up to oslo, but unfortunately your implementation is not
  actually consistent, which is our requirement.
 
  Since you divide your ring into a number of equal sized partitions,
  instead
  of hashing hosts onto the ring, when you add a new host,
  an unbounded number of keys gets re-mapped to different hosts (instead of the
  1/#nodes remapping guaranteed by a hash ring). I've confirmed this with the
  test in aforementioned patch[2].
 
  If this is good enough for your use-case, great, otherwise we can get a
  generalized hash ring implementation into oslo for use in both projects or
  we can both use an external library[3].
 
  Cheers,
  Nejc
 
  [1] https://review.openstack.org/#/c/113549/
  [2]
  https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
  [3] https://pypi.python.org/pypi/hash_ring
 
 
 
 
 
 



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Lucas Alvares Gomes
On Wed, Sep 3, 2014 at 12:50 PM, Nejc Saje ns...@redhat.com wrote:


 On 09/02/2014 11:33 PM, Robert Collins wrote:

 The implementation in ceilometer is very different to the Ironic one -
 are you saying the test you linked fails with Ironic, or that it fails
 with the ceilometer code today?


 Disclaimer: in Ironic terms, node = conductor, key = host

 The test I linked fails with Ironic hash ring code (specifically the part
 that tests consistency). With 1000 keys being mapped to 10 nodes, when you
 add a node:
 - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
 - Ironic code remaps > 90% of the keys

Thanks Nejc for posting it here. I'm not super familiar with the
consistent hashing code, but I'll definitely take a look at it. IMO this
behavior is not desirable in Ironic at all; we don't want > 90% of the hash
to get remapped when a conductor joins or leaves the cluster.

Also it would be good to open a bug about this problem in Ironic so we
can track the progress there (I can open it if you want, but I will
first do some investigation to understand the problem better).

Cheers,
Lucas



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Robert Collins
On 3 September 2014 23:50, Nejc Saje ns...@redhat.com wrote:

Forgive my slowness :).

 Disclaimer: in Ironic terms, node = conductor, key = host

Sadly not inside the hash_ring code :/. host == conductor, key == data.

 The test I linked fails with Ironic hash ring code (specifically the part
 that tests consistency). With 1000 keys being mapped to 10 nodes, when you
 add a node:
 - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
 - Ironic code remaps > 90% of the keys

Ok, that's pretty definitive and definitely not intended. However,
remapping 7% when adding 10% capacity is also undesirable - we'd like
to remap 1/11, i.e. roughly 9%.

 The problem lies in the way you build your hash ring[1]. You initialize a
 statically-sized array and divide its fields among nodes. When you do a
 lookup, you check which field in the array the key hashes to and then return
 the node that that field belongs to. This is the wrong approach because when
 you add a new node, you will resize the array and assign the fields
 differently, but the same key will still hash to the same field and will
 therefore hash to a different node.

You're referring to part2host, where we round-robin using mod to map a
partition (hash value of a key) to a host (conductor). Then when we add a
conductor this entire map scales out - yup, I see the issue.
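
(Roughly what that mapping looks like - a paraphrase for illustration,
not a verbatim excerpt from the Ironic tree; 'hosts' and
'hash_partition_exponent' here just stand in for the real configuration:)

hosts = ['conductor-1', 'conductor-2', 'conductor-3']
hash_partition_exponent = 16

# Round-robin deal of partitions to hosts: the partition -> host table
# depends entirely on len(hosts).
part2host = [hosts[p % len(hosts)]
             for p in range(2 ** hash_partition_exponent)]

# A key always hashes to the same partition number, but adding or removing
# a conductor re-deals the whole table, so most partitions - and therefore
# most keys - end up mapped to a different host.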

Have you filed a bug for this?

 Nodes must be hashed onto the ring as well, statically chopping up the ring
 and dividing it among nodes isn't enough for consistency.

 Cheers,
 Nejc



 The Ironic hash_ring implementation uses a hash:
  def _get_partition(self, data):
      try:
          return (struct.unpack_from('>I', hashlib.md5(data).digest())[0]
                  >> self.partition_shift)
      except TypeError:
          raise exception.Invalid(
              _("Invalid data supplied to HashRing.get_hosts."))


 so I don't see the fixed size thing you're referring to. Could you
 point a little more specifically? Thanks!

 -Rob

 On 1 September 2014 19:48, Nejc Saje ns...@redhat.com wrote:

 Hey guys,

 in Ceilometer we're using consistent hash rings to do workload
 partitioning[1]. We've considered generalizing your hash ring
 implementation
 and moving it up to oslo, but unfortunately your implementation is not
 actually consistent, which is our requirement.

 Since you divide your ring into a number of equal sized partitions,
 instead
 of hashing hosts onto the ring, when you add a new host,
 an unbounded number of keys gets re-mapped to different hosts (instead of
 the 1/#nodes remapping guaranteed by a hash ring). I've confirmed this with the
 test in aforementioned patch[2].

 If this is good enough for your use-case, great, otherwise we can get a
 generalized hash ring implementation into oslo for use in both projects
 or
 we can both use an external library[3].

 Cheers,
 Nejc

 [1] https://review.openstack.org/#/c/113549/
 [2]
 https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
 [3] https://pypi.python.org/pypi/hash_ring









-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Robert Collins
On 4 September 2014 00:13, Eoghan Glynn egl...@redhat.com wrote:


 On 09/02/2014 11:33 PM, Robert Collins wrote:
  The implementation in ceilometer is very different to the Ironic one -
  are you saying the test you linked fails with Ironic, or that it fails
  with the ceilometer code today?

 Disclaimer: in Ironic terms, node = conductor, key = host

 The test I linked fails with Ironic hash ring code (specifically the
 part that tests consistency). With 1000 keys being mapped to 10 nodes,
 when you add a node:
 - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
 - Ironic code remaps > 90% of the keys

 So just to underscore what Nejc is saying here ...

 The key point is the proportion of such baremetal-nodes that would
 end up being re-assigned when a new conductor is fired up.

That was 100% clear, but thanks for making sure.

The question was getting a proper understanding of why it was
happening in Ironic.

The ceilometer hashring implementation is good, but it uses the same
terms very differently (e.g. replicas for partitions) - I'm adapting
the key fix back into Ironic - I'd like to see us converge on a single
implementation, and making sure the Ironic one is suitable for
ceilometer seems applicable here (since ceilometer seems to need less
from the API),

If reassigning was cheap Ironic wouldn't have bothered having a hash ring :)

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-03 Thread Robert Collins
On 4 September 2014 08:13, Robert Collins robe...@robertcollins.net wrote:
 On 3 September 2014 23:50, Nejc Saje ns...@redhat.com wrote:

 Forgive my slowness :).

 Disclaimer: in Ironic terms, node = conductor, key = host

 Sadly not inside the hash_ring code :/. host == conductor, key == data.

 The test I linked fails with Ironic hash ring code (specifically the part
 that tests consistency). With 1000 keys being mapped to 10 nodes, when you
 add a node:
 - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
 - Ironic code remaps > 90% of the keys

 Ok, that's pretty definitive and definitely not intended. However,
 remapping 7% when adding 10% capacity is also undesirable - we'd like
 to remap 1/11, i.e. roughly 9%.

 The problem lies in the way you build your hash ring[1]. You initialize a
 statically-sized array and divide its fields among nodes. When you do a
 lookup, you check which field in the array the key hashes to and then return
 the node that that field belongs to. This is the wrong approach because when
 you add a new node, you will resize the array and assign the fields
 differently, but the same key will still hash to the same field and will
 therefore hash to a different node.

 You're referring to part2host, where we round-robin using mod to map a
 partition (hash value of a key) to a host (conductor). Then when we add a
 conductor this entire map scales out - yup, I see the issue.

 Have you filed a bug for this?

https://review.openstack.org/118932 has an equivalent test, which
failed before the fixes were applied to the Ironic hash ring
implementation.

Cheers,
Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-02 Thread Gregory Haynes
Excerpts from Nejc Saje's message of 2014-09-01 07:48:46 +:
 Hey guys,
 
 in Ceilometer we're using consistent hash rings to do workload 
 partitioning[1]. We've considered generalizing your hash ring 
 implementation and moving it up to oslo, but unfortunately your 
 implementation is not actually consistent, which is our requirement.
 
 Since you divide your ring into a number of equal sized partitions, 
 instead of hashing hosts onto the ring, when you add a new host,
 an unbounded number of keys gets re-mapped to different hosts (instead of 
 the 1/#nodes remapping guaranteed by a hash ring). I've confirmed this 
 with the test in aforementioned patch[2].

I am just getting started with the ironic hash ring code, but this seems
surprising to me. AIUI we do require some rebalancing when a conductor
is removed or added (which is normal use of a CHT) but not for every
host added. This is supported by the fact that we currently don't have a
rebalancing routine, so I would be surprised if ironic worked at all if
we required it for each host that is added.

Can anyone in Ironic with a bit more experience confirm/deny this?

 
 If this is good enough for your use-case, great, otherwise we can get a 
 generalized hash ring implementation into oslo for use in both projects 
 or we can both use an external library[3].
 
 Cheers,
 Nejc
 
 [1] https://review.openstack.org/#/c/113549/
 [2] 
 https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
 [3] https://pypi.python.org/pypi/hash_ring
 

Thanks,
Greg



[openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation

2014-09-01 Thread Nejc Saje

Hey guys,

in Ceilometer we're using consistent hash rings to do workload 
partitioning[1]. We've considered generalizing your hash ring 
implementation and moving it up to oslo, but unfortunately your 
implementation is not actually consistent, which is our requirement.


Since you divide your ring into a number of equal-sized partitions, 
instead of hashing hosts onto the ring, when you add a new host,
an unbounded number of keys gets re-mapped to different hosts (instead of 
the 1/#nodes remapping guaranteed by a hash ring). I've confirmed this 
with the test in aforementioned patch[2].


If this is good enough for your use-case, great, otherwise we can get a 
generalized hash ring implementation into oslo for use in both projects 
or we can both use an external library[3].


Cheers,
Nejc

[1] https://review.openstack.org/#/c/113549/
[2] 
https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py

[3] https://pypi.python.org/pypi/hash_ring
