Re: Hashing With Consistent Results
Hi Atamert,

sorry for replying late.

On 11.08.2015 10:29, Atamert Ölçgen wrote:
> Hi Christian,
>
> hasch looks nice, I might end up just using it. I will be hashing smaller collections (maps where keys are keywords and values are atomic data like integers).

Ok, then I/O will probably hurt you much more than some overhead for hashing, I guess.

> Collisions BTW are not such a big deal for my use case. I will have a limited number of fragments (buckets, index pages, etc.) anyway. 65536 of them perhaps.
>
> The more I think about the problem the more I realize I am implementing some sort of hash map.

I guess a durable one. In this case it might be interesting to think about extending the persistent data structures of Clojure in a way to keep them on disk. I am currently experimenting a bit with that on hash-maps of a commit graph, although I need it to work in ClojureScript as well and cannot just reimplement core protocols because of async I/O. That way changing metadata of my datatype can have constant size.

Feel free to post any feedback on your progress/findings :).

Christian

--
You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Hashing With Consistent Results
Could you use something like Redis? Use hashes as keys, and fake immutability by 'popping' kv pairs and inserting new ones keyed to the (presumably different) hash of the updated map.

On Thursday, August 13, 2015 at 7:52:06 AM UTC-4, Christian Weilbach wrote:
> Hi Atamert,
>
> sorry for replying late.
>
> On 11.08.2015 10:29, Atamert Ölçgen wrote:
> > Hi Christian,
> >
> > hasch looks nice, I might end up just using it. I will be hashing smaller collections (maps where keys are keywords and values are atomic data like integers).
>
> Ok, then I/O will probably hurt you much more than some overhead for hashing, I guess.
>
> > Collisions BTW are not such a big deal for my use case. I will have a limited number of fragments (buckets, index pages, etc.) anyway. 65536 of them perhaps.
> >
> > The more I think about the problem the more I realize I am implementing some sort of hash map.
>
> I guess a durable one. In this case it might be interesting to think about extending the persistent data structures of Clojure in a way to keep them on disk. I am currently experimenting a bit with that on hash-maps of a commit graph, although I need it to work in ClojureScript as well and cannot just reimplement core protocols because of async I/O. That way changing metadata of my datatype can have constant size.
>
> Feel free to post any feedback on your progress/findings :).
>
> Christian
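The "hash as key, pop and re-insert" idea above can be sketched without a real Redis connection. In the sketch below an atom stands in for the external store, and `content-key`, `put!` and `replace!` are hypothetical names invented for illustration. It assumes flat maps with keyword keys and atomic values, as described elsewhere in the thread, and it makes no claim about how any particular Redis client would be called.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn content-key
  "Hex MD5 of the printed, key-sorted form of a flat map."
  [m]
  (let [s      (pr-str (into (sorted-map) m))
        digest (.digest (MessageDigest/getInstance "MD5")
                        (.getBytes ^String s StandardCharsets/UTF_8))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

;; Stand-in for the external key-value store: an atom holding {key -> map}.
(def store (atom {}))

(defn put!
  "Stores m under its content key and returns that key."
  [m]
  (let [k (content-key m)]
    (swap! store assoc k m)
    k))

(defn replace!
  "Removes the entry at old-key, applies f (with args) to its value, and
  stores the result under the content key of the updated map.
  Returns the new key."
  [old-key f & args]
  (let [updated (apply f (get @store old-key) args)
        new-key (content-key updated)]
    (swap! store #(-> % (dissoc old-key) (assoc new-key updated)))
    new-key))
```

Calling `(replace! (put! {:foo "Bar" :baz 5}) assoc :baz 6)` leaves a single entry in the store, filed under the key of the updated map. With a real store, the remove and the insert would be two separate commands, so whether they need to be atomic is a design question this sketch does not address.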
Re: Hashing With Consistent Results
Hi Christian,

hasch looks nice, I might end up just using it. I will be hashing smaller collections (maps where keys are keywords and values are atomic data like integers).

Collisions BTW are not such a big deal for my use case. I will have a limited number of fragments (buckets, index pages, etc.) anyway. 65536 of them perhaps.

The more I think about the problem the more I realize I am implementing some sort of hash map.

On Mon, Aug 10, 2015 at 3:49 PM, Christian Weilbach <whitesp...@polyc0l0r.net> wrote:
> Hi,
>
> I am the author of https://github.com/whilo/hasch
>
> Would calling hasch.core/edn-hash satisfy your performance requirements? I tried hard to make the recursion of the protocol performant, but hashing a value is slower than the time needed to write the data to disk for big collections. You should pick a faster message-digest like you suggested, e.g. MD5:
>
>     (defn ^MessageDigest md5-message-digest []
>       (MessageDigest/getInstance "md5"))
>
>     (edn-hash {:foo "Bar" :baz 5} md5-message-digest)
>
> You can use the criterium benchmarking snippets in platform.clj to do benchmarks. Object.hashCode() is a lot faster still and caches the result; I am not sure how much overhead the protocol dispatch causes.
>
> Note that if some collisions are ok for you, you might find a better tradeoff, since at the moment commutative collections like maps and sets are hashed key-value wise and then XOR'd for safety.
>
> I am interested in your findings and decision, especially if you pick something else.
>
> Christian
>
> On 10.08.2015 09:00, Atamert Ölçgen wrote:
> > Hi,
> >
> > I need a way to reduce a compound value, say {:foo "bar"}, into a number (like 693d9a0698aff95c in hex). I don't necessarily need a very large hash space; 7 hex digits is good enough for my purposes. But I need this hash to be consistent between runs, JVM versions, etc. So I guess that rules out standard object hashes.
> >
> > I would like to find a sufficiently fast way to do this. I can live with MD5, but are there faster alternatives (ones that produce smaller hashes)? (clj-digest https://github.com/tebeka/clj-digest provides a nice interface to what Java provides, but there are only the usual suspects AFAICS: http://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#MessageDigest )
> >
> > I will be dealing with unordered collections, but it seems hashing is consistent when the input order is changed:
> >
> >     user=> (.hashCode {:foo "Bar" :baz 5})
> >     2040536238
> >     user=> (.hashCode {:baz 5 :foo "Bar"})
> >     2040536238
> >
> > (It even gave the same hash code in different runs.)
> >
> > I will use these hashes to build index tables. My data, which contains the things I hash, is a set. I will store this as an ordered set and keep an index pointing to where records from this hash to that hash live.
> >
> > This is all Clojure, but I can't keep all my data in memory. (So Clojure's persistent data structures are out of the picture. Life would've been much simpler if I could.)
> >
> > Thanks for reading. Any insight is appreciated.

--
Kind Regards,
Atamert Ölçgen

◻◼◻
◻◻◼
◼◼◼

www.muhuk.com
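With 65536 fragments, a bucket number is just 16 bits of whatever digest is used. The following is a rough sketch of that mapping, assuming an MD5 digest of some stable string form of the value. `md5-bytes` and `bucket-id` are hypothetical helper names, not part of hasch or clj-digest.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn md5-bytes
  "MD5 digest (16 bytes) of the UTF-8 encoding of s."
  [^String s]
  (.digest (MessageDigest/getInstance "MD5")
           (.getBytes s StandardCharsets/UTF_8)))

(defn bucket-id
  "Reads the first two digest bytes as an unsigned 16-bit integer,
  giving a bucket number between 0 and 65535 inclusive."
  [^bytes digest]
  (bit-or (bit-shift-left (bit-and (aget digest 0) 0xff) 8)
          (bit-and (aget digest 1) 0xff)))
```

Masking each byte with `0xff` matters because JVM bytes are signed; without it, a digest byte of 128 or above would produce a negative bucket number.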
Re: Hashing With Consistent Results
Hi,

I am the author of https://github.com/whilo/hasch

Would calling hasch.core/edn-hash satisfy your performance requirements? I tried hard to make the recursion of the protocol performant, but hashing a value is slower than the time needed to write the data to disk for big collections. You should pick a faster message-digest like you suggested, e.g. MD5:

    (defn ^MessageDigest md5-message-digest []
      (MessageDigest/getInstance "md5"))

    (edn-hash {:foo "Bar" :baz 5} md5-message-digest)

You can use the criterium benchmarking snippets in platform.clj to do benchmarks. Object.hashCode() is a lot faster still and caches the result; I am not sure how much overhead the protocol dispatch causes.

Note that if some collisions are ok for you, you might find a better tradeoff, since at the moment commutative collections like maps and sets are hashed key-value wise and then XOR'd for safety.

I am interested in your findings and decision, especially if you pick something else.

Christian

On 10.08.2015 09:00, Atamert Ölçgen wrote:
> Hi,
>
> I need a way to reduce a compound value, say {:foo "bar"}, into a number (like 693d9a0698aff95c in hex). I don't necessarily need a very large hash space; 7 hex digits is good enough for my purposes. But I need this hash to be consistent between runs, JVM versions, etc. So I guess that rules out standard object hashes.
>
> I would like to find a sufficiently fast way to do this. I can live with MD5, but are there faster alternatives (ones that produce smaller hashes)? (clj-digest https://github.com/tebeka/clj-digest provides a nice interface to what Java provides, but there are only the usual suspects AFAICS: http://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#MessageDigest )
>
> I will be dealing with unordered collections, but it seems hashing is consistent when the input order is changed:
>
>     user=> (.hashCode {:foo "Bar" :baz 5})
>     2040536238
>     user=> (.hashCode {:baz 5 :foo "Bar"})
>     2040536238
>
> (It even gave the same hash code in different runs.)
>
> I will use these hashes to build index tables. My data, which contains the things I hash, is a set. I will store this as an ordered set and keep an index pointing to where records from this hash to that hash live.
>
> This is all Clojure, but I can't keep all my data in memory. (So Clojure's persistent data structures are out of the picture. Life would've been much simpler if I could.)
>
> Thanks for reading. Any insight is appreciated.
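The remark about hashing commutative collections entry by entry and XOR-ing the results can be illustrated roughly as follows. This is not hasch's implementation: the printed-form encoding of each entry and the names `md5-bytes`, `xor-bytes`, `entry-digest` and `xor-map-digest` are assumptions made for this sketch only.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn md5-bytes
  "MD5 digest (16 bytes) of the UTF-8 encoding of s."
  [^String s]
  (.digest (MessageDigest/getInstance "MD5")
           (.getBytes s StandardCharsets/UTF_8)))

(defn xor-bytes
  "Bytewise XOR of two equally long byte arrays."
  [a b]
  (byte-array (map bit-xor a b)))

(defn entry-digest
  "Digest of one key/value pair via its printed form (an assumption of
  this sketch, not taken from hasch)."
  [[k v]]
  (md5-bytes (pr-str [k v])))

(defn xor-map-digest
  "Digests every entry independently and XORs the results together.
  XOR is commutative and associative, so iteration order has no effect."
  [entries]
  (reduce xor-bytes (byte-array 16) (map entry-digest entries)))
```

Because the combination step ignores order, no sorting of keys is needed. The trade-off is that XOR alone lets identical entries cancel each other out; that cannot happen within a single map (keys are unique), but it is one reason this should be read as an illustration rather than a vetted construction.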
Hashing With Consistent Results
Hi,

I need a way to reduce a compound value, say {:foo "bar"}, into a number (like 693d9a0698aff95c in hex). I don't necessarily need a very large hash space; 7 hex digits is good enough for my purposes. But I need this hash to be consistent between runs, JVM versions, etc. So I guess that rules out standard object hashes.

I would like to find a sufficiently fast way to do this. I can live with MD5, but are there faster alternatives (ones that produce smaller hashes)? (clj-digest https://github.com/tebeka/clj-digest provides a nice interface to what Java provides, but there are only the usual suspects AFAICS: http://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#MessageDigest )

I will be dealing with unordered collections, but it seems hashing is consistent when the input order is changed:

    user=> (.hashCode {:foo "Bar" :baz 5})
    2040536238
    user=> (.hashCode {:baz 5 :foo "Bar"})
    2040536238

(It even gave the same hash code in different runs.)

I will use these hashes to build index tables. My data, which contains the things I hash, is a set. I will store this as an ordered set and keep an index pointing to where records from this hash to that hash live.

This is all Clojure, but I can't keep all my data in memory. (So Clojure's persistent data structures are out of the picture. Life would've been much simpler if I could.)

Thanks for reading. Any insight is appreciated.

--
Kind Regards,
Atamert Ölçgen

◻◼◻
◻◻◼
◼◼◼

www.muhuk.com
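One possible way to get a small hash that stays the same across runs and JVM versions is to hash a canonical printed form of the value rather than relying on `.hashCode`. The sketch below assumes flat maps with mutually comparable keys (for example, all keywords) and values whose printed form is stable. The names `canonical-str`, `md5-hex` and `stable-hash` are hypothetical, and MD5 is used only because the question says it is acceptable.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn canonical-str
  "Prints a flat map with its entries sorted by key, so that equal maps
  always yield the same string regardless of construction order."
  [m]
  (pr-str (into (sorted-map) m)))

(defn md5-hex
  "MD5 digest of the UTF-8 bytes of s, as 32 lowercase hex digits."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "MD5")
                        (.getBytes s StandardCharsets/UTF_8))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

(defn stable-hash
  "First 7 hex digits of the MD5 of the canonical string."
  [m]
  (subs (md5-hex (canonical-str m)) 0 7))
```

`(stable-hash {:foo "Bar" :baz 5})` and `(stable-hash {:baz 5 :foo "Bar"})` return the same 7-character string. Nested maps or sets would need the same canonical ordering applied recursively, which this sketch does not attempt.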