Re: Hashing With Consistent Results

2015-08-13 Thread Christian Weilbach
Hi Atamert,

Sorry for the late reply.

On 11.08.2015 10:29, Atamert Ölçgen wrote:
> Hi Christian,
>
> hasch looks nice, I might end up just using it. I will be hashing
> smaller collections (maps where keys are keywords and values are
> atomic data like integers).

OK, then I/O will probably hurt you much more than the overhead of
hashing, I guess.

 
> Collisions BTW are not such a big deal for my use case. I will have
> a limited number of fragments (buckets, index pages, etc.) anyway.
> 65536 of them perhaps. The more I think about the problem the more
> I realize I am implementing some sort of hash map.

A durable one, I guess. In that case it might be interesting to think
about extending Clojure's persistent data structures so that they can
be kept on disk. I am currently experimenting a bit with that on
hash-maps of a commit graph, although I need it to work in
ClojureScript as well and cannot just reimplement the core protocols
because of async I/O.
That way, changing the metadata of my datatype can have constant size.

Feel free to post any feedback on your progress/findings :).

Christian

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Hashing With Consistent Results

2015-08-13 Thread Sam Raker
Could you use something like Redis? Use hashes as keys, fake immutability 
by 'popping' kv pairs and inserting new ones keyed to the (presumably 
different) hash of the updated map.
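
The idea above can be sketched without a Redis client. This is only an illustration: a plain atom stands in for the Redis keyspace, and `stable-key`, `put!` and `update!` are hypothetical helpers, not part of any library. It assumes flat maps with keyword keys.

```clojure
(import 'java.security.MessageDigest)

(defn stable-key
  "Hex MD5 of the printed form of m, with keys sorted so that
  insertion order does not matter."
  [m]
  (let [canonical (pr-str (into (sorted-map) m))
        digest    (.digest (MessageDigest/getInstance "MD5")
                           (.getBytes ^String canonical "UTF-8"))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

;; Stand-in for the key-value store (assumption: Redis would play this role).
(def store (atom {}))

(defn put!
  "Stores m under its own hash and returns that key."
  [m]
  (let [k (stable-key m)]
    (swap! store assoc k m)
    k))

(defn update!
  "Removes the value stored under old-key, applies f to it, and stores
  the result under the hash of the updated value. Returns the new key."
  [old-key f]
  (let [new-val (f (get @store old-key))
        new-key (stable-key new-val)]
    (swap! store #(-> % (dissoc old-key) (assoc new-key new-val)))
    new-key))
```

For example, `(update! (put! {:foo "Bar" :baz 5}) #(assoc % :baz 6))` leaves only the updated map in the store, under a different key.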




Re: Hashing With Consistent Results

2015-08-11 Thread Atamert Ölçgen
Hi Christian,

hasch looks nice, I might end up just using it. I will be hashing smaller
collections (maps where keys are keywords and values are atomic data like
integers).

Collisions BTW are not such a big deal for my use case. I will have a
limited number of fragments (buckets, index pages, etc.) anyway. 65536 of
them perhaps. The more I think about the problem the more I realize I am
implementing some sort of hash map.
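
A minimal sketch of mapping such a small map onto one of 65536 buckets, assuming flat maps with keyword keys. `bucket-index` is a hypothetical name, and taking the first two bytes of an MD5 digest is just one possible choice, not something the thread settled on.

```clojure
(import 'java.security.MessageDigest)

(defn bucket-index
  "Returns a bucket number in [0, 65536) for the map m, built from the
  first two bytes of the MD5 digest of its sorted printed form."
  [m]
  (let [canonical     (pr-str (into (sorted-map) m))
        ^bytes digest (.digest (MessageDigest/getInstance "MD5")
                               (.getBytes ^String canonical "UTF-8"))
        hi            (bit-and (aget digest 0) 0xff) ; unsigned high byte
        lo            (bit-and (aget digest 1) 0xff)] ; unsigned low byte
    (bit-or (bit-shift-left hi 8) lo)))
```

Records could then be grouped with `(group-by bucket-index records)`. Collisions are expected here, since each bucket holds many records.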


-- 
Kind Regards,
Atamert Ölçgen

◻◼◻
◻◻◼
◼◼◼

www.muhuk.com


Re: Hashing With Consistent Results

2015-08-10 Thread Christian Weilbach
Hi,

I am the author of https://github.com/whilo/hasch

Would calling hasch.core/edn-hash satisfy your performance
requirements? I tried hard to make the recursion of the protocol
performant, but hashing a value is slower than the time needed to
write the data to disk for big collections. You should pick a faster
message-digest like you suggested, e.g. MD5:

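(require '[hasch.core :refer [edn-hash]])
(import 'java.security.MessageDigest)

;; Factory returning a fresh MD5 digest instance
(defn ^MessageDigest md5-message-digest []
  (MessageDigest/getInstance "MD5"))

(edn-hash {:foo "Bar" :baz 5} md5-message-digest)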

You can use the criterium benchmarking snippets in platform.clj to do
benchmarks. Object.hashCode() is a lot faster still and caches the
result; I am not sure how much overhead the protocol dispatch causes.

Note that if some collisions are OK for you, you might find a better
tradeoff, since at the moment commutative collections like maps and sets
are hashed key-value wise and then XOR'd for safety. I am interested in
your findings and decision, especially if you pick something else.
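
To illustrate the XOR idea in isolation: the sketch below hashes each entry of a flat map separately and XORs the digests, so the result does not depend on iteration order. It is not hasch's actual implementation; `md5-bytes`, `xor-bytes`, `map-hash` and `hex` are hypothetical helpers, and hashing the printed form of each entry is an assumption made for brevity.

```clojure
(import 'java.security.MessageDigest)

(defn md5-bytes
  "MD5 digest of the UTF-8 bytes of s."
  ^bytes [^String s]
  (.digest (MessageDigest/getInstance "MD5") (.getBytes s "UTF-8")))

(defn xor-bytes
  "Byte-wise XOR of two equally long byte arrays."
  ^bytes [^bytes a ^bytes b]
  (byte-array (map bit-xor a b)))

(defn map-hash
  "Order-independent hash of a flat map: XOR of the per-entry digests,
  starting from 16 zero bytes (the MD5 digest length)."
  ^bytes [m]
  (reduce xor-bytes
          (byte-array 16)
          (map (fn [[k v]] (md5-bytes (pr-str [k v]))) m)))

(defn hex
  "Lower-case hex string for a byte array."
  [^bytes bs]
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))
```

Because XOR is commutative and associative, `(hex (map-hash {:foo "Bar" :baz 5}))` and `(hex (map-hash {:baz 5 :foo "Bar"}))` give the same string.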

Christian




Hashing With Consistent Results

2015-08-10 Thread Atamert Ölçgen
Hi,

I need a way to reduce a compound value, say {:foo "bar"}, into a number
(like 693d9a0698aff95c in hex). I don't necessarily need a very large hash
space; 7 hex digits is good enough for my purposes. But I need
this hash to be consistent between runs, JVM versions, etc. So I guess
that rules out standard object hashes.

I would like to find a sufficiently fast way to do this. I can live with
MD5, but are there faster alternatives (that produce smaller hashes)?
(clj-digest https://github.com/tebeka/clj-digest provides a nice interface
to what Java provides, but there are only the usual suspects AFAICS:
http://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#MessageDigest)
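
One candidate that is not a MessageDigest at all is the CRC32 checksum in java.util.zip, which yields 32 bits and is defined independently of the JVM version. The sketch below is an assumption-laden illustration, not a recommendation from the thread: `short-hash` is a hypothetical name, it only handles flat maps with keyword keys, and CRC32 offers no collision resistance against adversarial input.

```clojure
(import 'java.util.zip.CRC32)

(defn short-hash
  "Returns 7 hex digits (28 bits) derived from the CRC32 checksum of the
  canonical printed form of m."
  [m]
  (let [canonical (pr-str (into (sorted-map) m)) ; sorted keys: order is irrelevant
        bs        (.getBytes ^String canonical "UTF-8")
        crc       (CRC32.)]
    (.update crc bs 0 (alength bs))
    ;; keep the low 28 bits, i.e. 7 hex digits
    (format "%07x" (bit-and (.getValue crc) 0xFFFFFFF))))
```

The result depends on `pr-str` printing the values the same way everywhere, which seems a reasonable assumption for keywords, strings and integers.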

I will be dealing with unordered collections, but it seems hashing is
consistent when the input order is changed:

user=> (.hashCode {:foo "Bar" :baz 5})
2040536238
user=> (.hashCode {:baz 5 :foo "Bar"})
2040536238


(It even gave the same hash code in different runs.)

I will use these hashes to build index tables. My data, which contains the
things I hash, is a set. I will store this as an ordered set and keep an
index pointing to where records from this hash to that hash live. This is
all Clojure, but I can't keep all my data in memory. (So Clojure's
persistent data structures are out of the picture. Life would've been much
simpler if I could.)
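
A rough sketch of such an index, assuming each fragment covers a contiguous hash range and is identified by the lower bound of that range. The fragment names and the 16-bit hash space are made up for illustration; only the small index lives in memory, while the fragments themselves would be on disk.

```clojure
(def index
  ;; lower bound of each hash range -> hypothetical fragment name
  (sorted-map 0x0000 "fragment-0"
              0x4000 "fragment-1"
              0x8000 "fragment-2"
              0xC000 "fragment-3"))

(defn fragment-for
  "Returns the fragment whose range contains hash h, i.e. the entry with
  the greatest lower bound that is <= h, or nil if there is none."
  [idx h]
  (some-> (rsubseq idx <= h) first val))
```

For example, `(fragment-for index 0x7FFF)` looks up the entry keyed by 0x4000, because that is the greatest lower bound not exceeding the hash.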

Thanks for reading. Any insight is appreciated.

-- 
Kind Regards,
Atamert Ölçgen

◻◼◻
◻◻◼
◼◼◼

www.muhuk.com
