Re: Duplicate key exception reading map that was written to a file

2015-11-25 Thread Ghadi Shayban
While in memory before writing, are the hash codes for the "duplicate" keys
the same? You can call (hash) on the keys. I'm thinking there is perhaps an
issue with Unicode string serialization... Are the question marks a
particular character?

If you can find the similar strings in memory, before they are written,
call:
(map int the-string)
to see the actual character codes behind the question marks.
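
A minimal sketch of that check, under an assumption that is not confirmed
anywhere in this thread: that one of the keys holds a character which cannot
be encoded (here a lone surrogate, U+D800). The names plain, odd and
utf8-roundtrip are made up for illustration. Such a key differs from "? 5"
in memory, but the unencodable character is replaced by a real question mark
once the string is encoded as UTF-8.

```clojure
(import 'java.nio.charset.StandardCharsets)

;; A key with a real question mark (U+003F).
(def plain "? 5")

;; Hypothetical key: a lone surrogate followed by " 5".
(def odd (str (char 0xD800) " 5"))

;; The character codes show the difference between the two strings.
(map int plain) ;; => (63 32 53)
(map int odd)   ;; => (55296 32 53)

(defn utf8-roundtrip
  "Encodes s as UTF-8 and decodes it again, as a write followed by a read would."
  [^String s]
  (String. (.getBytes s StandardCharsets/UTF_8) StandardCharsets/UTF_8))

;; Distinct in memory, identical after encoding.
(= plain odd)                  ;; => false
(= plain (utf8-roundtrip odd)) ;; => true
```

If (map int ...) on a suspicious key shows 63 in the first position, the
question mark is real and this particular explanation does not apply.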

On Wednesday, November 25, 2015 at 11:07:34 PM UTC-5, Dave Kincaid wrote:
>
> The number of keys in the map is 8,054,160.
>
> On Wednesday, November 25, 2015 at 10:04:11 PM UTC-6, Dave Kincaid wrote:
>>
>> I have something very strange going on when I try to write a map out to a 
>> file and read it back in. It's a perfectly fine hash-map with ? 
>> key/values (so it's pretty big). When I write the map out to a file using
>>
>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (pr-str phrases))
>>
>> and then read it back in with
>>
>> (edn/read (PushbackReader. (io/reader 
>> "/tmp/mednotes6153968756847768349/repl-write.edn")))
>>
>> I am getting a duplicate key exception indicating that "? 5" is 
>> duplicated. phrases is a clojure.lang.PersistentHashMap. The keys of the 
>> map are strings and the values are numbers. When I get the value for "? 5" 
>> from the map it returns 352.
>>
>> I tried to grep the file to find the occurrences of the key "? 5" (and 
>> the 30 characters before and after it) and it seems to return 4 of them. 
>> The second one is the right one from the map, but I have no idea where the 
>> other 3 are coming from.
>>
>> [/tmp/mednotes6153968756847768349]> egrep -o ".{30}\"\? 5\" .{30}" 
>> repl-write.edn 
>> hasing a toothbrush for" 160, "? 5" 32, ". ) during his /" 32, "to
>>  "is intact with sutures" 32, "? 5" 352, "4.81 pounds" 128, "ceren
>> udden" 32, "being up all" 32, "? 5" 32, "limited financial means" 
>> , "count , everytime she" 32, "? 5" 32, "had a partial mandibulect
>>
>> Does anyone have an idea what might be happening when the map is written 
>> out to the file? How is that key getting duplicated?
>>
>> I have tried a few slightly different ways of writing to the file 
>> including
>>
>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (binding 
>> [*print-dup* true] (pr-str phrases)))
>>
>> and
>>
>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (.toString 
>> phrases))
>>
>> based on some StackOverflow answers I found. They all seem to do the same 
>> thing.
>>
>> Here is the exception stack trace.
>>
>> 1. Caused by java.lang.IllegalArgumentException
>>    Duplicate key: ? 5
>>
>>  PersistentHashMap.java:   67  clojure.lang.PersistentHashMap/createWithCheck
>>                 RT.java: 1538  clojure.lang.RT/map
>>          EdnReader.java:  631  clojure.lang.EdnReader$MapReader/invoke
>>          EdnReader.java:  142  clojure.lang.EdnReader/read
>>          EdnReader.java:  108  clojure.lang.EdnReader/read
>>                 edn.clj:   35  clojure.edn/read
>>                 edn.clj:   33  clojure.edn/read
>>                AFn.java:  154  clojure.lang.AFn/applyToHelper
>>                AFn.java:  144  clojure.lang.AFn/applyTo
>>           Compiler.java: 3623  clojure.lang.Compiler$InvokeExpr/eval
>>           Compiler.java:  439  clojure.lang.Compiler$DefExpr/eval
>>           Compiler.java: 6787  clojure.lang.Compiler/eval
>>           Compiler.java: 6745  clojure.lang.Compiler/eval
>>                core.clj: 3081  clojure.core/eval
>>                main.clj:  240  clojure.main/repl/read-eval-print/fn
>>                main.clj:  240  clojure.main/repl/read-eval-print
>>                main.clj:  258  clojure.main/repl/fn
>>                main.clj:  258  clojure.main/repl
>>             RestFn.java: 1523  clojure.lang.RestFn/invoke
>>  interruptible_eval.clj:   58  clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
>>                AFn.java:  152  clojure.lang.AFn/applyToHelper
>>                AFn.java:  144  clojure.lang.AFn/applyTo
>>                core.clj:  630  clojure.core/apply
>>                core.clj: 1868  clojure.core/with-bindings*
>>             RestFn.java:  425  clojure.lang.RestFn/invoke
>>  interruptible_eval.clj:   56  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
>>  interruptible_eval.clj:  191  clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
>>  interruptible_eval.clj:  159  clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
>>                AFn.java:   22  clojure.lang.AFn/run
>> ThreadPoolExecutor.java: 1142  java.util.concurrent.ThreadPoolExecutor/runWorker
>> ThreadPoolExecutor.java:  617  java.util.concurrent.ThreadPoolExecutor$Worker/run
>>             Thread.java:  745  java.lang.Thread/run

Re: Duplicate key exception reading map that was written to a file

2015-11-25 Thread Dave Kincaid
The number of keys in the map is 8,054,160.


-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving 

Re: Duplicate key exception reading map that was written to a file

2015-11-25 Thread Ghadi Shayban
Does the phrases value in memory exactly match the payload roundtripped through 
Avro?


Re: Duplicate key exception reading map that was written to a file

2015-11-25 Thread Dave Kincaid
I just tried outputting the map to an Avro file and read it back in. This 
works fine. That tells me that there is something wrong with the way that 
I'm trying to write the EDN file somehow.

Here is the code I used to output to Avro and read back:

(def schema (avro/parse-schema {:type :map :values :long}))
(with-open [out-file (avro/data-file-writer
                      schema "/tmp/mednotes6153968756847768349/repl-write.avro")]
  (.append out-file phrases))
(def ps (with-open [in-file (avro/data-file-reader
                             "/tmp/mednotes6153968756847768349/repl-write.avro")]
          (doall (seq in-file))))

I'm using the excellent abracad library, aliased as avro.
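
For comparison with the Avro round trip above, here is a rough sketch of an
EDN round trip that streams through a writer instead of building one large
string with pr-str. The helper names write-edn! and read-edn, the temp file
and the two-entry sample map are assumptions made for illustration. Binding
*print-length* and *print-level* to nil is only a precaution, since a REPL
may have them set and a non-nil value truncates printed collections.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])
(import '(java.io File PushbackReader))

(defn write-edn!
  "Prints data to path as EDN through a UTF-8 writer."
  [path data]
  (with-open [w (io/writer path :encoding "UTF-8")]
    (binding [*out* w
              *print-length* nil
              *print-level* nil]
      (pr data))))

(defn read-edn
  "Reads a single EDN value from path, decoding the file as UTF-8."
  [path]
  (with-open [r (PushbackReader. (io/reader path :encoding "UTF-8"))]
    (edn/read r)))

;; Hypothetical stand-in for the real phrases map.
(def sample {"? 5" 352, "4.81 pounds" 128})

(def tmp (doto (File/createTempFile "phrases" ".edn") (.deleteOnExit)))

(write-edn! tmp sample)
(= sample (read-edn tmp)) ;; => true
```

A round trip like this only succeeds when every key still prints to a
distinct form after encoding, so it would fail the same way on a map whose
keys collide once written.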



Re: Duplicate key exception reading map that was written to a file

2015-11-25 Thread Dave Kincaid
The question marks are actual question marks. I'm not sure how to find the 
"duplicate" keys in the map in memory. As far as I can tell there is only 
one "? 5" key in the in memory map.

I thought maybe computing the frequencies of the hash values of the keys 
and looking for any with more than one would find them, but this code:

read-notes> (def dupes (filter #(> (second %) 1)
                               (frequencies (map hash (keys phrases)))))
#'read-notes/dupes
read-notes> (count dupes)
8911

seems to indicate 8,911 hash values that are shared by more than one key.
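
Shared hash values alone do not mean duplicate keys. Clojure hashes are 32
bits wide, so with 8,054,160 keys a birthday estimate (n^2 / 2^33) already
predicts several thousand colliding pairs between perfectly distinct
strings. A more direct check, sketched below, groups the keys by the text
they would turn into after being printed and encoded as UTF-8. The names
written-form and colliding-keys, and the sample map with a lone-surrogate
key, are hypothetical and only illustrate the idea.

```clojure
(import 'java.nio.charset.StandardCharsets)

(defn written-form
  "Returns the text key k becomes after pr-str and a UTF-8 encode/decode."
  [k]
  (let [^String s (pr-str k)]
    (String. (.getBytes s StandardCharsets/UTF_8) StandardCharsets/UTF_8)))

(defn colliding-keys
  "Returns a map from written form to the distinct keys that share it."
  [m]
  (->> (keys m)
       (group-by written-form)
       (filter (fn [[_ ks]] (> (count ks) 1)))
       (into {})))

;; Hypothetical sample: two keys that differ in memory but print alike.
(def sample
  (hash-map "? 5" 352
            (str (char 0xD800) " 5") 32
            "4.81 pounds" 128))

;; One written form, shared by two distinct in-memory keys.
(colliding-keys sample)
```

An empty result on the real map would rule out this kind of collision and
point the search elsewhere.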

On Wednesday, November 25, 2015 at 10:27:29 PM UTC-6, Ghadi Shayban wrote:
>
> While in memory before writing, are the hash codes for the "duplicate" 
> keys the same?   You can call (hash) on the keys.  I'm thinking there is 
> perhaps an issue with unicode string serialization...  Are the question 
> marks a particular character?
>
> If you can find the similar strings in memory, before they are written, 
> call:
> (map int  the-string)
> To see the actual unicode characters for the question marks.
>
> On Wednesday, November 25, 2015 at 11:07:34 PM UTC-5, Dave Kincaid wrote:
>>
>> The number of keys in the map is 8,054,160.
>>
>> On Wednesday, November 25, 2015 at 10:04:11 PM UTC-6, Dave Kincaid wrote:
>>>
>>> I have something very strange going on when I try to write a map out to 
>>> a file and read it back in. It's a perfectly fine hash-map with ? 
>>> key/values (so it's pretty big). When I write the map out to a file using
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (pr-str phrases))
>>>
>>> and then read it back in with
>>>
>>> (edn/read (PushbackReader. (io/reader 
>>> "/tmp/mednotes6153968756847768349/repl-write.edn")))
>>>
>>> I am getting a duplicate key exception indicating that "? 5" is 
>>> duplicated. phrases is a clojure.lang.PersistentHashMap. The keys of the 
>>> map are strings and the values are numbers. When I get the value for "? 5" 
>>> from the map it returns 352.
>>>
>>> I tried to grep the file to find the occurrences of the key "? 5" (and 
>>> the 30 characters before and after it) and it seems to return 4 of them. 
>>> The second one is the right one from the map, but I have no idea where the 
>>> other 3 are coming from.
>>>
>>> [/tmp/mednotes6153968756847768349]> egrep -o ".{30}\"\? 5\" .{30}" 
>>> repl-write.edn 
>>> hasing a toothbrush for" 160, "? 5" 32, ". ) during his /" 32, "to
>>>  "is intact with sutures" 32, "? 5" 352, "4.81 pounds" 128, "ceren
>>> udden" 32, "being up all" 32, "? 5" 32, "limited financial means" 
>>> , "count , everytime she" 32, "? 5" 32, "had a partial mandibulect
>>>
>>> Does anyone have an idea what might be happening when the map is written 
>>> out to the file? How is that key getting duplicated?
>>>
>>> I have tried a few slightly different ways of writing to the file 
>>> including
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (binding 
>>> [*print-dup* true] (pr-str phrases)))
>>>
>>> and
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (.toString 
>>> phrases))
>>>
>>> based on some StackOverflow answers I found. They all seem to do the 
>>> same thing.
>>>
>>> Here is the exception stack trace.
>>>
>>> 1. Caused by java.lang.IllegalArgumentException
>>>Duplicate key: ? 5
>>>
>>> PersistentHashMap.java:   67 
>>>  clojure.lang.PersistentHashMap/createWithCheck
>>>RT.java: 1538  clojure.lang.RT/map
>>> EdnReader.java:  631 
>>>  clojure.lang.EdnReader$MapReader/invoke
>>> EdnReader.java:  142  clojure.lang.EdnReader/read
>>> EdnReader.java:  108  clojure.lang.EdnReader/read
>>>edn.clj:   35  clojure.edn/read
>>>edn.clj:   33  clojure.edn/read
>>>   AFn.java:  154  clojure.lang.AFn/applyToHelper
>>>   AFn.java:  144  clojure.lang.AFn/applyTo
>>>  Compiler.java: 3623 
>>>  clojure.lang.Compiler$InvokeExpr/eval
>>>  Compiler.java:  439  clojure.lang.Compiler$DefExpr/eval
>>>  Compiler.java: 6787  clojure.lang.Compiler/eval
>>>  Compiler.java: 6745  clojure.lang.Compiler/eval
>>>   core.clj: 3081  clojure.core/eval
>>>   main.clj:  240 
>>>  clojure.main/repl/read-eval-print/fn
>>>   main.clj:  240  clojure.main/repl/read-eval-print
>>>   main.clj:  258  clojure.main/repl/fn
>>>   main.clj:  258  clojure.main/repl
>>>RestFn.java: 1523  clojure.lang.RestFn/invoke
>>> interruptible_eval.clj:   58 
>>>  clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
>>>   AFn.java:  152  clojure.lang.AFn/applyToHelper
>>>   AFn.java:  144  clojure.lang.AFn/applyTo
>>>   core.clj:  630  clojure.core/apply
>>>   core.clj: 1868