Re: Duplicate key exception reading map that was written to a file
While in memory before writing, are the hash codes for the "duplicate" keys the same? You can call (hash) on the keys. I'm thinking there is perhaps an issue with Unicode string serialization... Are the question marks a particular character?

If you can find the similar strings in memory, before they are written, call

(map int the-string)

to see the actual Unicode characters for the question marks.

On Wednesday, November 25, 2015 at 11:07:34 PM UTC-5, Dave Kincaid wrote:
>
> The number of keys in the map is 8,054,160.
>
> On Wednesday, November 25, 2015 at 10:04:11 PM UTC-6, Dave Kincaid wrote:
>>
>> I have something very strange going on when I try to write a map out to a
>> file and read it back in. It's a perfectly fine hash-map with ?
>> key/values (so it's pretty big). When I write the map out to a file using
>>
>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (pr-str phrases))
>>
>> and then read it back in with
>>
>> (edn/read (PushbackReader. (io/reader
>> "/tmp/mednotes6153968756847768349/repl-write.edn")))
>>
>> I am getting a duplicate key exception indicating that "? 5" is
>> duplicated. phrases is a clojure.lang.PersistentHashMap. The keys of the
>> map are strings and the values are numbers. When I get the value for "? 5"
>> from the map it returns 352.
>>
>> I tried to grep the file to find the occurrences of the key "? 5" (and
>> the 30 characters before and after it) and it seems to return 4 of them.
>> The second one is the right one from the map, but I have no idea where the
>> other 3 are coming from.
>>
>> [/tmp/mednotes6153968756847768349]> egrep -o ".{30}\"\? 5\" .{30}" repl-write.edn
>> hasing a toothbrush for" 160, "? 5" 32, ". ) during his /" 32, "to
>> "is intact with sutures" 32, "? 5" 352, "4.81 pounds" 128, "ceren
>> udden" 32, "being up all" 32, "? 5" 32, "limited financial means"
>> , "count , everytime she" 32, "? 5" 32, "had a partial mandibulect
>>
>> Does anyone have an idea what might be happening when the map is written
>> out to the file? How is that key getting duplicated?
>>
>> I have tried a few slightly different ways of writing to the file
>> including
>>
>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (binding
>> [*print-dup* true] (pr-str phrases)))
>>
>> and
>>
>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (.toString
>> phrases))
>>
>> based on some Stack Overflow answers I found. They all seem to do the same
>> thing.
>>
>> Here is the exception stack trace.
>>
>> 1. Caused by java.lang.IllegalArgumentException
>>    Duplicate key: ? 5
>>
>>    PersistentHashMap.java: 67 clojure.lang.PersistentHashMap/createWithCheck
>>    RT.java: 1538 clojure.lang.RT/map
>>    EdnReader.java: 631 clojure.lang.EdnReader$MapReader/invoke
>>    EdnReader.java: 142 clojure.lang.EdnReader/read
>>    EdnReader.java: 108 clojure.lang.EdnReader/read
>>    edn.clj: 35 clojure.edn/read
>>    edn.clj: 33 clojure.edn/read
>>    AFn.java: 154 clojure.lang.AFn/applyToHelper
>>    AFn.java: 144 clojure.lang.AFn/applyTo
>>    Compiler.java: 3623 clojure.lang.Compiler$InvokeExpr/eval
>>    Compiler.java: 439 clojure.lang.Compiler$DefExpr/eval
>>    Compiler.java: 6787 clojure.lang.Compiler/eval
>>    Compiler.java: 6745 clojure.lang.Compiler/eval
>>    core.clj: 3081 clojure.core/eval
>>    main.clj: 240 clojure.main/repl/read-eval-print/fn
>>    main.clj: 240 clojure.main/repl/read-eval-print
>>    main.clj: 258 clojure.main/repl/fn
>>    main.clj: 258 clojure.main/repl
>>    RestFn.java: 1523 clojure.lang.RestFn/invoke
>>    interruptible_eval.clj: 58 clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
>>    AFn.java: 152 clojure.lang.AFn/applyToHelper
>>    AFn.java: 144 clojure.lang.AFn/applyTo
>>    core.clj: 630 clojure.core/apply
>>    core.clj: 1868 clojure.core/with-bindings*
>>    RestFn.java: 425 clojure.lang.RestFn/invoke
>>    interruptible_eval.clj: 56 clojure.tools.nrepl.middleware.interruptible-eval/evaluate
>>    interruptible_eval.clj: 191 clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
>>    interruptible_eval.clj: 159 clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
>>    AFn.java: 22 clojure.lang.AFn/run
>>    ThreadPoolExecutor.java: 1142 java.util.concurrent.ThreadPoolExecutor/runWorker
>>    ThreadPoolExecutor.java: 617 java.util.concurrent.ThreadPoolExecutor$Worker/run
>>    Thread.java: 745 java.lang.Thread/run
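Following up on the (hash) and (map int the-string) suggestions: one explanation that would fit these symptoms, though nothing in the thread confirms it, is that several distinct keys contain a character the encoder cannot write, and Java silently substitutes a literal ? for it. For UTF-8 that happens with an unpaired UTF-16 surrogate (for example half of an emoji left behind by truncating text); for a narrower charset such as US-ASCII it happens with any non-ASCII character. Keys that are distinct in memory would then all come out as "? 5" in the file. The sketch below groups keys by their text after an encode/decode round trip so such keys can be found before writing; the namespace and function names are hypothetical.

```clojure
(ns example.charset-check
  (:import (java.nio.charset Charset StandardCharsets)))

;; Hypothetical diagnostic: find distinct keys that turn into identical
;; text once they are encoded to bytes and decoded again.

(defn charset-round-trip
  "Encode s with charset cs and decode the bytes again. String.getBytes
  replaces anything cs cannot encode with the charset's replacement bytes,
  which is a literal ? for both UTF-8 and US-ASCII."
  [^String s ^Charset cs]
  (String. (.getBytes s cs) cs))

(defn charset-collisions
  "Map from round-tripped text to the distinct keys that collapse onto it,
  keeping only entries where two or more different keys collide."
  [ks ^Charset cs]
  (->> ks
       (group-by #(charset-round-trip % cs))
       (filter (fn [[_ originals]] (> (count originals) 1)))
       (into {})))

;; Example REPL use against the map from this thread (not run here):
;;   (def groups (charset-collisions (keys phrases) StandardCharsets/UTF_8))
;;   (count groups)
;;   (map #(map int %) (get groups "? 5"))  ; char values of each colliding key
```

If the "? 5" group comes back with more than one key, the (map int ...) output shows which character each one really holds; an empty result means this hypothesis does not explain the duplicates.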
Re: Duplicate key exception reading map that was written to a file
The number of keys in the map is 8,054,160.

-- You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving
Re: Duplicate key exception reading map that was written to a file
Does the phrases value in memory exactly match the payload roundtripped through Avro?
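One way to answer that, sketched with a hypothetical helper: compare the original map with the round-tripped copy key by key. If several keys collapsed into one on the way out, a reader that tolerates duplicate keys would come back with fewer entries, and the lost keys would show up under :missing-from-copy. The commented usage assumes, as in the Avro code in the next message, that ps holds the single appended map so that (first ps) is the copy; that is an assumption about what abracad's reader returns.

```clojure
(ns example.compare-maps)

(defn map-differences
  "Compare an original map with a round-tripped copy. Returns lazy seqs of
  keys missing from the copy, keys only present in the copy, and shared keys
  whose values differ."
  [original copy]
  {:missing-from-copy (remove #(contains? copy %) (keys original))
   :extra-in-copy     (remove #(contains? original %) (keys copy))
   :changed           (filter #(and (contains? copy %)
                                    (not= (get original %) (get copy %)))
                              (keys original))})

;; Example REPL use (not run here):
;;   (= (count phrases) (count (first ps)))
;;   (take 10 (:missing-from-copy (map-differences phrases (first ps))))
```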
Re: Duplicate key exception reading map that was written to a file
I just tried outputting the map to an Avro file and reading it back in. This works fine. That tells me that there is something wrong with the way that I'm trying to write the EDN file somehow. Here is the code I used to output to Avro and read back:

(def schema (avro/parse-schema {:type :map :values :long}))

(with-open [out-file (avro/data-file-writer schema "/tmp/mednotes6153968756847768349/repl-write.avro")]
  (.append out-file phrases))

(def ps (with-open [in-file (avro/data-file-reader "/tmp/mednotes6153968756847768349/repl-write.avro")]
          (doall (seq in-file))))

I'm using the excellent abracad library :refer'd as avro.

On Wednesday, November 25, 2015 at 10:40:53 PM UTC-6, Dave Kincaid wrote:
>
> The question marks are actual question marks. I'm not sure how to find the
> "duplicate" keys in the map in memory. As far as I can tell there is only
> one "? 5" key in the in memory map.
>
> I thought maybe computing the frequencies of the hash values of the keys
> and looking for any with more than one would find them, but this code:
>
> read-notes> (def dupes (filter #(> (second %) 1) (frequencies (map hash (keys phrases)))))
> #'read-notes/dupes
> read-notes> (count dupes)
> 8911
>
> seems to indicate 8,911 keys with identical hash values.
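Given that the Avro round trip succeeds while the EDN one fails, a defensive version of the EDN write could check the keys before writing and stream the output instead of building one 8-million-entry string with pr-str. This is a sketch under the same unconfirmed assumption as above (string keys that UTF-8 encoding alters, such as ones holding an unpaired surrogate). write-edn! and read-edn are hypothetical helpers, not part of clojure.edn; the explicit :encoding "UTF-8" options on clojure.java.io/writer and reader simply make the charset visible on both sides, and *print-length* and *print-level* are bound to nil so a REPL setting cannot truncate the output.

```clojure
(ns example.edn-file
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io])
  (:import (java.io PushbackReader)
           (java.nio.charset StandardCharsets)))

(defn- survives-utf8?
  "True if s comes back unchanged after encoding to UTF-8 and decoding."
  [^String s]
  (= s (String. (.getBytes s StandardCharsets/UTF_8) StandardCharsets/UTF_8)))

(defn write-edn!
  "Stream map m to path as EDN in UTF-8. Throws before touching the file if
  any string key would be altered by UTF-8 encoding, since such keys can
  collide with other keys once written."
  [path m]
  (when-let [bad (seq (filter #(and (string? %) (not (survives-utf8? %)))
                              (keys m)))]
    (throw (ex-info "Keys that do not survive UTF-8 encoding"
                    {:count (count bad) :sample (vec (take 5 bad))})))
  (with-open [w (io/writer path :encoding "UTF-8")]
    (binding [*out* w
              *print-length* nil
              *print-level* nil]
      (pr m))))

(defn read-edn
  "Read a single EDN form from path, decoding it as UTF-8."
  [path]
  (with-open [r (PushbackReader. (io/reader path :encoding "UTF-8"))]
    (edn/read r)))
```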
Re: Duplicate key exception reading map that was written to a file
The question marks are actual question marks. I'm not sure how to find the "duplicate" keys in the map in memory. As far as I can tell there is only one "? 5" key in the in-memory map.

I thought maybe computing the frequencies of the hash values of the keys and looking for any with more than one would find them, but this code:

read-notes> (def dupes (filter #(> (second %) 1) (frequencies (map hash (keys phrases)))))
#'read-notes/dupes
read-notes> (count dupes)
8911

seems to indicate 8,911 keys with identical hash values.

On Wednesday, November 25, 2015 at 10:27:29 PM UTC-6, Ghadi Shayban wrote:
>
> While in memory before writing, are the hash codes for the "duplicate"
> keys the same? You can call (hash) on the keys. I'm thinking there is
> perhaps an issue with unicode string serialization... Are the question
> marks a particular character?
>
> If you can find the similar strings in memory, before they are written,
> call:
> (map int the-string)
> To see the actual unicode characters for the question marks.
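On the 8,911 figure: hash returns a 32-bit int, and the code above counts hash values shared by two or more keys, which is not the same thing as duplicate keys (a Clojure map keeps colliding keys as separate entries). By the birthday bound, if hashes were spread uniformly, about n(n-1)/2 / 2^32 pairs of keys would share a hash, roughly 7,550 for 8,054,160 keys, so a count of this order is expected from chance alone. The sketch below (hypothetical helper names) computes that estimate and lists the keys behind each shared hash so they can be inspected directly.

```clojure
(ns example.hash-collisions)

(defn expected-colliding-pairs
  "Birthday-bound estimate of how many key pairs share a hash when n keys
  get uniformly random 32-bit hashes: n(n-1)/2 divided by 2^32."
  [n]
  (/ (* (double n) (dec n)) 2.0 (Math/pow 2 32)))

(defn hash-collision-groups
  "Groups of distinct keys that share the same (hash k). Colliding keys are
  still different keys; the map keeps all of them."
  [ks]
  (->> ks
       (group-by hash)
       vals
       (filter #(> (count %) 1))))

;; Example REPL use (not run here):
;;   (expected-colliding-pairs (count phrases))
;;   (take 5 (hash-collision-groups (keys phrases)))
```

"Aa" and "BB" are a classic pair of Java strings with equal hashCode, and Clojure's string hash is derived from hashCode, so they land in the same group while remaining distinct keys.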