Re: CLJS: How can you find the version of clojurescript that you're running?
*clojure-version*
{:major 1, :minor 4, :incremental 0, :qualifier nil}

On Sun, Oct 21, 2012 at 2:16 AM, Frank Siebenlist frank.siebenl...@gmail.com wrote:

When you have different versions of clojurescript in the dependencies of your main project, how do you ask the repl what version it is running with… is there any easy function/var that I overlooked? Thanks, FrankS.

-- I may be wrong or incomplete. Please express any corrections / additions, they are encouraged and appreciated. At least one entity is bound to be transformed if you do ;)

-- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Parsing Log Files with Interspersed Multi-line Log Statements
Clojurists - I'm fairly new to Clojure and didn't realize how broken I've become using imperative languages all my life. I'm stumped as to how to parse a Varnish (www.varnish-cache.org) log file using Clojure. The main problem is that for a single request a varnish log file generates multiple log lines and each line is interspersed with lines from other threads. These log files can be several gigabytes in size (so using a stable sort of the entire log by thread id is out of the question).

Below I've included a small example log file and an example output Clojure data structure. Let me thank everyone in advance for any hints / help they can provide on this seemingly simple problem.

*Rules of the Varnish Log File*

- The first number on each line is the thread id (not unique and gets reused frequently)
- Each ReqStart marks the start of a request and the last number on the line is the unique transaction id (e.g. 118591777)
- ReqEnd denotes the end of the processing of the request by the thread
- Each line is atomically written, however many threads generate log lines that are interspersed with other requests (threads)
- These log files can be VERY large (10+ Gigabytes in the case of my application) so using a stable sort by thread id or anything that loads the entire file into memory is out of the question.
*Example Varnish Log file*

40 ReqEnd c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200
15 ReqStart c 10.102.41.121 4187 118591777
15 RxRequest c GET
15 RxURL c /json/engagement
15 RxHeader c host: www.example.com
30 ReqStart c 10.102.41.121 3906 118591802
15 RxHeader c Accept: application/json
30 RxRequest c GET
30 RxURL c /ws/boxtops/user/
30 RxHeader c host: www.example.com
15 ReqEnd c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200
30 RxHeader c Accept: application/xml
30 ReqEnd c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.37432
15 ReqStart c 10.102.41.121 4187 118591808
15 RxRequest c GET
15 RxURL c /ws/boxtops/user/
30 ReqStart c 10.102.41.121 3906 118591810
15 RxHeader c host: www.example.com
15 RxHeader c Accept: application/xml
30 RxRequest c GET
30 RxURL c /registration/success
30 RxHeader c host: www.example.com
46 TxRequest - GET
30 RxHeader c Accept: text/html
46 TxURL - /registration/success
15 ReqEnd c 118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.36955
30 ReqEnd c 118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.87023

*Desired Output*

{ 118591802 { :ReqStart [10.102.41.121 3906 118591802]
              :RxRequest [GET]
              :RxURL [/ws/boxtops/user/]
              :RxHeader [host: www.example.com Accept: application/xml]
                or better yet
              :RxHeader {:host www.example.com :Accept application/xml}
              :ReqEnd [118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.37432] }
  118591777 { :ReqStart [10.102.41.121 4187 118591777]
              :RxRequest [GET]
              :RxURL [/json/engagement]
              :RxHeader [host: www.example.com Accept: application/json]
              :ReqEnd [118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200 ] }
  118591808 { :ReqStart [10.102.41.121 4187 118591808]
              :RxRequest [GET]
              :RxURL [/ws/boxtops/user/]
              :RxHeader [host: www.example.com Accept: application/xml]
              :ReqEnd [118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.36955] }
  118591810 { :ReqStart [10.102.41.121 3906 118591810]
              :RxRequest [GET]
              :RxURL [/registration/success]
              :RxHeader [host: www.example.com Accept: text/html]
              :ReqEnd [118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.87023] } }
Re: Parsing Log Files with Interspersed Multi-line Log Statements
Look at: http://clojuredocs.org/clojure_core/clojure.core/line-seq to get a lazy sequence of lines of the file.

I don't think that there is any need to sort here. (I think sorting won't help anyway, because some lines seem to be identifiable only by the thread id at their position in the sequence.)

Process each line and build up the result data structure as you go. http://clojuredocs.org/clojure_core/clojure.core/assoc-in may help you for a nested map.

Do not hold on to the head of the line seq and you should be able to process this without too much trouble. (If the data structure you build up is itself too large, you may need to write out result data as you see each ReqEnd. In that case, dissoc that record so it can be garbage collected.)

Dave
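The advice above (fold over the lines, keep only in-flight requests, write out and dissoc on ReqEnd) might be sketched roughly as follows. This is only an illustration under assumptions: the names parse-line, process-lines and emit! are made up here, and the line layout `<thread-id> <tag> <c or -> <value>` is inferred from the example log rather than taken from the Varnish documentation.

```clojure
(defn parse-line
  "Splits a log line into [thread-id tag value], or returns nil if it does not
   match. Assumes the layout `<thread-id> <tag> <marker> <value>`, where the
   marker is a single character such as c or -."
  [line]
  (when-let [[_ id tag value] (re-find #"^\s*(\d+)\s+(\S+)\s+\S\s+(.*)$" line)]
    [id (keyword tag) value]))

(defn process-lines
  "Reduces over lines, keeping only in-flight requests in the `active` map
   (keyed by thread id). Calls (emit! request) when a request sees its ReqEnd
   and then drops it from the map. Returns whatever is still in flight."
  [lines emit!]
  (reduce
   (fn [active line]
     (if-let [[id tag value] (parse-line line)]
       (let [active (cond
                      ;; a ReqStart begins a fresh record for this thread id
                      (= tag :ReqStart) (assoc active id {tag [value]})
                      ;; other tags are appended only to a request we saw start
                      (contains? active id) (update-in active [id tag] (fnil conj []) value)
                      :else active)]
         (if (and (= tag :ReqEnd) (contains? active id))
           (do (emit! (get active id))
               (dissoc active id))
           active))
       active))
   {}
   lines))

;; Small usage example on a few lines shaped like the sample log.
(def sample-lines
  ["40 ReqEnd c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200"
   "15 ReqStart c 10.102.41.121 4187 118591777"
   "15 RxURL c /json/engagement"
   "30 ReqStart c 10.102.41.121 3906 118591802"
   "30 RxURL c /ws/boxtops/user/"
   "15 ReqEnd c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200"
   "30 ReqEnd c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.37432"])

(def completed (atom []))
(def still-active (process-lines sample-lines #(swap! completed conj %)))
```

Here emit! just collects into an atom; for a multi-gigabyte file it would more likely write each finished request to disk, so that memory use tracks the number of concurrent requests rather than the file size.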
Re: Parsing Log Files with Interspersed Multi-line Log Statements
btw, start simple and just see if you can scan the lines without doing anything in particular. Then take some subsequence:

(take 100 my-line-seq)

Play in the repl to start building up the data you want. See what you get and work from there.

D
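For REPL exploration along these lines, a small helper can return just the first few lines of a file. This is a sketch; the name first-lines is invented for illustration. The one point worth noting is that the lazy sequence has to be realized (here with vec) before with-open closes the reader.

```clojure
(require '[clojure.java.io :as io])

(defn first-lines
  "Returns the first n lines of the file at path as a vector.
   vec forces the lazy seq while the reader is still open."
  [path n]
  (with-open [rdr (io/reader path)]
    (vec (take n (line-seq rdr)))))
```

At the REPL, something like (first-lines "/path/to/varnish.log" 100) then gives a small vector to experiment on, where the path is whatever log file you have at hand.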
Re: Parsing Log Files with Interspersed Multi-line Log Statements
As Dave says, you can do this using line-seq, but you'll have to accumulate some state as you read the lines so you can return all the lines for a given thread's ReqStart to ReqEnd. Once you've returned that block, you can delete the state for that thread-id, so your accumulated state will only contain the 'active' requests.

If you're processing a very large file, you're best off returning a lazy sequence of the data. Something like this should get you started:

    (require '[clojure.java.io :as io])

    ;; Splits a line into [thread-id tag value].
    (defn parse-line [s]
      (let [[_ thread-id tag value] (re-find #"^(\d+)\s+(\S+)\s+(.+)$" s)]
        [thread-id tag value]))

    ;; Lazily yields one map per finished request; state holds only
    ;; the requests that are still active.
    (defn parse-lines
      ([lines] (parse-lines lines {}))
      ([lines state]
         (lazy-seq
          (when (seq lines)
            (let [[thread-id tag value] (parse-line (first lines))
                  state (assoc-in state [thread-id tag]
                                  (conj (get-in state [thread-id tag] []) value))]
              (if (= tag "ReqEnd")
                (cons (get state thread-id)
                      (parse-lines (rest lines) (dissoc state thread-id)))
                (parse-lines (rest lines) state)))))))

    ;; Keeps only requests for which both ReqStart and ReqEnd were seen.
    (defn parse-log-file [file]
      (with-open [logfile (io/reader file)]
        (doall
         (filter #(and (get % "ReqStart") (get % "ReqEnd"))
                 (parse-lines (line-seq logfile))))))
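The "or better yet" form of :RxHeader in the desired output, a map instead of a vector of raw header strings, could be produced by a small post-processing step. The following is a sketch under assumptions: headers->map is a made-up name, and each header string is assumed to look like "name: value" with any leading c marker already removed.

```clojure
(require '[clojure.string :as str])

(defn headers->map
  "Turns a collection of \"name: value\" strings into a map with keyword keys.
   Strings without a colon are skipped; if a header name repeats, the last
   value wins."
  [headers]
  (into {}
        (for [h headers
              ;; split on the first colon only, so values may contain colons
              :let [[k v] (str/split h #":\s*" 2)]
              :when v]
          [(keyword k) v])))
```

For example, (headers->map ["host: www.example.com" "Accept: application/xml"]) yields a map with :host and :Accept keys. If repeated headers matter, collecting the values into a vector per key would be a safer choice than letting the last one win.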
Re: Write stream to two outputs
Hi Herwig,

Thanks for your help.

Herwig Hochleitner writes:

    (defn copy-multi
      ([input outputs]
         (copy-multi input outputs (make-array Byte/TYPE 1024)))
      ([^java.io.InputStream input outputs buffer]
         (let [size (.read input buffer)]
           (when (pos? size)
             (doseq [^java.io.OutputStream output outputs]
               (.write output buffer 0 size))
             (recur input outputs buffer)))))

I tried using the above as follows:

    (defn copy-multi
      ([input outputs]
         (copy-multi input outputs (make-array Byte/TYPE 1024)))
      ([^java.io.InputStream input outputs buffer]
         (let [size (.read input buffer)]
           (when (pos? size)
             (doseq [^java.io.OutputStream output outputs]
               (.write output buffer 0 size))
             (recur input outputs buffer)))))

    (defn save-to-disk
      "Saves a temporary file to disk to further analyse it"
      [file]
      (info "Saving file to disk")
      (let [pipe-in (java.io.PipedInputStream.)
            pipe-out (java.io.PipedOutputStream. pipe-in)]
        (future
          (with-open [in (:body file)
                      file-out (io/output-stream "/tmp/test.txt")]
            (copy-multi in [file-out pipe-out])))
        ;; assoc rather than update-in: update-in expects a function,
        ;; and pipe-in is a stream
        (assoc file :body pipe-in)))

I added `future` so that the pipe-in is read and the pipe-out doesn't block. However, the program returns before anything is copied. Without `future` the pipe-in is never read out and the pipe-out blocks. So it either returns too fast or never completes.

I'm beginning to think that I'm trying to solve the problem (streaming one input to file and S3 simultaneously) in a wrong way. Any suggestions?

-- Petar Radosevic | @wunki
Re: CLJS: How can you find the version of clojurescript that you're running?
There is not. That would be useful.

On Saturday, October 20, 2012, Frank Siebenlist wrote:

When you have different versions of clojurescript in the dependencies of your main project, how do you ask the repl what version it is running with… is there any easy function/var that I overlooked? Thanks, FrankS.
Re: Write stream to two outputs
Streaming simultaneously can result in all kinds of problems due to differences in speed and reliability between a disk and a network connection. So if you can, buffer the data for the write to S3. Maybe you could just stream to S3 from the file after you've written it?

If you must stream simultaneously, don't try any magic with PipedInputStream. Just use:

    (with-open [in (open-input)
                file (open-file)
                s3 (open-s3)]
      (copy-multi in [file s3]))

And beware that if someone pulls the network plug during the operation, the file will not be fully written either.
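To see the plain copy-to-several-outputs approach in isolation, here is a self-contained sketch that uses in-memory streams in place of the file and S3 streams. It is a variant of the copy-multi idea from this thread written with loop/recur, not the code posted earlier verbatim, and demo-copy is a made-up helper for illustration only.

```clojure
(import '(java.io InputStream OutputStream
                  ByteArrayInputStream ByteArrayOutputStream))

(defn copy-multi
  "Reads input one buffer at a time and writes each chunk to every stream
   in outputs. Everything happens synchronously on the calling thread."
  [^InputStream input outputs]
  (let [^bytes buffer (byte-array 1024)]
    (loop []
      (let [size (.read input buffer)]
        ;; .read returns -1 at end of stream
        (when (pos? size)
          (doseq [^OutputStream output outputs]
            (.write output buffer 0 size))
          (recur))))))

(defn demo-copy
  "Copies the UTF-8 bytes of s to two in-memory outputs and returns both
   results as strings."
  [^String s]
  (let [in (ByteArrayInputStream. (.getBytes s "UTF-8"))
        a  (ByteArrayOutputStream.)
        b  (ByteArrayOutputStream.)]
    (copy-multi in [a b])
    [(.toString a "UTF-8") (.toString b "UTF-8")]))
```

Because each chunk is written to every output before the next read, the slowest output sets the pace for all of them, which is the trade-off mentioned above.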
Re: finding intervals quickly
Answering my own question: my issues were 1) overlooking a couple of seqs that were thousands of times slower than vectors, and 2) using / instead of quot. I just found this video: http://www.infoq.com/presentations/Crunching-Numbers-Clojure

I'm finding that seqs are pretty much always too slow, but they are returned by many of the core functions. Consequently my code is getting littered with (vec ...) calls. Am I doing this wrong? Does everyone have (vec ...) calls all over their code?

On Saturday, October 20, 2012 7:27:56 PM UTC-7, Brian Craft wrote:

I have a vector of maps representing intervals with (start, end) coords in one dimension, that are non-overlapping, and sorted. At the moment I'm not using a sorted data type, just a vector. The bottleneck in my code is in searching for intervals that overlap a region, like so:

    (defn filter-probe-range [probes start end]
      (filter #(and (> (:end %) start) (< (:start %) end)) probes))

My first thought to speed it up was to add a binary search on the vector, since it's sorted. However, that is wildly slower than doing a full scan with filter. I'm not sure why that is, or how to investigate it. I've looked over the section on performance in The Joy of.., but nothing is really popping out. Should I start with jvisualvm? Is there something better that would identify what I'm doing that's so slow?

Also wondering if sorted data types would do this for me for free. Is there a way to do a binary search on a sorted collection?
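For reference, a binary search over such a vector might look roughly like the sketch below. It assumes probes is a vector (so nth and subvec are cheap) of maps with :start and :end, sorted and non-overlapping as described; the names first-index-where and probes-in-range are invented here. Note the use of quot for the midpoint, since / on integers can produce a ratio.

```clojure
(defn first-index-where
  "Binary search for the smallest index i in vector v such that
   (pred (nth v i)) is true, assuming pred is false for a prefix of v and
   true for the rest. Returns (count v) when pred is never true."
  [v pred]
  (loop [lo 0
         hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (pred (nth v mid))
          (recur lo mid)
          (recur (inc mid) hi)))
      lo)))

(defn probes-in-range
  "Returns the subvector of probes that overlap the open region (start, end).
   Relies on probes being sorted and non-overlapping, so both predicates
   below flip from false to true exactly once."
  [probes start end]
  (let [from (first-index-where probes #(> (:end %) start))
        to   (first-index-where probes #(>= (:start %) end))]
    (subvec probes from (max from to))))
```

The result is a vector view onto the original, so no intermediate seq is created, and the cost is two logarithmic searches instead of a full scan.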
Re: ANN: ClojureScript release 0.0-1513
On Oct 20, 12:52 am, Stuart Sierra the.stuart.sie...@gmail.com wrote:

ClojureScript release 0.0-1513 is on its way to the Maven Central Repository. Changes: http://build.clojure.org/job/clojurescript-release/18/

This release (via lein-cljsbuild 0.2.9) broke one of my projects, which worked with 0.0-1503 (lein-cljsbuild 0.2.8). I am not yet able to isolate the issue into a small reproducible test-case, but below is how you can see it in effect:

1. $ git clone g...@github.com:kumarshantanu/basil.git
2. $ cd basil
3. $ git checkout 06cd1f0 # latest commit as of now
4. $ # make sure you have PhantomJS 1.6 or higher
5. $ lein dev do clean, test # stack traces may help
6. Open `run-tests.html` in Firefox and see console

If you edit project.clj and replace lein-cljsbuild version 0.2.9 with 0.2.8, all tests pass. I tried debugging the issue using Firebug and it mentioned 'variable scoping error', but I don't think I am close to the actual cause. Sharing it here hoping someone can throw pointers.

Shantanu
Re: ANN: ClojureScript release 0.0-1513
1. $ git clone g...@github.com:kumarshantanu/basil.git

Or whichever URL is convenient for this project: https://github.com/kumarshantanu/basil

Shantanu
Re: ANN: ClojureScript release 0.0-1513
Please isolate the commit by using the lein checkouts feature - you can use the ClojureScript repo directly then and use git bisect to determine the exact commit that broke your build. Thanks.
CLJS-402: Re: CLJS: How can you find the version of clojurescript that you're running?
I've created a JIRA issue for this: http://dev.clojure.org/jira/browse/CLJS-402 and added a patch for the build script:

---
Auto-generation from the build script of version_autogen.clj and version_autogen.cljs files that both define cljs.version-autogen/*clojurescript-version* with the current version info for both the clj and cljs environments
---

Feels a little crude, but it works…:

---
user=> (require 'cljs.version-autogen)
nil
user=> cljs.version-autogen/*clojurescript-version*
{:minor 0, :incremental 1514, :major 0}
user=> (run-repl-listen)
Type: :cljs/quit to quit
ClojureScript:cljs.user> (load-namespace 'cljs.version-autogen)
ClojureScript:cljs.user> cljs.version-autogen/*clojurescript-version*
{:incremental 1514, :major 0, :minor 0}
ClojureScript:cljs.user>
---

Any alternative approaches that would be more elegant?

-FrankS.
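To make the shape of that version map concrete, here is a small sketch of how a release string such as 0.0-1514 could be turned into a map with :major, :minor and :incremental keys. This is not the patch attached to CLJS-402; parse-version is a hypothetical helper, and the `major.minor-incremental` layout is an assumption based on the release names and REPL output in this thread.

```clojure
(defn parse-version
  "Parses a string like \"0.0-1514\" into
   {:major 0, :minor 0, :incremental 1514}.
   Returns nil when the string does not have that shape."
  [s]
  (when-let [[_ major minor incremental] (re-matches #"(\d+)\.(\d+)-(\d+)" s)]
    {:major       (Long/parseLong major)
     :minor       (Long/parseLong minor)
     :incremental (Long/parseLong incremental)}))
```

A build script could generate a source file that defines the version var from such a map, which is roughly what the description above outlines.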
Re: Parsing Log Files with Interspersed Multi-line Log Statements
I assumed he needs the results sorted by transaction id; if he doesn't, then it should be quite simple.
Re: Parsing Log Files with Interspersed Multi-line Log Statements
Looks like something I would do in Datomic. Parse the log one time, and concurrently put the data into a Datomic database and then ... wait, I just realized, I don't think we can get the results sorted from the database, because the return from a query is a set of results and your transaction ids are certainly not consecutive - I must investigate.
Re: finding intervals quickly
This comes to mind:

  (set! *warn-on-reflection* true)

But you probably already considered it...

On Sun, Oct 21, 2012 at 4:27 AM, Brian Craft craft.br...@gmail.com wrote:

I have a vector of maps representing intervals with (start, end) coords in one dimension, that are non-overlapping, and sorted. At the moment I'm not using a sorted data type, just a vector. The bottleneck in my code is in searching for intervals that overlap a region, like so:

  (defn filter-probe-range [probes start end]
    (filter #(and (> (:end %) start) (< (:start %) end)) probes))

My first thought to speed it up was to add a binary search on the vector, since it's sorted. However, that is wildly slower than doing a full scan with filter. I'm not sure why that is, or how to investigate it. I've looked over the section on performance in The Joy of.., but nothing is really popping out. Should I start with jvisualvm? Is there something better that would identify what I'm doing that's so slow?

Also wondering if sorted data types would do this for me for free. Is there a way to do a binary search on a sorted collection?
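On the binary-search idea, here is a minimal sketch. It assumes `probes` is a vector (so `nth` and `subvec` are cheap) and that, because the intervals are sorted and non-overlapping, their :end values are ascending as well. The function names are hypothetical. A hand-written binary search can end up slow if it runs over a seq rather than a vector, since `nth` on a seq is linear; whether that is the cause in the code described above is only a guess.

```clojure
(defn first-ending-after
  "Index of the first interval whose :end is greater than `start`,
  or (count probes) when there is none. Plain binary search."
  [probes start]
  (loop [lo 0
         hi (count probes)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (> (:end (nth probes mid)) start)
          (recur lo mid)          ; answer is at mid or to its left
          (recur (inc mid) hi)))  ; answer is strictly to the right
      lo)))

(defn probes-in-range
  "Intervals overlapping (start, end): jump to the first candidate, then
  stop at the first interval that starts at or after `end`."
  [probes start end]
  (take-while #(< (:start %) end)
              (subvec probes (first-ending-after probes start))))

(def probes
  [{:start 0 :end 10} {:start 10 :end 20} {:start 25 :end 30} {:start 40 :end 50}])

(probes-in-range probes 5 27)
;; => ({:start 0, :end 10} {:start 10, :end 20} {:start 25, :end 30})
```

The search touches O(log n) elements and the scan touches only the matches plus one, so the cost no longer depends on where in the vector the region falls.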
Re: finding intervals quickly
If you are already happy to keep the data sorted, then put the pairs into a sorted set, and use `subseq` and `rsubseq` to query.

Depending on the size of the data (i.e. how many coordinate pairs you have) and the specifics of your usage (i.e. mostly, how dynamic the dataset is), a specialized data structure like an interval tree (or a k-d or R*/R+ tree if you end up working with multidimensional data) may be a must.

Cheers,

- Chas
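A minimal sketch of the `subseq` / `rsubseq` approach. It assumes the same strict comparisons as the filter version, and it uses a sorted map keyed by :start rather than a sorted set so that queries can pass a bare coordinate. `index-probes` and `overlapping` are hypothetical names.

```clojure
(defn index-probes
  "Sorted map from :start coordinate to the interval map itself."
  [probes]
  (into (sorted-map) (map (juxt :start identity) probes)))

(defn overlapping
  "Intervals in `idx` overlapping (start, end). Relies on the intervals
  being non-overlapping, so at most one interval that begins at or
  before `start` can still reach past it."
  [idx start end]
  (let [before (first (rsubseq idx <= start)) ; greatest :start <= start, or nil
        inside (subseq idx > start < end)]    ; every :start strictly in between
    (concat (when (and before (> (:end (val before)) start))
              [(val before)])
            (map val inside))))

(def idx
  (index-probes [{:start 0 :end 10} {:start 10 :end 20}
                 {:start 25 :end 30} {:start 40 :end 50}]))

(overlapping idx 5 27)
;; => ({:start 0, :end 10} {:start 10, :end 20} {:start 25, :end 30})
```

Unlike a plain vector, the sorted map also stays queryable while intervals are added or removed, which matters if the dataset is dynamic.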
How to Aliasing Java Class Names?
I'm trying to import some classes from a library (jpl), which have names already taken by java.lang classes. When I try to import jpl.Float and jpl.Integer, I get exceptions due to name clashes. Is there a way to alias the class names so that I can use them?
Re: How to Aliasing Java Class Names?
As mentioned in a recent thread, "class name clashes on importing classes of same name from different Java package", you can use the classes with fully qualified names without importing them at all. I am not aware of any way to alias them.

Andy
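A minimal sketch of the fully qualified approach, using java.util.Date and java.sql.Date from the JDK as stand-ins for a pair of clashing class names, since the jpl constructors aren't shown in this thread. The wrapper functions are hypothetical; they only give each class a short local name for construction, which is about as close to an alias as this gets.

```clojure
;; No import at all: spell out the package wherever the class is used.
(defn util-date [^long ms] (java.util.Date. ms))
(defn sql-date  [^long ms] (java.sql.Date. ms))

;; Both classes are usable side by side despite sharing the simple name Date.
(class (util-date 0)) ;; => java.util.Date
(class (sql-date 0))  ;; => java.sql.Date
```

The same pattern should carry over to jpl.Integer and jpl.Float, with whatever constructor arguments those classes actually take.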
Re: Parsing Log Files with Interspersed Multi-line Log Statements
raym - Thanks so much for the code snippet. That was just what I needed to get unstuck and start playing around with the parser. I really appreciate the help.

- Dave

On Sunday, October 21, 2012 6:10:51 AM UTC-5, raym wrote:

As Dave says, you can do this using line-seq, but you'll have to accumulate some state as you read the lines so you can return all the lines for a given thread's ReqStart to ReqEnd. Once you've returned that block, you can delete the state for that thread-id, so your accumulated state will only contain the 'active' requests. If you're processing a very large file, you're best off returning a lazy sequence of the data. Something like this should get you started:

  (require '[clojure.java.io :as io])

  ;; Split a line into [thread-id tag value]; all three are strings.
  (defn parse-line [s]
    (let [[_ thread-id tag value] (re-find #"^(\d+)\s+(\S+)\s+(.+)$" s)]
      [thread-id tag value]))

  ;; Lazily emit one map of tag -> [values] per completed request.
  ;; `state` holds only the requests that have not yet seen their ReqEnd.
  (defn parse-lines
    ([lines] (parse-lines lines {}))
    ([lines state]
       (lazy-seq
        (when (seq lines)
          (let [[thread-id tag value] (parse-line (first lines))
                state (assoc-in state [thread-id tag]
                                (conj (get-in state [thread-id tag] []) value))]
            (if (= tag "ReqEnd")
              (cons (get state thread-id)
                    (parse-lines (rest lines) (dissoc state thread-id)))
              (parse-lines (rest lines) state)))))))

  ;; Drop partial requests (e.g. a ReqEnd whose ReqStart precedes the file).
  ;; doall forces the whole result before the reader is closed, so every
  ;; request ends up in memory at once.
  (defn parse-log-file [file]
    (with-open [logfile (io/reader file)]
      (doall
       (filter #(and (get % "ReqStart") (get % "ReqEnd"))
               (parse-lines (line-seq logfile))))))
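The desired output is keyed by transaction id, while `parse-lines` above yields one map per request keyed by tag. Here is a minimal sketch of bridging the two. It assumes each request map has the shape that the regex in `parse-line` would produce for the sample lines, with values such as "c 10.102.41.121 4187 118591777". `request-xid` and `index-by-xid` are hypothetical names. Collecting everything into one map only makes sense for a slice of the log; for a 10+ GB file the requests would have to be consumed one at a time instead.

```clojure
(require '[clojure.string :as str])

(defn request-xid
  "Transaction id of a request: the last whitespace-separated token of its
  first ReqStart value. Returns nil when there is no ReqStart."
  [request]
  (when-let [v (first (get request "ReqStart"))]
    (last (str/split v #"\s+"))))

(defn index-by-xid
  "Map of transaction id (a string) to request map."
  [requests]
  (into {} (map (juxt request-xid identity) requests)))

;; Hand-written sample in the assumed shape, taken from the example log.
(def sample
  [{"ReqStart"  ["c 10.102.41.121 4187 118591777"]
    "RxRequest" ["c GET"]
    "ReqEnd"    ["c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200"]}])

(keys (index-by-xid sample))
;; => ("118591777")
```

Keying on the transaction id rather than the thread id matters here because, per the rules of the log file, thread ids are reused frequently.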