Re: CLJS: How can you find the version of clojurescript that you're running?

2012-10-21 Thread AtKaaZ
*clojure-version*
{:major 1, :minor 4, :incremental 0, :qualifier nil}
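For comparison, `*clojure-version*` describes the Clojure runtime hosting the REPL rather than the ClojureScript compiler. clojure.core's `clojure-version` function formats the same data as a string. Here is a minimal sketch that reads both (the helper name `clojure-version-info` is only for illustration):

```clojure
;; *clojure-version* is a map bound by clojure.core; clojure-version
;; renders the same information as a string such as "1.4.0".
(defn clojure-version-info
  "Hypothetical helper: returns the running Clojure version as a map and a string."
  []
  {:map    *clojure-version*
   :string (clojure-version)})

(println (clojure-version-info))
```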


On Sun, Oct 21, 2012 at 2:16 AM, Frank Siebenlist 
frank.siebenl...@gmail.com wrote:

 When you have different versions of clojurescript in the dependencies of
 your main project, how do you ask the repl what version it is running with…
 is there any easy function/var that I overlooked?

 Thanks, FrankS.

 --
 You received this message because you are subscribed to the Google
 Groups Clojure group.
 To post to this group, send email to clojure@googlegroups.com
 Note that posts from new members are moderated - please be patient with
 your first post.
 To unsubscribe from this group, send email to
 clojure+unsubscr...@googlegroups.com
 For more options, visit this group at
 http://groups.google.com/group/clojure?hl=en




-- 
I may be wrong or incomplete.
Please express any corrections / additions,
they are encouraged and appreciated.
At least one entity is bound to be transformed if you do ;)


Parsing Log Files with Interspersed Multi-line Log Statements

2012-10-21 Thread Dave
Clojurists - I'm fairly new to Clojure and didn't realize how broken I've 
become using imperative languages all my life.  I'm stumped as to how to 
parse a Varnish (www.varnish-cache.org) log file using Clojure.  The main 
problem is that for a single request a varnish log file generates multiple 
log lines and each line is interspersed with lines from other threads. 
 These log files can be several gigabytes in size (so using a stable sort 
of the entire log by thread id is out of the question).  

Below I've included a small example log file and an example output Clojure 
data structure.  Let me thank everyone in advance for any hints / help they 
can provide on this seemingly simple problem.

*Rules of the Varnish Log File*

   - The first number on each line is the thread id (not unique; thread ids
   get reused frequently).
   - Each ReqStart marks the start of a request, and the last number on the
   line is the unique transaction id (e.g. 118591777).
   - ReqEnd denotes the end of the processing of the request by the thread.
   - Each line is written atomically; however, many threads generate log
   lines, so lines from different requests (threads) are interspersed.
   - These log files can be VERY large (10+ gigabytes in the case of my
   application), so a stable sort by thread id, or anything else that loads
   the entire file into memory, is out of the question.


*Example Varnish Log file*
   40 ReqEnd       c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200
   15 ReqStart     c 10.102.41.121 4187 118591777
   15 RxRequest    c GET
   15 RxURL        c /json/engagement
   15 RxHeader     c host: www.example.com
   30 ReqStart     c 10.102.41.121 3906 118591802
   15 RxHeader     c Accept: application/json
   30 RxRequest    c GET
   30 RxURL        c /ws/boxtops/user/
   30 RxHeader     c host: www.example.com
   15 ReqEnd       c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200
   30 RxHeader     c Accept: application/xml
   30 ReqEnd       c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.37432
   15 ReqStart     c 10.102.41.121 4187 118591808
   15 RxRequest    c GET
   15 RxURL        c /ws/boxtops/user/
   30 ReqStart     c 10.102.41.121 3906 118591810
   15 RxHeader     c host: www.example.com
   15 RxHeader     c Accept: application/xml
   30 RxRequest    c GET
   30 RxURL        c /registration/success
   30 RxHeader     c host: www.example.com
   46 TxRequest    - GET
   30 RxHeader     c Accept: text/html
   46 TxURL        - /registration/success
   15 ReqEnd       c 118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.36955
   30 ReqEnd       c 118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.87023

*Desired Output*
{118591802
 {:ReqStart  ["10.102.41.121" "3906" "118591802"]
  :RxRequest ["GET"]
  :RxURL     ["/ws/boxtops/user/"]
  :RxHeader  ["host: www.example.com" "Accept: application/xml"]
  ;; or better yet:
  ;; :RxHeader {:host "www.example.com" :Accept "application/xml"}
  :ReqEnd    ["118591802" "1350759611.326084614" "1350759611.329720259"
              "0.005002737" "0.003598213" "0.37432"]}
 118591777
 {:ReqStart  ["10.102.41.121" "4187" "118591777"]
  :RxRequest ["GET"]
  :RxURL     ["/json/engagement"]
  :RxHeader  ["host: www.example.com" "Accept: application/json"]
  :ReqEnd    ["118591777" "1350759605.775758028" "1350759611.249602079"
              "5.866879225" "5.473801851" "0.42200"]}
 118591808
 {:ReqStart  ["10.102.41.121" "4187" "118591808"]
  :RxRequest ["GET"]
  :RxURL     ["/ws/boxtops/user/"]
  :RxHeader  ["host: www.example.com" "Accept: application/xml"]
  :ReqEnd    ["118591808" "1350759611.442447424" "1350759611.444925785"
              "0.016906023" "0.002441406" "0.36955"]}
 118591810
 {:ReqStart  ["10.102.41.121" "3906" "118591810"]
  :RxRequest ["GET"]
  :RxURL     ["/registration/success"]
  :RxHeader  ["host: www.example.com" "Accept: text/html"]
  :ReqEnd    ["118591810" "1350759611.521781683" "1350759611.525400877"
              "0.098322868" "0.003532171" "0.87023"]}}
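One way to get the "better yet" header map is a small conversion over the header strings. This is a sketch that assumes each header looks like `name: value`. `headers->map` is a hypothetical helper name. Splitting on the first colon only keeps values that themselves contain colons (such as URLs) intact.

```clojure
(require '[clojure.string :as str])

(defn headers->map
  "Hypothetical helper: turns strings such as \"host: www.example.com\" into
  {:host \"www.example.com\"}. Splits on the first colon only; later
  duplicate header names overwrite earlier ones."
  [headers]
  (into {}
        (for [h headers
              :let [[k v] (str/split h #":\s*" 2)]]
          [(keyword k) v])))

(headers->map ["host: www.example.com" "Accept: application/xml"])
;; => {:host "www.example.com", :Accept "application/xml"}
```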


Re: Parsing Log Files with Interspersed Multi-line Log Statements

2012-10-21 Thread Dave Sann
Look at:

http://clojuredocs.org/clojure_core/clojure.core/line-seq to get a lazy 
sequence of lines of the file.

I don't think there is any need to sort here. (I think sorting won't
help anyway, because some lines seem to be identifiable only by their
thread id at that point in the sequence.)

Process each line and build up the result data structure as you go. 
http://clojuredocs.org/clojure_core/clojure.core/assoc-in may help you for 
a nested map.
Do not hold on to the head of the line seq and you should be able to 
process this without too much problem.

(if the data structure you build up is itself too large, you may need to 
write out result data as you see each ReqEnd. In this case, dissoc that 
record so it can be garbage collected)
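Here is a minimal sketch of this approach. It runs `reduce` over the line seq and passes each finished request to a caller-supplied `emit!` function. Both `split-line` and `emit!` are hypothetical names, and the regex assumes the column layout of the example log.

```clojure
(require '[clojure.java.io :as io])

(defn split-line
  "Hypothetical helper: returns [thread-id tag value] for a log line,
  or nil if the line doesn't match the expected layout."
  [line]
  (when-let [[_ tid tag value] (re-find #"^\s*(\d+)\s+(\S+)\s+(.+)$" line)]
    [tid tag value]))

(defn process-lines
  "Folds over lines, keeping partial requests in a map keyed by thread id.
  On ReqEnd, the finished record is passed to emit! and dissoc'ed so it can
  be garbage collected. Returns the leftover (unfinished) state."
  [lines emit!]
  (reduce (fn [state line]
            (if-let [[tid tag value] (split-line line)]
              (let [state (update-in state [tid tag] (fnil conj []) value)]
                (if (= tag "ReqEnd")
                  (do (emit! (get state tid))
                      (dissoc state tid))
                  state))
              state))
          {}
          lines))

(defn process-file
  "Passes the line-seq straight to reduce, so no local keeps its head."
  [path emit!]
  (with-open [r (io/reader path)]
    (process-lines (line-seq r) emit!)))
```

Because the state only holds requests that have not yet logged a ReqEnd, its size tracks the number of in-flight requests rather than the size of the file.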

Dave



Re: Parsing Log Files with Interspersed Multi-line Log Statements

2012-10-21 Thread Dave Sann
btw,

start simple and just see if you can scan the lines without doing anything 
in particular.
Then take a subsequence: (take 100 my-line-seq)
Play in the repl to start building up the data you want. See what you get 
and work from there.
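Here is a sketch of those first two steps. It assumes the log is readable via `clojure.java.io/reader`; the helper names and the log path are hypothetical. `doall` matters in the second helper. Without it, the lazy `take` would be realised after `with-open` had already closed the reader.

```clojure
(require '[clojure.java.io :as io])

(defn count-lines
  "Step 1: scan every line without doing anything else with it."
  [path]
  (with-open [r (io/reader path)]
    (count (line-seq r))))

(defn sample-lines
  "Step 2: grab the first n lines to play with at the REPL.
  doall realises them before with-open closes the reader."
  [path n]
  (with-open [r (io/reader path)]
    (doall (take n (line-seq r)))))

;; At the REPL, with a hypothetical path:
;; (count-lines "/var/log/varnish/varnish.log")
;; (def my-lines (sample-lines "/var/log/varnish/varnish.log" 100))
```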

D

On Sunday, 21 October 2012 19:05:42 UTC+11, Dave Sann wrote:


Re: Parsing Log Files with Interspersed Multi-line Log Statements

2012-10-21 Thread Ray Miller
As Dave says, you can do this using line-seq, but you'll have to
accumulate some state as you read the lines so you can return all the
lines for a given thread's ReqStart to ReqEnd. Once you've returned
that block, you can delete the state for that thread-id, so your
accumulated state will only contain the 'active' requests. If you're
processing a very large file, you're best off returning a lazy sequence
of the data.

Something like this should get you started:

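(require '[clojure.java.io :as io])

(defn parse-line
  "Splits a log line into [thread-id tag value], or returns nil if the
  line doesn't look like a log entry."
  [s]
  (when-let [[_ thread-id tag value] (re-find #"^\s*(\d+)\s+(\S+)\s+(.+)$" s)]
    [thread-id tag value]))

(defn parse-lines
  "Returns a lazy seq of request blocks (maps of tag -> vector of values),
  one per ReqEnd, keeping state only for the active thread ids."
  ([lines]
   (parse-lines lines {}))
  ([lines state]
   (lazy-seq
    (when (seq lines)
      (if-let [[thread-id tag value] (parse-line (first lines))]
        (let [state (assoc-in state [thread-id tag]
                              (conj (get-in state [thread-id tag] []) value))]
          (if (= tag "ReqEnd")
            ;; Request finished: emit its block and forget the thread's state.
            (cons (get state thread-id)
                  (parse-lines (rest lines) (dissoc state thread-id)))
            (parse-lines (rest lines) state)))
        ;; Skip lines that don't parse (e.g. blank lines).
        (parse-lines (rest lines) state))))))

(defn parse-log-file
  [file]
  (with-open [logfile (io/reader file)]
    ;; doall realises the results before with-open closes the reader.
    ;; The filter drops blocks missing a ReqStart, e.g. requests that were
    ;; already in progress when the log began.
    (doall
     (filter #(and (get % "ReqStart") (get % "ReqEnd"))
             (parse-lines (line-seq logfile))))))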
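To get from these blocks to something close to the Desired Output, one option is a post-processing step. This sketch assumes each raw value still begins with the one-character column seen in the sample log (`c` or `-`). It also assumes every block has a ReqStart, which `parse-log-file` already ensures. The helper names are hypothetical.

```clojure
(require '[clojure.string :as str])

(defn strip-marker
  "Removes the single-character column (c or - in the sample log) that
  precedes each value, e.g. \"c GET\" -> \"GET\"."
  [value]
  (str/replace-first value #"^\S\s+" ""))

(defn block->request
  "Hypothetical helper: turns one block from parse-lines into
  [transaction-id request-map]. ReqStart/ReqEnd values are split into
  fields; other values are kept whole. Assumes the block has a ReqStart."
  [block]
  (let [fields    (fn [v] (str/split (strip-marker v) #"\s+"))
        req-start (fields (first (get block "ReqStart")))
        txn-id    (Long/parseLong (last req-start))]
    [txn-id
     (into {}
           (for [[tag values] block]
             [(keyword tag)
              (if (#{"ReqStart" "ReqEnd"} tag)
                (fields (first values))
                (mapv strip-marker values))]))]))

(defn requests-by-id
  "Builds one map keyed by transaction id. Unlike the lazy seq, this holds
  every request in memory, so it only suits modest inputs."
  [blocks]
  (into {} (map block->request blocks)))
```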


On 21 October 2012 02:54, Dave d...@sevenventures.net wrote:

Re: Write stream to two outputs

2012-10-21 Thread Petar Radosevic
Hi Herwig,

Thanks for your help.

Herwig Hochleitner writes:
 (defn copy-multi
   ([input outputs] (copy-multi input outputs (make-array Byte/TYPE 1024)))
   ([^java.io.InputStream input outputs buffer]
    (let [size (.read input buffer)]
      (when (pos? size)
        (doseq [^java.io.OutputStream output outputs]
          (.write output buffer 0 size))
        (recur input outputs buffer)))))

I tried using the above as follows:

(defn copy-multi
  ([input outputs] (copy-multi input outputs (make-array Byte/TYPE 1024)))
  ([^java.io.InputStream input outputs buffer]
   (let [size (.read input buffer)]
     (when (pos? size)
       (doseq [^java.io.OutputStream output outputs]
         (.write output buffer 0 size))
       (recur input outputs buffer)))))

(defn save-to-disk
  "Saves a temporary file to disk to further analyse it"
  [file]
  (info "Saving file to disk")
  (let [pipe-in (java.io.PipedInputStream.)
        pipe-out (java.io.PipedOutputStream. pipe-in)]
    (future
      (with-open [in (:body file)
                  file-out (io/output-stream "/tmp/test.txt")]
        (copy-multi in [file-out pipe-out])))
    (assoc file :body pipe-in)))

I added `future` so that pipe-in gets read and pipe-out doesn't
block. However, the program returns before anything is copied. Without
`future`, pipe-in is never read and pipe-out blocks. So it either
returns too fast or never completes.

I'm beginning to think that I'm trying to solve the problem (streaming
one input to file and S3 simultaneously) in a wrong way. Any suggestions?
-- 
Petar Radosevic | @wunki



Re: CLJS: How can you find the version of clojurescript that you're running?

2012-10-21 Thread David Nolen
There is not. That would be useful.

On Saturday, October 20, 2012, Frank Siebenlist wrote:

 When you have different versions of clojurescript in the dependencies of
 your main project, how do you ask the repl what version it is running with…
 is there any easy function/var that I overlooked?

 Thanks, FrankS.


Re: Write stream to two outputs

2012-10-21 Thread Herwig Hochleitner
Streaming simultaneously can result in all kinds of problems due to
differences in speed and reliability between a disk and a network
connection. So if you can, buffer the data for the write to S3. Maybe you
could just stream to S3 from the file after you've written it?

If you must stream simultaneously, don't try any magic with
PipedInputStream. Just use:

(with-open [in   (open-input)
            file (open-file)
            s3   (open-s3)]
  (copy-multi in [file s3]))

And beware that if someone pulls the network plug during the operation, the
file will not be fully written either.
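Here is a self-contained sketch of the synchronous approach, reusing `copy-multi` from earlier in the thread. `open-upload-stream` is a hypothetical stand-in for whatever S3 client call returns an `OutputStream`. Any `OutputStream` works, so a `java.io.ByteArrayOutputStream` is handy for trying it out at the REPL.

```clojure
(require '[clojure.java.io :as io])

(defn copy-multi
  "Copies everything from input to each of outputs, one buffer at a time."
  ([input outputs] (copy-multi input outputs (byte-array 1024)))
  ([^java.io.InputStream input outputs ^bytes buffer]
   (let [size (.read input buffer)]
     ;; .read returns -1 at end of stream, which stops the loop
     (when (pos? size)
       (doseq [^java.io.OutputStream output outputs]
         (.write output buffer 0 size))
       (recur input outputs buffer)))))

(defn save-and-upload!
  "Opens the input and both outputs in one with-open, copies synchronously,
  and lets with-open close everything. open-upload-stream is a hypothetical
  zero-arg function returning an OutputStream (e.g. an S3 upload)."
  [input-stream path open-upload-stream]
  (with-open [in   input-stream
              file (io/output-stream path)
              s3   (open-upload-stream)]
    (copy-multi in [file s3])))
```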


Re: finding intervals quickly

2012-10-21 Thread Brian Craft
Answering my own question, my issues were 1) overlooking a couple seqs that 
were thousands of times slower than vectors, and 2) using / instead of 
quot. Just found this video:

http://www.infoq.com/presentations/Crunching-Numbers-Clojure

I'm finding that seqs are pretty much always too slow, but they are 
returned by many of the core functions. Consequently my code is getting 
littered with (vec ...) calls. Am I doing this wrong? Does everyone have 
(vec ...) calls all over their code? 
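On the `(vec ...)` question, Clojure 1.4 added `mapv` and `filterv`. These build vectors directly instead of producing a lazy seq that `vec` then copies. Here is a small sketch with made-up probe data:

```clojure
(def probes
  [{:start 0 :end 10} {:start 20 :end 30} {:start 40 :end 50}])

;; filter returns a lazy seq, so vec copies it into a vector afterwards.
(vec (filter #(> (:end %) 15) probes))

;; filterv and mapv build the vector directly.
(filterv #(> (:end %) 15) probes) ; => [{:start 20, :end 30} {:start 40, :end 50}]
(mapv :start probes)              ; => [0 20 40]
```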

On Saturday, October 20, 2012 7:27:56 PM UTC-7, Brian Craft wrote:

 I have a vector of maps representing intervals with (start, end) coords in 
 one dimension, that are non-overlapping, and sorted. At the moment I'm not 
 using a sorted data type, just a vector.

 The bottleneck in my code is in searching for intervals that overlap a 
 region, like so:

 (defn filter-probe-range [probes start end]
   (filter #(and (> (:end %) start) (< (:start %) end)) probes))

 My first thought to speed it up was to add a binary search on the vector, 
 since it's sorted. However, that is wildly slower than doing a full scan 
 with filter. I'm not sure why that is, or how to investigate it. I've 
 looked over the section on performance in The Joy of.., but nothing is 
 really popping out. Should I start with jvisualvm? Is there something 
 better that would identify what I'm doing that's so slow?

 Also wondering if sorted data types would do this for me for free. Is 
 there a way to do a binary search on a sorted collection?
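On the last question, sorted maps support `subseq`, which uses the underlying tree to start from a given key instead of scanning from the front. The sketch below assumes, as stated above, that the intervals are non-overlapping and sorted, so their `:end` values are sorted as well. `index-by-end` and `overlapping` are hypothetical names.

```clojure
(defn index-by-end
  "Builds a sorted map from each interval's :end to the interval."
  [probes]
  (into (sorted-map) (map (juxt :end identity) probes)))

(defn overlapping
  "Same test as filter-probe-range, (:end p) > start and (:start p) < end,
  but starts at the first interval ending after start. Because intervals
  are sorted and non-overlapping, it can stop at the first one starting
  at or after end."
  [by-end start end]
  (->> (subseq by-end > start)   ; map entries whose key (:end) > start
       (map val)
       (take-while #(< (:start %) end))))
```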



Re: ANN: ClojureScript release 0.0-1513

2012-10-21 Thread Shantanu Kumar


On Oct 20, 12:52 am, Stuart Sierra the.stuart.sie...@gmail.com
wrote:
 ClojureScript release 0.0-1513 is on its way to the Maven Central
 Repository.

 Changes:http://build.clojure.org/job/clojurescript-release/18/

This release (via lein-cljsbuild 0.2.9) broke one of my projects,
which worked with 0.0-1503 (lein-cljsbuild 0.2.8). I am not yet able
to isolate the issue into a small reproducible test-case, but below is
how you can see it in effect:

1. $ git clone g...@github.com:kumarshantanu/basil.git
2. $ cd basil
3. $ git checkout 06cd1f0  # latest commit as of now
4. $ # make sure you have PhantomJS 1.6 or higher
5. $ lein dev do clean, test  # stack traces may help
6. Open `run-tests.html` in Firefox and see console

If you edit project.clj and replace lein-cljsbuild version 0.2.9 with
0.2.8, all tests pass. I tried debugging the issue using Firebug and
it mentioned 'variable scoping error', but I don't think I am close to
the actual cause. Sharing it here hoping someone can throw pointers.

Shantanu



Re: ANN: ClojureScript release 0.0-1513

2012-10-21 Thread Shantanu Kumar
 1. $ git clone g...@github.com:kumarshantanu/basil.git

Or whichever URL is convenient for this project:

https://github.com/kumarshantanu/basil

Shantanu



Re: ANN: ClojureScript release 0.0-1513

2012-10-21 Thread David Nolen
Please isolate the commit by using the lein checkouts feature - you can use
the ClojureScript repo directly then and use git bisect to determine the
exact commit that broke your build. Thanks.

On Sunday, October 21, 2012, Shantanu Kumar wrote:




CLJS-402: Re: CLJS: How can you find the version of clojurescript that you're running?

2012-10-21 Thread Frank Siebenlist
I've created a JIRA issue for this:

http://dev.clojure.org/jira/browse/CLJS-402

and added a patch for the build script: 
---
Auto-generation, from the build script, of version_autogen.clj and
version_autogen.cljs files that both define
cljs.version-autogen/*clojurescript-version* with the current version
info for the clj and cljs environments, respectively
---

Feels a little crude, but it works…:

---
user=> (require 'cljs.version-autogen)
nil
user=> cljs.version-autogen/*clojurescript-version*
{:minor 0, :incremental 1514, :major 0}
user=> (run-repl-listen)
Type:  :cljs/quit  to quit
ClojureScript:cljs.user> (load-namespace 'cljs.version-autogen)

ClojureScript:cljs.user> cljs.version-autogen/*clojurescript-version*
{:incremental 1514, :major 0, :minor 0}
ClojureScript:cljs.user>
---

Any alternative approaches that would be more elegant?
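One possible alternative depends on an unconfirmed assumption. If the ClojureScript jar on the classpath was built by Maven, it may carry a `META-INF/maven/org.clojure/clojurescript/pom.properties` file with a `version` entry. That entry could be read on the JVM side without a generated namespace. It is not confirmed here that a given jar contains that file. This approach also covers only the clj side, not the cljs runtime. `maven-artifact-version` is a hypothetical helper.

```clojure
(require '[clojure.java.io :as io])

(defn maven-artifact-version
  "Hypothetical helper: returns the version recorded in
  META-INF/maven/<group>/<artifact>/pom.properties on the classpath,
  or nil if that resource isn't present."
  [group artifact]
  (when-let [res (io/resource
                  (str "META-INF/maven/" group "/" artifact "/pom.properties"))]
    (with-open [^java.io.Reader r (io/reader res)]
      (.getProperty (doto (java.util.Properties.) (.load r)) "version"))))

;; e.g. (maven-artifact-version "org.clojure" "clojurescript")
```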

-FrankS.


On Oct 21, 2012, at 8:02 AM, David Nolen dnolen.li...@gmail.com wrote:

 There is not. That would be useful.
 
 On Saturday, October 20, 2012, Frank Siebenlist wrote:
 When you have different versions of clojurescript in the dependencies of your 
 main project, how do you ask the repl what version it is running with… is 
 there any easy function/var that I overlooked?
 
 Thanks, FrankS.
 


Re: Parsing Log Files with Interspersed Multi-line Log Statements

2012-10-21 Thread AtKaaZ
I assumed he needs the results sorted by transaction id; if he doesn't,
then it should be quite simple.

On Sun, Oct 21, 2012 at 10:05 AM, Dave Sann daves...@gmail.com wrote:


Re: Parsing Log Files with Interspersed Multi-line Log Statements

2012-10-21 Thread AtKaaZ
Looks like something I would do in Datomic: parse the log one time, and
concurrently put the data into a Datomic database and then... wait, I just
realized I don't think we can get the results sorted from the database,
because the return from a query is a set of results and your transaction
ids are certainly not consecutive. I must investigate.



On Sun, Oct 21, 2012 at 3:54 AM, Dave d...@sevenventures.net wrote:

 Clojurists - I'm fairly new to Clojure and didn't realize how broken I've
 become using imperative languages all my life.  I'm stumped as to how to
 parse a Varnish (www.varnish-cache.org) log file using Clojure.  The main
 problem is that for a single request a varnish log file generates multiple
 log lines and each line is interspersed with lines from other threads.
  These log files can be several gigabytes in size (so using a stable sort
 of the entire log by thread id is out of the question).

 Below I've included a small example log file and an example output Clojure
 data structure.  Let me thank everyone in advance for any hints / help they
 can provide on this seemingly simple problem.

 *Rules of the Varnish Log File*

- The first number on each line is the thread id (not unique and gets
reused frequently)
- Each ReqStart marks the start of a request and the last number on
the line is the unique transaction id (e.g. 118591777)
- ReqEnd denote the end of the processing of the request by the thread
- Each line is atomically written, however many threads generate log
lines that are interspersed with other requests (threads)
- These log files can be VERY large (10+ Gigabytes in the case of my
application) so using a stable sort by thread id or anything that loads the
entire file into memory is out of the question.


 *Example Varnish Log file*
40 ReqEnd   c 118591771 1350759605.775758028 1350759611.249602079
 5.866879225 5.473801851 0.42200
15 ReqStart c 10.102.41.121 4187 118591777
15 RxRequestc GET
15 RxURLc /json/engagement
15 RxHeader c host: www.example.com
30 ReqStart c 10.102.41.121 3906 118591802
15 RxHeader c Accept: application/json
30 RxRequestc GET
30 RxURLc /ws/boxtops/user/
30 RxHeader c host: www.example.com
15 ReqEnd   c 118591777 1350759605.775758028 1350759611.249602079
 5.866879225 5.473801851 0.42200
30 RxHeader c Accept: application/xml
30 ReqEnd   c 118591802 1350759611.326084614 1350759611.329720259
 0.005002737 0.003598213 0.37432
15 ReqStart c 10.102.41.121 4187 118591808
15 RxRequestc GET
15 RxURLc /ws/boxtops/user/
30 ReqStart c 10.102.41.121 3906 118591810
15 RxHeader c host: www.example.com
15 RxHeader c Accept: application/xml
30 RxRequestc GET
30 RxURLc /registration/success
30 RxHeader c host: www.example.com
46 TxRequest- GET
30 RxHeader c Accept: text/html
46 TxURL- /registration/success
15 ReqEnd   c 118591808 1350759611.442447424 1350759611.444925785
 0.016906023 0.002441406 0.36955
30 ReqEnd   c 118591810 1350759611.521781683 1350759611.525400877
 0.098322868 0.003532171 0.87023

 *Desired Output*
 {
   118591802
   { :ReqStart [10.102.41.121 3906 118591802]
     :RxRequest [GET]
     :RxURL [/ws/boxtops/user/]
     :RxHeader [host: www.example.com Accept: application/xml]
     or better yet
     :RxHeader {:host www.example.com :Accept application/xml}
     :ReqEnd [118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.37432] }
   118591777
   { :ReqStart [10.102.41.121 4187 118591777]
     :RxRequest [GET]
     :RxURL [/json/engagement]
     :RxHeader [host: www.example.com Accept: application/json]
     :ReqEnd [118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.42200] }
   118591808
   { :ReqStart [10.102.41.121 4187 118591808]
     :RxRequest [GET]
     :RxURL [/ws/boxtops/user/]
     :RxHeader [host: www.example.com Accept: application/xml]
     :ReqEnd [118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.36955] }
   118591810
   { :ReqStart [10.102.41.121 3906 118591810]
     :RxRequest [GET]
     :RxURL [/registration/success]
     :RxHeader [host: www.example.com Accept: text/html]
     :ReqEnd [118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.87023] }
 }

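The "or better yet" form of :RxHeader above can be built from the collected header strings with a small helper. This is only a sketch: `headers->map` is a hypothetical name, and it assumes each collected value looks like `Accept: application/json`, with the `c` column already stripped off.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not from the thread): turns collected RxHeader values
;; such as ["host: www.example.com" "Accept: application/json"] into
;; {:host "www.example.com", :Accept "application/json"}.
;; Assumes each value is "Name: value"; strings without a colon are skipped.
(defn headers->map [header-values]
  (into {}
        (for [h header-values
              ;; split only on the first colon so values such as URLs stay intact
              :let [[k v] (str/split h #":\s*" 2)]
              :when v]
          [(keyword k) v])))
```

For example, `(headers->map ["host: www.example.com" "Accept: application/json"])` returns `{:host "www.example.com", :Accept "application/json"}`. If a header name repeats, the later value wins, which may or may not be what you want.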
  --
 You received this message because you are subscribed to the Google
 Groups Clojure group.
 To post to this group, send email to clojure@googlegroups.com
 Note that posts from new members are moderated - please be patient with
 your first post.
 To unsubscribe from this group, send email to
 clojure+unsubscr...@googlegroups.com
 For more options, visit this group at
 http://groups.google.com/group/clojure?hl=en





Re: finding intervals quickly

2012-10-21 Thread AtKaaZ
This comes to mind:
(set! *warn-on-reflection* true)
But you probably already considered it...
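A minimal illustration of what that flag reports. The function names are made up, and whether reflection is what slowed down the binary search here is only a guess: the compiler warns about interop calls it can only resolve at runtime, and a type hint usually removes the warning.

```clojure
;; Ask the compiler to report interop calls it has to resolve via reflection.
(set! *warn-on-reflection* true)

;; No hint: the type of s is unknown at compile time, so this compiles to a
;; reflective call and the compiler prints a reflection warning.
(defn name-length-slow [s]
  (.length s))

;; With a ^String hint the call compiles to a direct String.length() call.
(defn name-length-fast [^String s]
  (.length s))
```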

On Sun, Oct 21, 2012 at 4:27 AM, Brian Craft craft.br...@gmail.com wrote:

 I have a vector of maps representing intervals with (start, end) coords in
 one dimension, that are non-overlapping, and sorted. At the moment I'm not
 using a sorted data type, just a vector.

 The bottleneck in my code is in searching for intervals that overlap a
 region, like so:

 (defn filter-probe-range [probes start end]
   (filter #(and (> (:end %) start) (< (:start %) end)) probes))

 My first thought to speed it up was to add a binary search on the vector,
 since it's sorted. However, that is wildly slower than doing a full scan
 with filter. I'm not sure why that is, or how to investigate it. I've
 looked over the section on performance in The Joy of.., but nothing is
 really popping out. Should I start with jvisualvm? Is there something
 better that would identify what I'm doing that's so slow?

 Also wondering if sorted data types would do this for me for free. Is
 there a way to do a binary search on a sorted collection?
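On the binary search question: Clojure vectors implement java.util.List, so the JDK's java.util.Collections/binarySearch works on them directly. This is a sketch under some assumptions: `probes` is a vector of sorted, non-overlapping interval maps, the :end values are distinct Longs (Clojure's default integer type, so they compare cleanly with a Long `start`), and the helper names are hypothetical. It uses the overlap test (:end iv) > start and (:start iv) < end.

```clojure
(defn first-overlap-index
  "Index of the first interval whose :end is greater than start.
  `ends` must be the sorted vector of the intervals' :end values."
  [ends start]
  (let [i (java.util.Collections/binarySearch ends start)]
    ;; binarySearch returns the match index, or (-(insertion point) - 1)
    (if (neg? i)
      (- (inc i))   ; no exact match: the insertion point is the first end > start
      (inc i))))    ; exact match: that interval ends at start, so it can't overlap

(defn filter-probe-range-bsearch
  "Intended to match the full filter scan for sorted, non-overlapping intervals."
  [probes start end]
  ;; mapv is O(n): in real use compute `ends` once and reuse it across queries,
  ;; otherwise this does no less work than the original full scan.
  (let [ends (mapv :end probes)]
    (take-while #(< (:start %) end)
                (subvec probes (first-overlap-index ends start)))))
```

As the comment says, the speedup only shows up if the vector of ends is built once and reused; rebuilding it on every call costs as much as the scan it replaces.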





-- 
I may be wrong or incomplete.
Please express any corrections / additions,
they are encouraged and appreciated.
At least one entity is bound to be transformed if you do ;)


Re: finding intervals quickly

2012-10-21 Thread Chas Emerick
If you are happy with sorting the data already, then put the pairs into a 
sorted set, and use `subseq` and `rsubseq` to query.
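As an illustration of the `subseq` approach, here is a sketch that uses a sorted map keyed by each interval's :end rather than a sorted set of pairs. It assumes non-overlapping interval maps like {:start 10 :end 20}; because they don't overlap, ordering by :end is the same as ordering by :start, so after seeking to the first interval that ends after `start` the walk can stop at the first one that starts at or after `end`. The names `index-by-end` and `overlapping` are hypothetical.

```clojure
;; Hypothetical helpers (names are not from the thread). Assumes intervals are
;; non-overlapping maps such as {:start 10 :end 20}.
(defn index-by-end
  "Builds a sorted map from each interval's :end to the interval itself."
  [intervals]
  (into (sorted-map) (map (juxt :end identity) intervals)))

(defn overlapping
  "Intervals with (:end iv) > start and (:start iv) < end, in order."
  [index start end]
  (->> (subseq index > start)             ; seek to the first entry whose key (:end) > start
       (map val)                          ; keep the interval maps, drop the keys
       (take-while #(< (:start %) end)))) ; stop at the first interval starting at or after end
```

Building the index costs O(n log n) once; each query is then a logarithmic seek plus a walk over only the matching intervals.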

Depending on the size of the data (i.e. how many coordinate pairs you have) and 
the specifics of your usage (i.e. mostly, how dynamic the dataset is), using a 
specialized data structure like an interval tree (or k-d or R*/R+ tree if you 
end up working with multidimensional data) is a must.

Cheers,

- Chas

On Oct 20, 2012, at 10:27 PM, Brian Craft wrote:

 I have a vector of maps representing intervals with (start, end) coords in 
 one dimension, that are non-overlapping, and sorted. At the moment I'm not 
 using a sorted data type, just a vector.
 
 The bottleneck in my code is in searching for intervals that overlap a 
 region, like so:
 
 (defn filter-probe-range [probes start end]
   (filter #(and (> (:end %) start) (< (:start %) end)) probes))
 
 My first thought to speed it up was to add a binary search on the vector, 
 since it's sorted. However, that is wildly slower than doing a full scan with 
 filter. I'm not sure why that is, or how to investigate it. I've looked over 
 the section on performance in The Joy of.., but nothing is really popping 
 out. Should I start with jvisualvm? Is there something better that would 
 identify what I'm doing that's so slow?
 
 Also wondering if sorted data types would do this for me for free. Is there a 
 way to do a binary search on a sorted collection?
 


How to Aliasing Java Class Names?

2012-10-21 Thread JvJ
I'm trying to import some classes from a library (jpl), which have names 
already taken by java.lang classes.

When I try to import jpl.Float and jpl.Integer, I get exceptions due to 
name clashes.  Is there a way to alias the class names so that I can use 
them?


Re: How to Aliasing Java Class Names?

2012-10-21 Thread Andy Fingerhut
As mentioned in a recent thread, "class name clashes on importing classes of
same name from different Java package", you can use the classes with fully
qualified names without importing them at all.

I am not aware of any way to alias them.
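A small illustration of the fully-qualified-name approach, using two JDK classes that clash in the same way (java.util.Date and java.sql.Date) so the example runs without jpl on the classpath. The namespace name is made up. For jpl this would mean writing jpl.Float and jpl.Integer in full wherever they're used, and leaving them out of :import.

```clojure
(ns example.date-clash
  ;; Importing both java.util.Date and java.sql.Date would clash on the
  ;; short name Date, so import only one of them.
  (:import (java.util Date)))

(defn both-dates
  "Returns a java.util.Date and a java.sql.Date for the same instant."
  [^long millis]
  ;; Date resolves to the imported java.util.Date; java.sql.Date is written
  ;; out in full and needs no import at all.
  [(Date. millis) (java.sql.Date. millis)])
```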

Andy

On Oct 21, 2012, at 6:57 PM, JvJ wrote:

 I'm trying to import some classes from a library (jpl), which have names 
 already taken by java.lang classes.
 
 When I try to import jpl.Float and jpl.Integer, I get exceptions due to name 
 clashes.  Is there a way to alias the class names so that I can use them?



Re: Parsing Log Files with Interspersed Multi-line Log Statements

2012-10-21 Thread Dave
raym - Thanks so much for the code snippet.  That was just what I needed to 
get unstuck and start playing around with the parser.  I really appreciate 
the help.

- Dave

On Sunday, October 21, 2012 6:10:51 AM UTC-5, raym wrote:

 As Dave says, you can do this using line-seq, but you'll have to 
 accumulate some state as you read the lines so you can return all the 
 lines for a given thread's ReqStart to ReqEnd. Once you've returned 
 that block, you can delete the state for that thread-id, so your 
 accumulated state will only contain the 'active' requests. If you're 
 processing a very large file, you're best off returning a lazy sequence of
 the data.

 Something like this should get you started: 

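 (require '[clojure.java.io :as io])

 ;; Splits a log line into [thread-id tag value] (all strings).
 (defn parse-line
   [s]
   (let [[_ thread_id tag value] (re-find #"^(\d+)\s+(\S+)\s+(.+)$" s)]
     [thread_id tag value]))

 ;; Accumulates values per thread id and tag; when a thread's ReqEnd arrives,
 ;; emits that thread's map and forgets it, so only active requests are held.
 (defn parse-lines
   ([lines]
    (parse-lines lines {}))
   ([lines state]
    (lazy-seq
     (when (seq lines)
       (let [[thread-id tag value] (parse-line (first lines))
             state (assoc-in state [thread-id tag]
                             (conj (get-in state [thread-id tag] [])
                                   value))]
         (if (= tag "ReqEnd")
           (cons (get state thread-id)
                 (parse-lines (rest lines) (dissoc state thread-id)))
           (parse-lines (rest lines) state)))))))

 (defn parse-log-file
   [file]
   (with-open [logfile (io/reader file)]
     ;; doall forces the whole result before the reader is closed; for very
     ;; large files, consume the lazy seq inside with-open instead.
     (doall
      ;; drop requests that started before the file did (no ReqStart seen)
      (filter #(and (get % "ReqStart") (get % "ReqEnd"))
              (parse-lines (line-seq logfile))))))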

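The code above collects requests per thread id. Dave's desired output is keyed by transaction id instead, which is the last field of the ReqStart line. The sketch below is one way to do that while staying lazy; it is a variation written for illustration, not raym's code, the function names are hypothetical, and it assumes lines shaped like the sample log (thread id, tag, a single `c`, `b` or `-` marker column, then the value). Requests whose ReqStart isn't in the file, such as the first ReqEnd in the sample, are dropped.

```clojure
(require '[clojure.string :as str])

;; Assumed line shape: "<thread-id> <tag> <c|b|-> <value>".
;; Returns [thread-id tag-keyword value], or nil for lines that don't match.
(defn parse-varnish-line [s]
  (when-let [[_ tid tag value]
             (re-find #"^\s*(\d+)\s+(\S+)\s+(?:[cb-]\s+)?(.*)$" s)]
    [tid (keyword tag) value]))

(defn requests-by-txn
  "Lazy seq of [transaction-id request-map] pairs, one per completed request.
  `open` holds only requests that have started but not yet ended."
  ([lines] (requests-by-txn lines {}))
  ([lines open]
   (lazy-seq
    (when-let [[line & more] (seq lines)]
      (if-let [[tid tag value] (parse-varnish-line line)]
        (let [open (update-in open [tid tag] (fnil conj []) value)]
          (if (= tag :ReqEnd)
            (let [req  (get open tid)
                  tail (requests-by-txn more (dissoc open tid))]
              (if-let [start (first (:ReqStart req))]
                ;; the transaction id is the last field of ReqStart
                (cons [(last (str/split start #"\s+")) req] tail)
                tail))      ; ReqEnd without ReqStart: request began before the file
            (requests-by-txn more open)))
        (requests-by-txn more open))))))  ; skip lines that don't parse
```

Wrapping the result in `(into {} ...)` gives a map shaped like the desired output, with tags as keywords (:ReqStart, :RxURL, ...), though building that map does hold every request in memory.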

 On 21 October 2012 02:54, Dave da...@sevenventures.net wrote:
  Clojurists - I'm fairly new to Clojure and didn't realize how broken I've
  become using imperative languages all my life.  I'm stumped as to how to
  parse a Varnish (www.varnish-cache.org) log file using Clojure.  The main
  problem is that for a single request a varnish log file generates multiple
  log lines and each line is interspersed with lines from other threads.
  These log files can be several gigabytes in size (so using a stable sort of
  the entire log by thread id is out of the question).
  
  Below I've included a small example log file and an example output Clojure
  data structure.  Let me thank everyone in advance for any hints / help they
  can provide on this seemingly simple problem.
  
  Rules of the Varnish Log File
  
  - The first number on each line is the thread id (not unique and gets
  reused frequently)
  - Each ReqStart marks the start of a request and the last number on the
  line is the unique transaction id (e.g. 118591777)
  - ReqEnd denotes the end of the processing of the request by the thread
  - Each line is written atomically; however, many threads generate log
  lines that are interspersed with other requests (threads)
  - These log files can be VERY large (10+ Gigabytes in the case of my
  application) so using a stable sort by thread id or anything that loads
  the entire file into memory is out of the question.
  
  
  Example Varnish Log file 
 40 ReqEnd   c 118591771 1350759605.775758028 1350759611.249602079 
  5.866879225 5.473801851 0.42200 
 15 ReqStart c 10.102.41.121 4187 118591777 
 15 RxRequestc GET 
 15 RxURLc /json/engagement 
 15 RxHeader c host: www.example.com 
 30 ReqStart c 10.102.41.121 3906 118591802 
 15 RxHeader c Accept: application/json 
 30 RxRequestc GET 
 30 RxURLc /ws/boxtops/user/ 
 30 RxHeader c host: www.example.com 
 15 ReqEnd   c 118591777 1350759605.775758028 1350759611.249602079 
  5.866879225 5.473801851 0.42200 
 30 RxHeader c Accept: application/xml 
 30 ReqEnd   c 118591802 1350759611.326084614 1350759611.329720259 
  0.005002737 0.003598213 0.37432 
 15 ReqStart c 10.102.41.121 4187 118591808 
 15 RxRequestc GET 
 15 RxURLc /ws/boxtops/user/ 
 30 ReqStart c 10.102.41.121 3906 118591810 
 15 RxHeader c host: www.example.com 
 15 RxHeader c Accept: application/xml 
 30 RxRequestc GET 
 30 RxURLc /registration/success 
 30 RxHeader c host: www.example.com 
 46 TxRequest- GET 
 30 RxHeader c Accept: text/html 
 46 TxURL- /registration/success 
 15 ReqEnd   c 118591808 1350759611.442447424 1350759611.444925785 
  0.016906023 0.002441406 0.36955 
 30 ReqEnd   c 118591810 1350759611.521781683 1350759611.525400877 
  0.098322868 0.003532171 0.87023 
  
  Desired Output 
  { 
118591802 
{ :ReqStart [10.102.41.121 3906 118591802] 
  :RxRequest [GET] 
  :RxURL [/ws/boxtops/user/] 
  :RxHeader [host: www.example.com Accept: application/xml] 
or better yet 
  :RxHeader {:host www.example.com :Accept application/xml} 
  :ReqEnd [118591802 1350759611.326084614 1350759611.329720259