Raul,
I'm forking the discussion to the general programming forum.
I would rather provide the percentile list as an argument. Explicitly, I can write:
percentiles =: dyad : '(x >.@* <:@# y){/:~y'
So:
(0.01 0.05 0.25 0.50 0.75 0.95 0.99) percentiles ?#~1e6
9910 49979 249913 499397 749931 950080 990111
But I wonder how to do it tacitly.
I would need a fork with both the dyadic (>.@* <:@#) and the monadic
(/:~) parts. Can I combine them in a single tacit form?
Yoel
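
One possible answer, offered as a sketch rather than a settled solution: a dyadic fork can use the index computation as its left tine and, as its right tine, /:~@] so that only the right argument gets sorted. The name percentiles is reused from the explicit definition above; the sample data is made up for illustration.

```j
NB. Tacit dyadic percentiles: x is the list of fractions, y is the data.
NB. Dyadic fork (f g h): (x f y) g (x h y)
NB.   f = (>.@* <:@#)  dyadic hook: x >.@* (<:@# y), ceiling of x*(n-1)
NB.   g = {            index into the sorted data
NB.   h = /:~@]        ignore x, sort y ascending
percentiles =: (>.@* <:@#) { /:~@]

NB. Hypothetical check: the indices are >. 0 0.5 1 * 2, i.e. 0 1 2
0 0.5 1 percentiles 50 10 30
```

For the three-item list above this should return 10 30 50, matching what the explicit dyad gives on the same arguments.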
On Tue, Jul 28, 2015 at 8:09 PM Raul Miller <[email protected]> wrote:
> I have not been using JD, so I will leave those issues for someone
> with some relevant expertise.
>
> However, here's how I might implement percentiles in J:
>
> plabels=: 0.01 0.05 0.25 0.50 0.75 0.95 0.99
> percentiles=: ({~ plabels >.@* <:@#)@/:~
>
> Example use:
> percentiles ?#~1e6
> 9969 50074 250161 500063 750329 950518 989874
>
> Though of course if you were already working with sorted data, you
> could eliminate the @/:~
>
> Thanks,
>
> --
> Raul
>
> On Tue, Jul 28, 2015 at 12:57 PM, Yoel Jacobsen <[email protected]>
> wrote:
> > Hello,
> >
> > Long list of questions, apologies in advance...
> >
> > I'm playing with Jd by trying to use it as an analytical time series
> > database and have a few questions.
> >
> > Setup:
> > - Tests are running on my laptop (1.7GHz multi-core CPU, 32GB RAM)
> > - O/S is 64-bit Linux. J version is 8.04
> > - Single table created as follows:
> >
> > jd'createtable calls time datetime, num byte 10, provider byte 10, a float, b float, c float'
> >
> >
> > The 'num' field is defined as byte 10 as it describes a phone number. I
> > can optimize using 'int' and clean the phone number in advance, but it's
> > of less importance at the moment. I may convert time to epochdt, but
> > datetime was easier to play with 6!:2.
> >
> >
> > - I loaded 700M records into the table. Size on disk:
> >
> >
> > $ du -sh *
> > 5.3G a
> > 5.3G b
> > 5.3G c
> > 4.0K column_create_order.txt
> > 668M jdactive
> > 4.0K jdclass
> > 8.0K jdindex
> > 4.0K jdstate
> > 6.6G num
> > 6.6G provider
> > 5.3G time
> >
> >
> > Now, my questions:
> >
> >
> > 1. Some queries are running out of memory. Such as:
> >
> > jd'reads min time,max time from calls'
> >
> > |Jd error: out of memory: Where: indices=:
> > /:~@:~.&.|:>,.&.>/andqueries&.>toSoP fixwhere_jdtable_ y: op:reads db:calls
> > user:u : jd_jd_
> >
> > | 13!:8&3 t
> >
> > Why is that? Should Jd load (map) only the 5.3GB of the 'time' field?
> >
> > 2. The following search runs out of memory:
> >
> > jd'reads count a from calls'
> >
> >
> > while this one doesn't:
> >
> > jd'reads count a from calls where num="012-456789"'
> >
> >
> > Why is that?
> >
> >
> > 3. Is there any "server" component beyond running Jd in a JHS session?
> >
> > 4. Is there any way to partition the data automatically (for instance,
> > have a different folder per day)?
> >
> > 5. Is there a simple way to search between dates with the built in time
> > types (epochdt, datetime)? How about queries like - between 2 and 5 hours
> > ago? 3 days ago?
> >
> > 6. Is there a "reference" architecture for having a real-time DB (in RAM)
> > and a historical DB with scheduled data movements? Suppose the above size
> > reflects a week of data (35GB) and I want a searchable historical DB for a
> > year (35GB x 52 weeks = 1.82TB). I guess some sort of partitioning is
> > mandatory to avoid out-of-memory errors when searching the historical DB.
> >
> > 7. Can one define an aggregation verb in J (my first would be percentile)?
> >
> > Thanks,
> > Yoel
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm