Here's a dyadic version:
plabels=: 0.01 0.05 0.25 0.50 0.75 0.95 0.99
percentiles=: (] {~ [ >.@* <:@#@]) /:~
Example use:
plabels percentiles ?#~1e6
9958 50177 249582 499524 749653 949908 990025
Hopefully you understand hook and fork mechanics well enough to see what I did there?
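
In case it helps, here is a small worked example of how the train
behaves (the names p, data and s are mine, chosen only for
illustration). Used dyadically, the outer hook sorts the right
argument first, and the inner fork then picks items from the sorted
list at the indices "ceiling of fraction times (count minus one)":

   percentiles=: (] {~ [ >.@* <:@#@]) /:~
   p=: 0.25 0.5 0.75
   data=: 50 0 100 20 70 10 90 30 60 40 80
   s=: /:~ data            NB. hook: x (f g) y  is  x f (g y), so y is sorted first
   p >.@* <: # s           NB. right side of the fork: indices into the sorted list
3 5 8
   (p >.@* <: # s) { s     NB. same as  s {~ indices
30 50 80
   p percentiles data      NB. the tacit verb gives the same answer
30 50 80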
That said, Bo Jacoby has a point: if you are running out of memory,
building a sorted copy of a column may not be the right approach for
you. (Though you might also consider getting a machine with more
memory. I think you can do 64GB of RAM without too much trouble, and
I've seen motherboards which declare that if you install 128GB in them
you only get the benefit of 127GB ... though I've not actually tried
to make something like that work.)
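
If the sorted copy really is what pushes you over the limit, one thing
that might be worth trying (this is my own sketch, not necessarily
what Bo had in mind) is estimating the percentiles from a random
sample, so that only the sample gets sorted. The sample size of 10000
and the names data and sample are arbitrary choices for illustration,
and the result is only an approximation:

   plabels=: 0.01 0.05 0.25 0.50 0.75 0.95 0.99
   percentiles=: (] {~ [ >.@* <:@#@]) /:~
   data=: ?#~1e6                      NB. stand-in for a large column
   sample=: data {~ 10000 ?@$ # data  NB. 10000 items drawn with replacement
   plabels percentiles sample         NB. approximate; varies from run to run
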
Thanks,
--
Raul
On Wed, Jul 29, 2015 at 4:42 AM, Yoel Jacobsen <[email protected]> wrote:
> Raul,
>
> I'm forking the discussion to the general programming forum.
>
> I'd rather provide the percentile list as an argument. Explicitly, I can write:
>
> percentiles =: dyad : '(x >.@* <:@# y){/:~y'
>
>
> So:
>
> (0.01 0.05 0.25 0.50 0.75 0.95 0.99) percentiles ?#~1e6
>
> 9910 49979 249913 499397 749931 950080 990111
>
>
> But I wondered how to do it tacitly?
>
>
> I would need a fork with both the dyadic (>.@* <:@#) and the monadic
> (/:~) parts. Can I combine them in a single tacit form?
>
>
> Yoel
>
>
>
>
>
>
> On Tue, Jul 28, 2015 at 8:09 PM Raul Miller <[email protected]> wrote:
>
>> I have not been using JD, so I will leave those issues for someone
>> with some relevant expertise.
>>
>> However, here's how I might implement percentiles in J:
>>
>> plabels=: 0.01 0.05 0.25 0.50 0.75 0.95 0.99
>> percentiles=: ({~ plabels >.@* <:@#)@/:~
>>
>> Example use:
>> percentiles ?#~1e6
>> 9969 50074 250161 500063 750329 950518 989874
>>
>> Though of course if you were already working with sorted data, you
>> could eliminate the @/:~
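>>
>> A sketch of that sorted-input variant (the name percentilesSorted is
>> made up for this example), which assumes its argument is already in
>> ascending order:
>>
>>    plabels=: 0.01 0.05 0.25 0.50 0.75 0.95 0.99
>>    percentilesSorted=: {~ plabels >.@* <:@#
>>    percentilesSorted i. 101
>> 1 5 25 50 75 95 99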
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Tue, Jul 28, 2015 at 12:57 PM, Yoel Jacobsen <[email protected]>
>> wrote:
>> > Hello,
>> >
>> > Long list of questions, apologies in advance...
>> >
>> > I'm playing with Jd by trying to use it as an analytical time series
>> > database and have a few questions.
>> >
>> > Setup:
>> > - Tests are running on my laptop (1.7GHz multi core CPU, 32GB RAM)
>> > - O/S is 64bit Linux. J version is 8.04
>> > - Single table created as follows:
>> >
>> > jd'createtable calls time datetime, num byte 10, provider byte 10, a float, b float, c float'
>> >
>> >
>> > The 'num' field is defined as byte 10 as it describes a phone number. I can
>> > optimize using 'int' and clean the phone number in advance, but it's of less
>> > importance at the moment. I may convert time to epochdt, but datetime was
>> > easier to play with 6!:2..
>> >
>> >
>> > - I loaded 700M records into the table. Size on disk:
>> >
>> >
>> > $ du -sh *
>> > 5.3G a
>> > 5.3G b
>> > 5.3G c
>> > 4.0K column_create_order.txt
>> > 668M jdactive
>> > 4.0K jdclass
>> > 8.0K jdindex
>> > 4.0K jdstate
>> > 6.6G num
>> > 6.6G provider
>> > 5.3G time
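>> >
>> > (A rough sanity check on those sizes, assuming about 8 bytes per row
>> > for the float and datetime columns and 10 bytes per row for the
>> > byte 10 columns. The per-row widths are an assumption, not something
>> > taken from the Jd documentation. Expressed in binary gigabytes, as
>> > du -h reports them, 700M rows come out in line with the 5.3G and
>> > 6.6G shown above.)
>> >
>> >    (8 10 * 700e6) % 2^30
>> > 5.21541 6.51926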
>> >
>> >
>> > Now, my questions:
>> >
>> >
>> > 1. Some queries are running out of memory. Such as:
>> >
>> > jd'reads min time,max time from calls'
>> >
>> > |Jd error: out of memory: Where: indices=:
>> > /:~@:~.&.|:>,.&.>/andqueries&.>toSoP fixwhere_jdtable_ y: op:reads db:calls user:u : jd_jd_
>> >
>> > | 13!:8&3 t
>> >
>> > Why is that? Should Jd load (map) only the 5.3GB of the 'time' field?
>> >
>> > 2. The following search runs out of memory:
>> >
>> > jd'reads count a from calls'
>> >
>> >
>> > while this one doesn't:
>> >
>> > jd'reads count a from calls where num="012-456789"'
>> >
>> >
>> > Why is that?
>> >
>> >
>> > 3. Is there any "server" part above running Jd in a JHS session?
>> >
>> > 4. Is there any way to partition the data automatically (for instance,
>> > have a different folder per day)?
>> >
>> > 5. Is there a simple way to search between dates with the built in time
>> > types (epochdt, datetime)? How about queries like - between 2 and 5 hours
>> > ago? 3 days ago?
>> >
>> > 6. Is there a "reference" architecture for having a real time DB (in RAM)
>> > and a historical DB with scheduled data movements? Suppose the above size
>> > reflects a week of data (35GB) and I want a searchable historical DB for a
>> > year (35GB x 52 weeks = 1.82TB). I guess some sort of partitioning is
>> > mandatory to avoid out of memory errors when searching the historical DB.
>> >
>> > 7. Can one define an aggregation verb in J (my first would be percentile)?
>> >
>> > Thanks,
>> > Yoel
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm