Re: [Matplotlib-users] Mlab - Rec_Summarize / Rec_GroupBy

2011-07-01 Thread John Hunter
On Fri, Jul 1, 2011 at 11:14 AM, Hackett, John (Norcross, GA)
 wrote:
> After some experimentation (and judicious peeking at the source code), I
> think I’ve got the hang of writing custom functions to pass into these
> modules – basically, anything that accepts a list of values sliced from a
> single column on the structured array and returns a single list seems to
> work well. In functional programming terms, rec_summarize appears similar to
> “map”, rec_groupby appears similar to “reduce”.
>
> Now – what if I want to derive a calculation from multiple statistics in the
> original dataset – eg. create a new column on the array which is derived
> from 2 (or up to n) other fields in a custom function which I pass into the
> process?
>
> For example, conditional counts/summaries (count transactions and sum the
> sales on all orders that weighed > 5K lbs).
>
> Is there a way to do this within numpy or mlab without going all the way out
> to python and creating a list comprehension?

There are a couple of ways with the existing functions.

One is to use a logical mask::

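   # True for heavy orders (assumes weight is stored in thousands of lbs)
   mask = r.weight > 5
   # group and summarize only the rows that pass the mask
   rg = mlab.rec_groupby(r[mask], groupby, stats)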
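
For readers whose Matplotlib no longer ships mlab.rec_groupby, the same mask-then-group idea can be sketched with plain NumPy. The field names (region, weight, sales) and the sample rows below are made up for illustration and are not from the original data; np.unique plus np.bincount stands in for the grouping step.

```python
import numpy as np

# Hypothetical sample data; the field names are assumptions for this sketch.
r = np.rec.fromrecords(
    [("east", 7.0, 100.0), ("east", 2.0, 50.0),
     ("west", 9.0, 200.0), ("west", 6.0, 25.0), ("east", 8.0, 10.0)],
    names="region,weight,sales",
)

# Boolean mask selecting only the heavy orders.
mask = r.weight > 5
heavy = r[mask]

# Group the masked rows by region: 'inverse' maps each row to its group index.
keys, inverse = np.unique(heavy.region, return_inverse=True)

# Count rows and sum sales within each group.
counts = np.bincount(inverse)
sales = np.bincount(inverse, weights=heavy.sales)

# Map each group key to (transaction count, total sales).
summary = {str(k): (int(c), float(s)) for k, c, s in zip(keys, counts, sales)}
```

Rows that fail the mask never reach the grouping step, so a group with no heavy orders is simply absent from summary.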

You could also create a new categorical variable with one or more
values and attach it to your record array and then use rec_groupby::

  heavy = np.where(r.weight>5, 1, 0)

and add that to your record array::

  r = mlab.rec_append_fields(r, ['heavy'], [heavy])

and then do a rec_groupby using 'heavy' as your group-by attribute.
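
A rough NumPy-only sketch of the flag-column approach follows, for cases where the mlab helpers are not available. It uses numpy.lib.recfunctions.append_fields in place of mlab.rec_append_fields and np.bincount in place of rec_groupby; the sample rows and the weight and sales field names are assumptions made for the example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample data; the field names are assumptions for this sketch.
r = np.rec.fromrecords(
    [(7.0, 100.0), (2.0, 50.0), (9.0, 200.0), (6.0, 25.0), (3.0, 10.0)],
    names="weight,sales",
)

# Categorical flag: 1 for heavy orders, 0 otherwise.
heavy = np.where(r.weight > 5, 1, 0)

# Attach the flag as a new field (NumPy stand-in for mlab.rec_append_fields).
r = rfn.append_fields(r, "heavy", heavy, usemask=False, asrecarray=True)

# Group by the flag: index 0 holds the light orders, index 1 the heavy ones.
counts = np.bincount(r["heavy"], minlength=2)
sales = np.bincount(r["heavy"], weights=r["sales"], minlength=2)
```

Unlike the mask, the flag keeps both categories in one result, so light and heavy orders can be compared side by side.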

Brian Schwartz has a preliminary implementation of rec_query, which
lets you run a SQL query on a record array by converting it to a
SQLite table, running the query, and returning the results as a new
record array. That would solve your problem more cleanly and
generically.  The code needs a little more polishing, but Brian,
perhaps you can send over what you have in case John wants to take a
look.
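
To give a feel for the idea, here is a minimal sketch of a record-array-to-SQLite round trip using only the standard library sqlite3 module. This is not Brian's rec_query; the helper name rec_query_sketch, the table name "r", and the sample fields are all hypothetical, and the sketch assumes simple scalar fields and a query that returns at least one row.

```python
import sqlite3
import numpy as np

def rec_query_sketch(r, sql, table="r"):
    """Hypothetical helper: load record array r into an in-memory SQLite
    table, run sql against it, and return the rows as a new record array."""
    names = r.dtype.names
    con = sqlite3.connect(":memory:")
    try:
        # Untyped columns; SQLite stores each value with its own type.
        cols = ", ".join('"%s"' % n for n in names)
        con.execute('CREATE TABLE "%s" (%s)' % (table, cols))
        # tolist() converts NumPy scalars to plain Python values.
        marks = ", ".join("?" for _ in names)
        con.executemany(
            'INSERT INTO "%s" VALUES (%s)' % (table, marks), r.tolist()
        )
        cur = con.execute(sql)
        out_names = [d[0] for d in cur.description]
        rows = cur.fetchall()
    finally:
        con.close()
    return np.rec.fromrecords(rows, names=out_names)

# Hypothetical sample data; the field names are assumptions for this sketch.
r = np.rec.fromrecords(
    [("east", 7.0, 100.0), ("east", 2.0, 50.0),
     ("west", 9.0, 200.0), ("west", 6.0, 25.0), ("east", 8.0, 10.0)],
    names="region,weight,sales",
)

out = rec_query_sketch(
    r,
    "SELECT region, COUNT(*) AS n, SUM(sales) AS total "
    "FROM r WHERE weight > 5 GROUP BY region ORDER BY region",
)
```

The conditional count and sum then become an ordinary WHERE plus GROUP BY, which is the "cleaner and more generic" part.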

JDH

--
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
___
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users


[Matplotlib-users] Mlab - Rec_Summarize / Rec_GroupBy

2011-07-01 Thread Hackett, John (Norcross, GA)
Good morning -

 

Got a question for a mlab module guru.

 

After some experimentation (and judicious peeking at the source code), I
think I've got the hang of writing custom functions to pass into these
modules - basically, anything that accepts a list of values sliced from
a single column on the structured array and returns a single list seems
to work well. In functional programming terms, rec_summarize appears
similar to "map", rec_groupby appears similar to "reduce".

 

Now - what if I want to derive a calculation from multiple statistics in
the original dataset - eg. create a new column on the array which is
derived from 2 (or up to n) other fields in a custom function which I
pass into the process? 

 

For example, conditional counts/summaries (count transactions and sum
the sales on all orders that weighed > 5K lbs).

 

Is there a way to do this within numpy or mlab without going all the way
out to python and creating a list comprehension?

 

Thanks.

 

John

 

 
