Re: [R] adding additional information to histogram

2012-01-27 Thread Jim Lemon

On 01/27/2012 03:12 AM, Raphael Bauduin wrote:

> Hi,
>
> I am a beginner with R, and I think the answer to my question will
> seem obvious, but after searching and trying without success I've
> decided to post to the list.
>
> I am working with data loaded from a csv file with these fields:
>    order_id, item_value
> As an order can have multiple items, an order_id may be present
> multiple times in the CSV.
>
> I managed to compute the total value and the number of items for each order:
>
>    oli <- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
>    orders_values <- tapply(oli[[2]], oli[[1]], sum)
>    items_per_order <- tapply(oli[[2]], oli[[1]], length)
>
> I then can display the histogram of the order values:
>
>    hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)
>
> Now on this histogram, I would like to display the average number of
> items of the orders in each group (defined with the breaks).
> So for the bar of orders with value 0 to 10, I'd like to display the
> average number of items of these orders.


Hi Raph,
As this looks a tiny bit like homework, I'll only provide suggestions. 
You have the value and number of items for each order. What you need to 
do is to match them in groups. In order to do that, you want a factor 
that will show the group for each value-items pair. The cut function 
will give you such a factor, using the breaks above. You seem to 
understand the *apply functions, so you can use one of these to return 
the mean number of items for each value group. Alternatively, you could 
use the factor in the by function to get the mean number of items.
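
A minimal sketch of the grouping step described above. The toy data frame stands in for the CSV, and its values and the names `breaks`, `value_group` and `mean_items` are assumptions made for illustration, not taken from the poster's data:

```r
# Toy stand-in for the CSV: one row per order line (made-up values).
oli <- data.frame(
  order_id   = c(1, 1, 2, 3, 3, 3, 4),
  item_value = c(4, 5, 15, 2, 3, 4, 25)
)

# Per-order total value and item count, as in the original post.
orders_values   <- tapply(oli$item_value, oli$order_id, sum)
items_per_order <- tapply(oli$item_value, oli$order_id, length)

# cut() turns each order value into a factor level naming its range.
breaks      <- c(10 * 0:20, 800)
value_group <- cut(orders_values, breaks = breaks)

# Mean number of items per value range; ranges with no orders give NA.
mean_items <- tapply(items_per_order, value_group, mean)
print(mean_items[1:3])
```

Note that cut() defaults to include.lowest = FALSE while hist() defaults to include.lowest = TRUE, so an order worth exactly 0 would land in the first histogram bar but get NA from cut() unless include.lowest = TRUE is passed.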


You should now have a factor that can be sent to table to get the 
number of orders in each value range, and a vector of the corresponding 
mean numbers of items in each value grouping. Why, you could even use the 
same trick to calculate the mean price of the orders in each value 
grouping...
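
The counting step and the "same trick" for the mean order value might look like the following. This is only a sketch on made-up numbers, and every variable name in it is hypothetical:

```r
# Made-up per-order totals and item counts (stand-ins for the tapply results).
orders_values   <- c(9, 15, 9, 25, 12)
items_per_order <- c(2, 1, 3, 1, 2)

value_group <- cut(orders_values, breaks = c(10 * 0:20, 800))

# Number of orders in each value range (empty ranges are counted as 0).
orders_per_group <- table(value_group)

# Mean number of items and mean order value in each range.
mean_items <- tapply(items_per_order, value_group, mean)
mean_value <- tapply(orders_values, value_group, mean)

# Show only the ranges that contain at least one order.
keep <- as.vector(orders_per_group) > 0
print(data.frame(
  orders     = as.vector(orders_per_group)[keep],
  mean_items = as.vector(mean_items)[keep],
  mean_value = as.vector(mean_value)[keep],
  row.names  = names(orders_per_group)[keep]
))
```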


I would use barplot to display all this information, as it is a bit 
easier to place the mean number of items on the bars (if you check the 
return value of barplot).


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] adding additional information to histogram

2012-01-27 Thread Raphael Bauduin
On Fri, Jan 27, 2012 at 9:51 AM, Jim Lemon j...@bitwrit.com.au wrote:
> Hi Raph,
> As this looks a tiny bit like homework, I'll only provide suggestions. You

This is absolutely not homework :-)
I'm learning R to try to get some info out of the data of an e-commerce website.

> have the value and number of items for each order. What you need to do is to
> match them in groups. In order to do that, you want a factor that will show
> the group for each value-items pair. The cut function will give you such a
> factor, using the breaks above. You seem to understand the *apply functions,
> so you can use one of these to return the mean number of items for each
> value group. Alternatively, you could use the factor in the by function to
> get the mean number of items.
>
> You should now have a factor that can be sent to table to get the number
> of orders in each value range, and a vector of the corresponding mean
> numbers of items in each value grouping. Why, you could even use the same
> trick to calculate the mean price of the orders in each value grouping...
>
> I would use barplot to display all this information, as it is a bit easier
> to place the mean number of items on the bars (if you check the return value
> of barplot).


Your suggestions helped me get the info I wanted. I still need to
fine-tune it, as I currently generate 2 barplots.
Here's what I've done, in case it can help someone in the future:

# assign to each entry of orders_values the range to which it belongs,
# according to the breaks passed as the second argument
order_value_range <- cut(orders_values, c(10*0:20, 800))
# count the number of orders in each range
# (like table(order_value_range), except that empty ranges give NA, not 0)
orders_number_per_range <- tapply(orders_values, order_value_range, length)

average_number_of_item_per_order_in_range <- tapply(items_per_order,
                                                    order_value_range, mean)

barplot(average_number_of_item_per_order_in_range,
        ylab="Items number", xlab="Order value")
barplot(orders_number_per_range, ylab="Number of orders", xlab="Order value")


The next step: combine the two barplots in one.

Thanks already for your help!


Raph


-- 
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org



Re: [R] adding additional information to histogram

2012-01-27 Thread Jim Lemon

On 01/27/2012 10:07 PM, Raphael Bauduin wrote:

> The next step: combine the two barplots in one.
>
> Thanks already for your help!


Hi Raph,
Okay, what you want to do is to draw one barplot, then use the text 
function (or boxed.labels in plotrix) to put the values of items per 
order above the bars or, better, on them, which avoids distorting the 
height relationship. The barplot function returns the x positions of the 
bars, and of course, you know the heights of the bars...


Jim



[R] adding additional information to histogram

2012-01-26 Thread Raphael Bauduin
Hi,

I am a beginner with R, and I think the answer to my question will
seem obvious, but after searching and trying without success I've
decided to post to the list.

I am working with data loaded from a csv file with these fields:
  order_id, item_value
As an order can have multiple items, an order_id may be present
multiple times in the CSV.

I managed to compute the total value and the number of items for each order:

  oli <- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
  orders_values <- tapply(oli[[2]], oli[[1]], sum)
  items_per_order <- tapply(oli[[2]], oli[[1]], length)

I then can display the histogram of the order values:

  hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)

Now on this histogram, I would like to display the average number of
items of the orders in each group (defined with the breaks).
So for the bar of orders with value 0 to 10, I'd like to display the
average number of items of these orders.

Thanks in advance

Raph
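
For readers who would rather stay with hist than switch to barplot, one possible approach (not taken from the replies in this thread) is to reuse the breaks, mids and density components of the object that hist returns. A sketch with made-up numbers, where all variable names are hypothetical:

```r
# Made-up per-order totals and item counts.
orders_values   <- c(9, 15, 9, 25, 12, 150)
items_per_order <- c(2, 1, 3, 1, 2, 6)
breaks <- c(10 * 0:20, 800)

# Compute the bins first, then plot with some headroom for the labels.
h <- hist(orders_values, breaks = breaks, plot = FALSE)
plot(h, freq = FALSE, xlim = c(0, 200),
     ylim = c(0, 1.15 * max(h$density)))

# include.lowest = TRUE mirrors the default binning used by hist().
value_group <- cut(orders_values, breaks = h$breaks, include.lowest = TRUE)
mean_items  <- tapply(items_per_order, value_group, mean)

# Label only the non-empty bars, just above their tops.
has_orders <- h$counts > 0
text(x = h$mids[has_orders], y = h$density[has_orders],
     labels = round(mean_items[has_orders], 1), pos = 3)
```

Because the last bin (200 to 800) lies outside xlim, any label for it would not be visible in this plot.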
