Re: [R] Problem with data distribution

John Fox Thu, 17 Feb 2022 11:56:10 -0800

Dear Neha Gupta,

In the last point, I meant to say, "Finally, it's better to post to the list in plain-text email, rather than html (as the posting guide suggests)." (I accidentally inserted a "not" in this sentence.)


Sorry,
 John

On 2022-02-17 2:21 p.m., John Fox wrote:

Dear Neha Gupta,

On 2022-02-17 1:54 p.m., Neha gupta wrote:
Hello everyone
I have a dataset with output variable "bug" having the following values (at the bottom of this email). My advisor asked me to provide the data distribution of bugs with 0 values and of bugs with more than 0 values.

data = readARFF("synapse.arff")
data2 = readARFF("synapse.arff")
data$bug
library(tidyverse)
data %>%
   filter(bug == 0)
data2 %>%
   filter(bug >= 1)
boxplot(data2$bug, data$bug, range=0)

But both graphs are exactly the same. How is that possible? What am I doing wrong?
As it turns out, you're doing several things wrong.
First, you're not using pipes and filter() correctly. That is, you don't do anything with the filtered versions of the data sets. You're apparently under the incorrect impression that filtering modifies the original data set.
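A minimal sketch of the point about filtering: the filtered rows exist only if you assign them to a new object. It uses base R's subset() so it runs without extra packages (the dplyr form appears in a comment), and the short `bug` vector is a hypothetical stand-in, not the poster's data.

```r
# Hypothetical toy data standing in for the poster's data set
data <- data.frame(bug = c(0, 1, 0, 0, 2, 0, 4, 1, 0, 0))

# Filtering returns a NEW data frame; the original is left unchanged.
# dplyr equivalent: zeros <- data %>% filter(bug == 0)
zeros <- subset(data, bug == 0)
nonzero <- subset(data, bug >= 1)

nrow(data)     # still 10: subset() did not modify `data`
nrow(zeros)    # 6 rows with bug == 0
nrow(nonzero)  # 4 rows with bug >= 1
```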
Second, you're greatly complicating a simple problem. You don't need to read the data twice and keep two versions of the data set. As well, processing the data with pipes and filter() is entirely unnecessary. The following code works:
    with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))
Third, and most fundamentally, the parallel boxplots you're apparently trying to construct don't really make sense. The first "boxplot" is just a horizontal line at 0 and so conveys no information. Why not just plot the nonzero values if that's what you're interested in?
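Because bug is a count, one way to describe the split the advisor asked for is a frequency table of zero versus nonzero values, followed by a plot of the nonzero values only. This is a sketch, not something proposed in the thread: the toy `bug` vector is hypothetical, and barplot() of a table is swapped in for boxplot() because the values are small discrete counts.

```r
# Hypothetical toy counts standing in for data$bug
bug <- c(0, 1, 0, 0, 2, 0, 4, 1, 0, 0)

# How many observations have zero bugs vs. at least one bug
split_counts <- table(ifelse(bug == 0, "zero", "nonzero"))
print(split_counts)

# The same split as proportions
print(prop.table(split_counts))

# Distribution of the nonzero values only
nonzero_table <- table(bug[bug > 0])
print(nonzero_table)
barplot(nonzero_table, xlab = "Number of bugs", ylab = "Frequency")
```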
Fourth, you didn't share your data in a convenient form. I was able to reconstruct them via
   bug <- scan()
   0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
   0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
   1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
   0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
   0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0

   data <- data.frame(bug)
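For posting, dput() prints text that others can paste straight into R, and scan(text = ...) reads pasted numbers from a string instead of waiting for console input the way a bare scan() does. A minimal sketch, with a short hypothetical vector in place of the real data:

```r
# Hypothetical short vector in place of the poster's data
bug <- c(0, 1, 0, 2)

# dput() prints an R expression that recreates the object exactly;
# paste this into a message instead of printed output
dput(bug)   # prints: c(0, 1, 0, 2)

# Reading whitespace-separated numbers from a string works in scripts too,
# unlike scan() with no arguments, which reads from the console
bug2 <- scan(text = "0 1 0
2", quiet = TRUE)
identical(bug, bug2)  # TRUE

data <- data.frame(bug = bug2)
```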
Finally, it's better not to post to the list in plain-text email, rather than html (as the posting guide suggests).
I hope this helps,
  John
data$bug
  [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
 [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
 [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
[118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
[157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
[196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
