Re: [R] compressing/reducing data for plot

Carlos Ortega Mon, 17 Oct 2011 14:49:29 -0700

Hello,

One alternative is to create some kind of contourplot/levelplot, which will
show the spikes clearly and smooth the rest of the information. Generating
the plot may be slow, but the resulting postscript file should not be.


Regards,
Carlos Ortega
www.qualityexcellence.es

2011/10/17 Timo Schneider <ti...@perlplexity.org>

> Hello,
>
> I have simulation results in the form of
>
>  Time                  V                   I
>  0.000000000000e+000  7.218354344368e-001  5.224478627497e-006
>  1.000000000000e-009  7.218354344368e-001  5.224477718002e-006
>  2.000000000000e-009  7.218354344368e-001  5.224477718002e-006
>  4.000108361244e-009  7.218354344368e-001  5.224478627497e-006
>  8.000325083733e-009  7.218354344368e-001  5.224478627497e-006
>
> As the timesteps are small, each simulation produces a lot of data,
> about 1e5 data points per simulation.
>
> Now I want to plot this data. If I do this with a simple
>
> plot(x=data$Time, y=data$V, type="l")
>
> the resulting file (I plot into postscript files) is huge and takes a
> long time to render, since R creates a new line segment for each
> timestep. Of course it makes no sense to plot more than a few hundred
> datapoints in a single plot. However, I don't have a good idea how to
> remove the "uninteresting" part of the data, i.e., the datapoints that
> lie very close to the lines that would be drawn by R anyway if there
> were no datapoint for that time value.
>
> Since the values in my simulation are constant most of the time but
> sometimes have interesting "spikes" a simple
>
> data <- data[seq(1, nrow(data), by=1000),]
>
> to only plot every 1000th point does not work for me as it could remove
> some "spikes" completely or lead to aliasing problems.
>
> Is there any standard way to do this in R?
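
One way to thin the data without losing spikes is to keep the minimum and
maximum of each bin instead of every 1000th point. The following is only a
sketch: minmax_decimate and its n_bins argument are hypothetical names, not
part of base R, and the example signal is synthetic.

```r
# Hypothetical helper: keep the min and max of y within each of n_bins
# equal-count bins, so narrow spikes survive the decimation.
minmax_decimate <- function(x, y, n_bins = 500) {
  n <- length(x)
  if (n <= 2 * n_bins) return(data.frame(x = x, y = y))
  idx <- seq_len(n)
  # assign each sample to a bin by its position in the series
  bin <- ceiling(idx * n_bins / n)
  # index of the smallest and largest y value inside each bin
  i_min <- tapply(idx, bin, function(i) i[which.min(y[i])])
  i_max <- tapply(idx, bin, function(i) i[which.max(y[i])])
  # always keep the first and last sample so the time range is unchanged
  keep <- sort(unique(c(1, i_min, i_max, n)))
  data.frame(x = x[keep], y = y[keep])
}

# Synthetic example: flat signal with one narrow spike
tm <- seq(0, 1e-4, length.out = 1e5)
v <- rep(0.72, length(tm))
v[50001:50003] <- c(0.9, 1.2, 0.9)
small <- minmax_decimate(tm, v, n_bins = 500)
plot(small$x, small$y, type = "l")
```

Each bin contributes at most two points, so the output size is bounded by
roughly 2 * n_bins regardless of the input length.
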
>
>
> The best thing I came up with so far is a function that judges if a row
> in the dataframe should be kept for plotting based on each point's
> difference to its predecessor. However, this function has two problems:
>
> * It is very slow! (Takes about 4 seconds for each 1e5 element
> dataframe)
>
> * It does not work well if the values increase/decrease monotonically
> with small values - it will remove them all since the difference between
> each point and its predecessor is minimal
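
The second problem goes away if each point is compared against the straight
line between the points already kept, rather than against its predecessor.
Below is a rough sketch of that idea, a variant of Ramer-Douglas-Peucker
line simplification that uses the vertical distance to the chord. The name
rdp_keep and the tol argument are made up for this example, and x is assumed
to be strictly increasing.

```r
# Hypothetical helper: returns a logical vector marking the points to keep.
# A point is kept when it lies more than tol (in y units) away from the
# straight line joining the two nearest kept neighbours.
rdp_keep <- function(x, y, tol) {
  n <- length(x)
  if (n < 3) return(rep(TRUE, n))
  keep <- logical(n)
  keep[c(1, n)] <- TRUE
  # explicit stack of (first, last) index pairs instead of recursion
  stack <- list(c(1L, n))
  while (length(stack) > 0) {
    seg <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    a <- seg[1]
    b <- seg[2]
    if (b - a < 2) next
    i <- (a + 1):(b - 1)
    # y value of the chord from point a to point b at each interior x
    y_line <- y[a] + (y[b] - y[a]) * (x[i] - x[a]) / (x[b] - x[a])
    d <- abs(y[i] - y_line)
    k <- which.max(d)
    if (d[k] > tol) {
      m <- i[k]
      keep[m] <- TRUE
      # split at the worst point and examine both halves
      stack[[length(stack) + 1]] <- c(a, m)
      stack[[length(stack) + 1]] <- c(m, b)
    }
  }
  keep
}

# Synthetic example: slow linear ramp with a single-sample spike
x <- seq(0, 1, length.out = 10001)
y <- 0.5 * x
y[5001] <- y[5001] + 1
keep <- rdp_keep(x, y, tol = 0.01)
reduced <- data.frame(x = x[keep], y = y[keep])
```

With this approach a slow drift is not dropped wholesale: a point is kept as
soon as the accumulated deviation from the chord exceeds tol.
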
>
> I included my own function below:
>
> === cut ===
>
> get_significant_rows_1 <- function (data, threshold) {
>
> # get the difference between each datapoint and the following datapoint
> # of course this list is one shorter than the input dataset, which does
> # not matter since the first and last datapoint will always be included
> diffs = abs(data[-nrow(data),] - data[-1,]);
>
> # normalize the differences according to the value range in their column
> col.range = apply(data,2, function(d) {abs(max(d) - min(d))});
> normalized_diffs <- t(apply(diffs, 1, function(d) {d/col.range}));
> rm("col.range");
> # get the "biggest difference" in each row
> biggest_difference <- as.vector(apply(normalized_diffs,1, max));
>
> # check if the "biggest difference" is above the threshold -
> # that means the row is "significant" in a plot
> signif <- biggest_difference >= threshold;
> rm("biggest_difference");
> # the last datapoint/row is always significant, otherwise the plot
> # could become "shorter"
> signif[length(signif)] = TRUE;
>
> # also the first one - we are adding a TRUE in front of the signif vector
> # now, since it does not include a value for this because the first value
> # naturally doesn't have a predecessor, so there was no entry for it in
> # the diffs array
> signif <- append(signif, TRUE, 0);
>
> # if a point is significant in a plot, the point before that is also
> # "important", at least for line plots, otherwise we get angled lines
> # where flat ones should be
> signif <- (signif | append(signif[2:length(signif)], FALSE));
>
> return(data[signif,]);
>
> }
>
> #example application (makes no sense for this kind of data though)
>
> data <- data.frame(a=rnorm(10000), b=rnorm(10000));
> #                    dataset, threshold
> get_significant_rows_1(data, 0.01)
>
> ==== here ====
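
Much of the run time in the function above probably comes from the row-wise
apply() calls on a data frame. A vectorized sketch of the same selection
rule could look like the following. The name get_significant_rows_2 is
invented here, and treating constant columns as range 1 (so that they never
trigger) is an assumption that differs from the original, which would divide
by zero in that case.

```r
# Sketch of a vectorized variant of get_significant_rows_1 (hypothetical name)
get_significant_rows_2 <- function(data, threshold) {
  m <- as.matrix(data)
  n <- nrow(m)
  if (n < 3) return(data)
  # absolute difference between each row and its predecessor
  d <- abs(diff(m))
  # scale each column by its value range
  rng <- apply(m, 2, function(col) diff(range(col)))
  rng[rng == 0] <- 1  # assumption: constant columns never mark a row
  d <- sweep(d, 2, rng, "/")
  # largest scaled difference per row, taken column by column with pmax
  cols <- lapply(seq_len(ncol(d)), function(j) d[, j])
  biggest <- as.vector(do.call(pmax, cols))
  # the first and last rows are always kept
  signif_row <- c(TRUE, biggest >= threshold)
  signif_row[n] <- TRUE
  # also keep the point just before each significant one
  signif_row <- signif_row | c(signif_row[-1], FALSE)
  data[signif_row, , drop = FALSE]
}

# Synthetic example: constant voltage with one spike
sim <- data.frame(Time = seq(0, 1e-5, length.out = 1e4), V = 0.72)
sim$V[5000] <- 1.0
reduced <- get_significant_rows_2(sim, 0.01)
```

This keeps the predecessor-based rule, so it still shares the second problem
listed above (slow monotonic changes are dropped).
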
>
> Thank you for any helpful advice or comments. :-)
>
> Regards,
> Timo
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
