Re: [Rd] Confusion regarding allocating Matrices.

Simon Urbanek Sat, 24 Oct 2009 14:26:20 -0700


On Oct 24, 2009, at 2:58 PM, Abhijit Bera wrote:

OK, I get it. So every time it does an alloc and copy.
I haven't finished the design yet. I'm just thinking about how randomly
the data might arrive; it's real-time data. So I will allocate a large
chunk of memory and keep track of when it fills up. Once the data
exceeds it, I will alloc and copy the data (provided the size is within
system limits). In this manner I should be able to reduce the number of
expensive alloc-and-copy operations.

Many smart people have thought about those things before you, so it's
worthwhile to read about it. I would suggest reading a bit about data
structures and programming in C. What you describe is usually tackled
by allocating additional (usually linked) buffers as you go, since that
means you don't have to copy anything (except for that last step where
you create the R object). It's also trivial to implement.
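A minimal sketch of the linked-buffer idea described above, in plain C. The names (chunk, chunk_list, CHUNK_CAP and the three functions) are hypothetical and chosen only for illustration; the chunk size is arbitrary. Incoming values are appended to fixed-size chunks that are linked together, so existing data is never moved, and the only copy is the final flatten into the destination storage (for example, the data area of a freshly allocated R object).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_CAP 1024  /* doubles per chunk; an arbitrary illustrative size */

/* One fixed-size buffer in a singly linked list of buffers. */
typedef struct chunk {
    struct chunk *next;
    size_t used;               /* number of valid entries in data[] */
    double data[CHUNK_CAP];
} chunk;

typedef struct {
    chunk *head, *tail;
    size_t total;              /* values stored across all chunks */
} chunk_list;

/* Append one value; a new chunk is linked in when the tail is full.
   Returns 0 on success, -1 if malloc fails. Existing data never moves. */
static int chunk_list_push(chunk_list *cl, double x)
{
    if (cl->tail == NULL || cl->tail->used == CHUNK_CAP) {
        chunk *c = malloc(sizeof *c);
        if (c == NULL) return -1;
        c->next = NULL;
        c->used = 0;
        if (cl->tail) cl->tail->next = c; else cl->head = c;
        cl->tail = c;
    }
    cl->tail->data[cl->tail->used++] = x;
    cl->total++;
    return 0;
}

/* The single copy: flatten every chunk, in arrival order, into dest,
   which must have room for cl->total doubles. */
static void chunk_list_flatten(const chunk_list *cl, double *dest)
{
    assert(dest != NULL || cl->total == 0);
    for (const chunk *c = cl->head; c != NULL; c = c->next) {
        memcpy(dest, c->data, c->used * sizeof(double));
        dest += c->used;
    }
}

/* Release all chunks and reset the list to empty. */
static void chunk_list_free(chunk_list *cl)
{
    chunk *c = cl->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    cl->head = cl->tail = NULL;
    cl->total = 0;
}
```

Usage would be: start from an empty list (`chunk_list cl = {NULL, NULL, 0};`), call chunk_list_push as data arrives, then allocate the final storage once cl.total is known and call chunk_list_flatten.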


Cheers,
Simon

Abhijit Bera
On Sat, Oct 24, 2009 at 10:06 PM, Douglas Bates <ba...@stat.wisc.edu> wrote:
On Fri, Oct 23, 2009 at 2:02 PM, Abhijit Bera <abhib...@gmail.com> wrote:
Sorry, I made a mistake while writing the code. The declaration of Data
should have been first.
I still have some doubts:
Because you are making some sweeping and incorrect assumptions about
the way that the internals of R operate.  R allows for arrays to be
dynamically resized but this is accomplished internally by allocating
new storage, copying the current contents to this new location and
installing the values of the new elements.  It is an expensive
operation, which is why it is discouraged.

Your design is deeply flawed.  Go back to the drawing board.
When you say calloc and realloc are you talking about R's C interface
Calloc and Realloc or the regular calloc and realloc?
Either one.
I want to feed data directly into an R matrix and grow it as required.
So at one time I might have 100 rows coming in from a data source. The
next time I might have 200 rows coming in from a data source. I want to
be able to expand the R matrix instead of creating a regular C float
matrix and then making an R matrix based on the new size. I just want to
have one R object and be able to expand its size dynamically.
R stores floating-point numbers as the C data type double, not float.
It may seem pedantic to point out distinctions like that but not when
you are writing programs.  Compilers are the ultimate pedants - they
are real sticklers for getting the details right.

As I said, it just doesn't work the way that you think it does.  The
fact that there is an R object with a certain name before and after an
operation doesn't mean it is the same R object.
I was reading the language specs. It says that one could declare an
object in R like this:

m=matrix(nrow=10,ncol=10)

and then one could assign

m[101]=1.00

to expand the object.

but this has one problem when I do a

dim(m)

I get

NULL instead of 10 10

So what is happening here?


I am aware that R matrices are stored in column major order.

Thanks for the tip on using float *dat= REAL(Data);

Regards

Abhijit Bera



On Fri, Oct 23, 2009 at 7:27 PM, Douglas Bates <ba...@stat.wisc.edu>
wrote:
On Fri, Oct 23, 2009 at 9:23 AM, Douglas Bates <ba...@stat.wisc.edu>
wrote:
On Fri, Oct 23, 2009 at 8:39 AM, Abhijit Bera <abhib...@gmail.com>
wrote:
Hi

I'm having slight confusion.
Indeed.
I plan to grow/realloc a matrix depending on the data available in a C
program.
Here is what I tried to do:
Data=allocMatrix(REALSXP,3,4);
SEXP Data;
Those lines should be in the other order, shouldn't they?

Also, you need to PROTECT Data or bad things will happen.
REAL(Data)[8]=0.001123;
REAL(Data)[200000]=0.001125;
printf("%f %f\n\n\n\n",REAL(Data)[8],REAL(Data)[200000]);
And I forgot to mention, it is not a good idea to write REAL(Data)
many times like this.  REAL is a function, not a macro and you are
calling the same function over and over again unnecessarily.  It is
better to write

double *dat = REAL(Data);

and use the dat pointer instead of REAL(Data).
Here is my confusion:
Do I always need to allocate the exact number of data elements in an R
matrix?
Yes.
In the above code segment I have clearly exceeded the number of
elements that have been allocated but my program doesn't crash.
Remember that when programming in C you have a lot of rope with which
to hang yourself. You have corrupted a memory location beyond that
allocated to the array but nothing bad has happened  - yet.
I don't find any specific R functions for reallocation in case my data
set grows. How do I reallocate?
You allocate a new matrix, copy the contents of the current matrix to
the new matrix, then release the old one. It gets tricky in that you
should unprotect the old one and protect the new one, but you need to
watch the order of those operations.
This approach is not a very good one. If you really need to grow an
array it is better to allocate and reallocate the memory within your C
code using calloc and realloc then, at the end of the calculations,
allocate an R matrix and copy the results over.
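The grow-in-C approach just described can be sketched with a small growable buffer. This is only an illustration in plain C: the dbuf type and its functions are hypothetical names, and the doubling policy and the initial capacity of 16 are assumptions, not anything prescribed in this thread. Capacity doubles when it runs out, so the number of reallocations stays small even when batches arrive at irregular sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    double *data;
    size_t len;   /* values in use */
    size_t cap;   /* values allocated */
} dbuf;

/* Ensure room for at least `extra` more values, doubling capacity as
   needed. Returns 0 on success, -1 on size overflow or allocation
   failure; on failure the buffer is left intact. */
static int dbuf_reserve(dbuf *b, size_t extra)
{
    if (extra > SIZE_MAX / sizeof(double) - b->len) return -1;
    size_t need = b->len + extra;
    if (need <= b->cap) return 0;
    size_t cap = b->cap ? b->cap : 16;   /* 16 is an arbitrary start */
    while (cap < need) {
        if (cap > SIZE_MAX / sizeof(double) / 2) { cap = need; break; }
        cap *= 2;
    }
    /* realloc(NULL, n) behaves like malloc(n), so the first call works too. */
    double *p = realloc(b->data, cap * sizeof(double));
    if (p == NULL) return -1;
    b->data = p;
    b->cap = cap;
    return 0;
}

/* Append n values from src to the end of the buffer. */
static int dbuf_append(dbuf *b, const double *src, size_t n)
{
    assert(b != NULL);
    if (n == 0) return 0;
    if (dbuf_reserve(b, n) != 0) return -1;
    memcpy(b->data + b->len, src, n * sizeof(double));
    b->len += n;
    return 0;
}
```

Once all the data has arrived, b.len gives the final element count; the R matrix would be allocated at that size, b.data copied into it once, and the buffer released with free(b.data).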
Also, you haven't said whether you are growing the matrix by row or by
column or both. If you are adding rows then you can't just reallocate
storage because R stores matrices in column-major order. The positions
of the elements in a matrix with n+1 rows are different from those in
a matrix with n rows.
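To make the column-major point concrete, here is a small plain-C sketch (cm_index and cm_append_rows are hypothetical helper names). In column-major storage, element (i, j) of a matrix with nrow rows sits at offset i + j*nrow, so changing nrow moves every element outside the first column. Appending rows therefore means rebuilding the layout column by column. The sketch assumes a non-empty result and does not check for size overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Offset of zero-based element (i, j) in a column-major matrix
   that has nrow rows. */
static size_t cm_index(size_t i, size_t j, size_t nrow)
{
    assert(i < nrow);
    return i + j * nrow;
}

/* Build a new column-major matrix holding `old` (old_nrow x ncol) with
   `extra` (extra_nrow x ncol, also column-major) appended below it.
   Returns a malloc'ed buffer the caller must free, or NULL on failure. */
static double *cm_append_rows(const double *old, size_t old_nrow,
                              const double *extra, size_t extra_nrow,
                              size_t ncol)
{
    size_t new_nrow = old_nrow + extra_nrow;
    double *out = malloc(new_nrow * ncol * sizeof(double));
    if (out == NULL) return NULL;
    for (size_t j = 0; j < ncol; j++) {
        double *col = out + j * new_nrow;   /* start of column j in result */
        memcpy(col, old + j * old_nrow, old_nrow * sizeof(double));
        memcpy(col + old_nrow, extra + j * extra_nrow,
               extra_nrow * sizeof(double));
    }
    return out;
}
```

For a 2 x 3 matrix, element (1, 1) is at offset 1 + 1*2 = 3; after one row is appended the same element is at offset 1 + 1*3 = 4. Adding columns, by contrast, only appends to the end of the storage.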
Is it necessary to reallocate or is R handling
the memory management for the matrix that I have allocated?

Regards

Abhijit Bera

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel