Hi Ray,

Thanks a lot for your help. I tried renaming the file etc. as you suggested 
below. You are right that the file has a header; its structure is as follows:
@relation some string

@attribute ID numeric
@attribute Class {negative,positive}
@attribute Att1 numeric
...
(I have 1140 of those)

followed by a section @data and then the data rows.

It turns out that my colleague had copied the file to his machine and removed 
the header manually. That was not done in our shared SVN version, which is the 
one I was trying to read. Removing the header solved the issue. 
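
For anyone hitting the same problem, here is a minimal sketch of skipping the header in R instead of editing the file by hand. It assumes a dense (non-sparse) ARFF file whose data rows follow a single @data line; read_arff_data is a hypothetical helper name, not a function from any package, and the temporary file only stands in for the real one.

```r
# Hypothetical helper (not from any package): read only the rows that
# follow the @data marker of a dense ARFF file.
read_arff_data <- function(path) {
  lines <- readLines(path)
  # ARFF keywords are case-insensitive, so match @data and @DATA alike
  data_start <- grep("^[[:space:]]*@data[[:space:]]*$", lines,
                     ignore.case = TRUE)
  if (length(data_start) == 0) stop("no @data marker found in ", path)
  # Skip everything up to and including the @data line;
  # "%" starts a comment line in ARFF
  read.csv(path, header = FALSE, skip = data_start[1], comment.char = "%")
}

# Small stand-in file with the same header layout as described above
tmp <- tempfile(fileext = ".arff")
writeLines(c(
  "@relation demo",
  "",
  "@attribute ID numeric",
  "@attribute Class {negative,positive}",
  "@attribute Att1 numeric",
  "",
  "@data",
  "856243,negative,0",
  "856244,positive,1"
), tmp)

d <- read_arff_data(tmp)
str(d)
```

The columns come back with the default names V1, V2, ... because the attribute names in the header are skipped rather than parsed.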

RWeka looks very interesting, thanks for the pointer.
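
For reference, a rough sketch of the read.arff route Ray mentions. This uses the function from the "foreign" package rather than RWeka (an assumption made here only because foreign normally ships with R and needs no Java); the temporary file is an illustrative stand-in for the real data.

```r
library(foreign)  # recommended package, normally installed with R

# Tiny stand-in file with the header layout described above
tmp <- tempfile(fileext = ".arff")
writeLines(c(
  "@relation demo",
  "@attribute ID numeric",
  "@attribute Class {negative,positive}",
  "@attribute Att1 numeric",
  "@data",
  "856243,negative,0",
  "856244,positive,1"
), tmp)

# read.arff parses the header itself, so nothing has to be removed by hand
frame1 <- read.arff(tmp)
str(frame1)
```

With this route the column names should be taken from the @attribute lines, which is an advantage over stripping the header and calling read.csv.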

Cheers,
Melanie




On 2012-10-26, at 11:53 AM, Ray DiGiacomo, Jr. wrote:

> Hello Melanie,
> 
> Also make sure that you and your colleague have similar outputs for these two 
> commands:
> 
> > search()
> > ls() # L as in Larry, S as in Sam
> 
> "search()" shows the packages and environments attached to your search path 
> (loaded packages take up RAM).  "ls()" shows the objects in your workspace 
> (which also take up RAM).
> 
> - Ray
> 
> 
> 
> 
> 
> On Fri, Oct 26, 2012 at 11:17 AM, Ray DiGiacomo, Jr. 
> <[email protected]> wrote:
> Hello Melanie,
> 
> I'm not too familiar with ARFF but I believe it has some headers (and 
> possibly footers) that may need to be removed before one can call the 
> read.csv function.  I am assuming you and your colleague both manually 
> removed the ARFF headers/footers before calling the read.csv function. 
> 
> You may also want to try changing the read.csv function call to:
> 
> frame1 <- read.csv("test.csv", header = FALSE)
> 
> You will have to manually change your filename to test.csv first.  Also 
> notice that the "sep" argument is not needed as it defaults to a "comma".  I 
> would also use the word "frame" instead of "mat" as the data will not be a 
> matrix after you call the read.csv function - it will be a data frame.  You 
> can turn your data frame into a matrix using other R commands if you like.  See this 
> page:
> 
> http://stackoverflow.com/questions/5158790/data-frame-or-matrix
> 
> Also, there are R packages called "foreign" and "RWeka" which both have 
> read.arff functions inside of them.  You may want to give these a try.  
> 
> You can learn about them here:
> 
> See Paper Page 3 (Digital Page 2)
> http://cran.r-project.org/web/packages/foreign/foreign.pdf
> 
> See Paper Page 6 (Digital Page 3)
> http://cran.r-project.org/web/packages/RWeka/RWeka.pdf
> 
> - Ray
> 
> 
> 
> 
> 
> 
> On Fri, Oct 26, 2012 at 10:18 AM, Melanie Courtot <[email protected]> wrote:
> Hi Ray and Simon, all,
> 
> Thanks for the help. My laptop has 8GB of RAM (my colleague has 12 on his 
> desktop). I ssh'ed into his machine and the whole file loads in not even 2 
> seconds.
> The file is read with mat<-read.csv('test.arff',header=FALSE,sep=','). The 
> arff file is what I use with Weka; it is basically a comma-delimited file. 
> It contains around 7.5M datapoints (6200 rows, 1140 columns).
> 
> It seems that with 8GB I should be quite ok?
> 
> Based on your suggestions I tried with a part of the file only, which does 
> work fine, so it seems that it is indeed a memory problem. Any idea as to why?
> 
> Thanks,
> Melanie
> 
> 
> 
> Example record (I have 6200 of those)
> 856243,negative,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
> 
> 
> 
> On 2012-10-25, at 6:39 PM, Ray DiGiacomo, Jr. wrote:
> 
> > Hi Simon,
> >
> > I took the spec from this Revo SlideShare.  The spec is based on a 
> > regression.
> >
> > http://www.revolutionanalytics.com/news-events/free-webinars/2011/intro-to-r-for-sas-spss/
> >
> > Click the right arrow until you get to slide 3 of 14.  Then, look at the 
> > slide in the lower-right hand corner (slide 12).
> >
> > - Ray
> >
> >
> >
> >
> >
> > On Thu, Oct 25, 2012 at 6:26 PM, Simon Urbanek 
> > <[email protected]> wrote:
> >
> > On Oct 25, 2012, at 7:42 PM, Ray DiGiacomo, Jr. wrote:
> >
> > > Hello Melanie,
> > >
> > > How much RAM is installed on your MacBook Pro compared to your colleague's
> > > Linux machine?
> > >
> > > How big is your dataset in terms of rows and columns?
> > >
> > > I believe R can handle about 10M datapoints per GB of RAM.
> > >
> >
> > What exactly is that an estimate of? In R, 1GB of RAM will store ~134Mio 
> > datapoints when using numeric matrices/vectors and twice as many as 
> > integers or logicals. In practice, you will still need some room for 
> > computation on the data, though.
> >
> > Cheers,
> > Simon
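
The arithmetic behind these estimates can be spelled out in a few lines of R. This is only a back-of-the-envelope sketch: it assumes every value is stored as an 8-byte double in a plain numeric matrix, uses the 6200 x 1140 dimensions quoted elsewhere in this thread, and ignores the extra working memory that read.csv needs while parsing.

```r
bytes_per_double <- 8                     # R numerics are 8-byte doubles
per_gb <- 2^30 / bytes_per_double         # values that fit in 1 GB (2^30 bytes)
per_gb                                    # 134217728, i.e. ~134 million

datapoints <- 6200 * 1140                 # rows x columns = 7068000
mb_needed <- datapoints * bytes_per_double / 2^20
round(mb_needed, 1)                       # 53.9 (MB as a numeric matrix)
```

Integers and logicals take 4 bytes each, which is where the "twice as many" figure comes from.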
> >
> >
> > > Note that datapoints = rows x columns
> > >
> > > Best Regards,
> > >
> > > Ray DiGiacomo, Jr.
> > > Master R Trainer
> > > President, Lion Data Systems LLC
> > > President, The Orange County R User Group
> > > Board Member, TDWI
> > > [email protected]
> > > (Mobile) 408-425-7851
> > > San Juan Capistrano, California
> > >
> > > Check out my one-on-one web-based R courses at liondatasystems.com/courses
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Oct 25, 2012 at 4:16 PM, Melanie Courtot <[email protected]> 
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I am trying to run R on my MacBook Pro 2.4 GHz Intel core i5. I am trying
> > >> to read a csv file, which works fine on my work colleague's machine (under
> > >> linux) but causes my CPU to go up to 100% and makes the GUI unresponsive
> > >> and hangs on the command line. Activity monitor indicates there is only
> > >> one R thread running.
> > >>
> > >> I did see that by default R was using the BLAS library, which is
> > >> single-threaded, and that there was an option to use vecLib instead. I did
> > >> this, and
> > >> ls -l /Library/Frameworks/R.framework/Resources/lib/libRblas.dylib
> > >> does return
> > >> /Library/Frameworks/R.framework/Resources/lib/libRblas.dylib ->
> > >> libRblas.vecLib.dylib
> > >>
> > >> I however still see the same behavior: 100% CPU, single thread.
> > >>
> > >> I saw that some MacBook Pros (Xeon Nehalem based) had a vecLib bug, so I
> > >> built the ATLAS library and symlinked R to libtatlas.dylib (unfortunately
> > >> the precompiled binaries pointed to in a previous email on the list [1]
> > >> were not available anymore. Building ATLAS was... fun ;)) I was able to get
> > >> the shared libraries (using --shared in my config) but still see the same
> > >> behavior when trying to run my code. I was unsure if I should link to
> > >> libsatlas.dylib or libtatlas.dylib, so tried both (I guess the latter was
> > >> the right one though).
> > >>
> > >> I tried building R from the source (specifying -arch x86_64 and
> > >> --enable-BLAS-shlib to be able to switch libraries), but same behavior and
> > >> it seems it is an identical version to the prepackaged one (I tried with
> > >> BLAS, vecLib and ATLAS)
> > >>
> > >> R info: R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows", Platform:
> > >> x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> > >>
> > >> Any help would be greatly appreciated.
> > >>
> > >> Thanks,
> > >> Melanie
> > >>
> > >>
> > >> [1] https://stat.ethz.ch/pipermail/r-sig-mac/2010-October/007817.html
> > >>
> > >> ---
> > >> Mélanie Courtot
> > >> MSFHR/PCIRN Ph.D. Candidate,
> > >> BCCRC - Terry Fox Laboratory - 12th floor
> > >> 675 West 10th Avenue
> > >> Vancouver, BC
> > >> V5Z 1L3, Canada
> > >>
> > >> _______________________________________________
> > >> R-SIG-Mac mailing list
> > >> [email protected]
> > >> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
> > >>
> > >
> >
> >
> 
> 
> 

_______________________________________________
R-SIG-Mac mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
