[R] Cannot allocate vector size of... ?
Hello all,

I've been working with R and Fridolin Wild's lsa package a bit over the past few months, but I'm still pretty much a novice. I have a lot of files that I want to use to create a semantic space. When I run the initial textmatrix( ), it runs for about 3-4 hours and eventually gives me an error. It's always ERROR: cannot allocate vector size of xxx Kb. I imagine this might be my computer running out of memory, but I'm not sure, so I thought I would send this to the community at large for any help/thoughts. I searched the archives and didn't really find anything that specifically speaks to my situation. So I guess I have a few questions.

First, is this actually an issue with the machine running out of memory? If not, what might be the cause of the error? If so, is there a way to minimize the amount of memory used by the vector data structures (e.g., Berkeley DB)?

Thanks,
Gabe Wingfield
IT and Program Specialist I
Center for Applied Social Research
University of Oklahoma
2 Partners Place
3100 Monitor, Suite 100
Norman, OK 73072

[[alternative HTML version deleted]]
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
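On the last question, one general way to shrink a term-by-document matrix is to store only its non-zero cells. The sketch below uses the Matrix package's sparseMatrix() on made-up counts to compare footprints. It is an illustration only: the sizes are invented, and nothing here implies that lsa's textmatrix() produces or accepts sparse objects.

```r
library(Matrix)  # recommended package shipped with standard R installs

set.seed(1)
n_terms <- 5000   # illustrative vocabulary size (assumption)
n_docs  <- 2000   # illustrative number of documents (assumption)
nnz     <- 20000  # number of (term, document) cells to fill

# Random positions for the non-zero counts; duplicate positions are summed.
i <- sample.int(n_terms, nnz, replace = TRUE)
j <- sample.int(n_docs,  nnz, replace = TRUE)
sparse_tdm <- sparseMatrix(i = i, j = j, x = rep(1, nnz),
                           dims = c(n_terms, n_docs))

# A dense double matrix needs 8 bytes per cell, filled or not.
dense_bytes  <- n_terms * n_docs * 8
sparse_bytes <- as.numeric(object.size(sparse_tdm))

cat("dense :", round(dense_bytes  / 1024^2, 1), "MB\n")
cat("sparse:", round(sparse_bytes / 1024^2, 1), "MB\n")
```

The saving depends entirely on how sparse the real matrix is; text corpora usually are very sparse, but that would need checking on the actual data.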
Re: [R] Cannot allocate vector size of... ?
Oops. Yep, I totally forgot my specs and such. I'm currently running R-2.4.1 on a 64-bit Linux box (Fedora Core 6) with 4GB of RAM. The files are 10-50 KB on average, but this error came up when working with only ~16,000 of them. The final size of the corpus is ~1.7M files. So, obviously, this memory thing is going to be a large issue for me. I'm going back through the help list archives, and now it looks like I have S Poetry to read as well.

Thanks for all the suggestions. Any others are greatly appreciated as well.

Gabe Wingfield
IT and Program Specialist I
Center for Applied Social Research
University of Oklahoma
2 Partners Place
3100 Monitor, Suite 100
Norman, OK 73072

-----Original Message-----
From: Patrick Burns [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 15, 2007 12:31 PM
To: Wingfield, Jerad G.
Subject: Re: [R] Cannot allocate vector size of... ?

You can find a few things not to do (things that waste memory) in S Poetry. You don't say how much memory your machine has, nor how big your objects are. However, it is possible that getting more memory for your machine might be the best thing to do.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)
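For a sense of scale on the memory question, here is a back-of-the-envelope sketch. It assumes the term-by-document matrix is held as an ordinary dense numeric matrix at 8 bytes per cell, and the vocabulary size of 100,000 terms is an illustrative guess, not something measured on this corpus. dense_matrix_gb is a made-up helper name.

```r
# Rough size in GB of a dense numeric matrix: one 8-byte double per cell.
dense_matrix_gb <- function(n_docs, n_terms, bytes_per_cell = 8) {
  n_docs * n_terms * bytes_per_cell / 1024^3
}

# ~16,000 documents against a guessed vocabulary of 100,000 terms.
estimate_16k <- dense_matrix_gb(16000, 100000)
print(round(estimate_16k, 1))

# Sanity check of the 8-bytes-per-cell assumption on a small real matrix.
print(object.size(matrix(0, nrow = 1000, ncol = 1000)), units = "Mb")
```

Under those assumptions the 16,000-document subset alone would need roughly 12 GB, about three times the 4GB of RAM in the machine, so running out of memory is a plausible explanation for the error.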
[R] new to R: don't understand errors
Hello all,

I'm brand new to R, and I'm trying to quickly learn the rudiments for a couple of projects here at work. I'm working with the lsa package and trying to generate various semantic spaces. I seem to do well with small collections of clean text files, but now that I am trying to work with larger collections of less-than-perfect files, I'm getting errors that I don't quite understand. So I'm hoping some of you out there might recognize my issues and be able to point me in the right direction to resolve them.

Currently, I have a corpus of ~12,000 text files. I've separated them out into other folders of varying sizes to check whether there is some sort of limit on the number of files. Even when I use only as many files as in previous working collections, I still get the errors. So I am wondering if it might be something in the files themselves...

At any rate, I routinely get these two errors. The first is generated when I include minDocFreq=x, and it looks a little like this when I run it:

data(stopwords_en)
CCauto = textmatrix( CultureMineTXT , minWordLength=3, minDocFreq=50, stopwords=stopwords_en)
Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
  arguments imply differing number of rows: 1, 0

If I remove the minDocFreq, I get a different error:

data(stopwords_en)
CCauto = textmatrix( CultureMineTXT , minWordLength=3, stopwords=stopwords_en)
Error in as.vector(x, mode) : invalid argument 'mode'

Any help would be greatly appreciated.

Gabe Wingfield
IT and Program Specialist I
Center for Applied Social Research
University of Oklahoma
3200 Marshall Avenue, Suite 201
Norman, OK 73072
P: 405-325-4786
F: 405-321-6936
[EMAIL PROTECTED]
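One thing that might be worth checking for the first error: "differing number of rows: 1, 0" is the message data.frame() gives when one argument has a single element and another has none, which would fit a file that contributes no terms at all after filtering. That is a guess from the error text, not a confirmed diagnosis, and it says nothing about the second error. The sketch below flags such files before textmatrix() is called. find_empty_docs is a hypothetical helper, and its tokenising (split on non-letters, keep tokens of at least min_len characters) only approximates whatever lsa does internally.

```r
# Hypothetical helper: list files in `dir` that contain no token of at
# least `min_len` letters, i.e. files likely to yield an empty term table.
find_empty_docs <- function(dir, min_len = 3) {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!file.info(files)$isdir]  # skip sub-directories
  has_terms <- vapply(files, function(f) {
    txt <- paste(readLines(f, warn = FALSE), collapse = " ")
    tokens <- unlist(strsplit(txt, "[^A-Za-z]+", useBytes = TRUE))
    any(nchar(tokens, type = "bytes") >= min_len)
  }, logical(1))
  basename(files[!has_terms])
}

# Small self-contained demo in a temporary directory.
demo_dir <- file.path(tempdir(), "corpus_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("culture and society", file.path(demo_dir, "ok.txt"))
writeLines("a 1 2", file.path(demo_dir, "short.txt"))
invisible(file.create(file.path(demo_dir, "empty.txt")))

suspects <- find_empty_docs(demo_dir)
print(suspects)
```

If the real corpus turns up such files, moving them out of the folder and rerunning would show whether they are the trigger.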