[R] Crosstab with Average and Count
I have the following data:

    x <- as.factor(c(1,1,1,2,2,2,3,3,3))
    y <- as.factor(c(10,10,10,20,20,20,30,30,30))
    z <- c(100,100,NA,200,200,200,300,300,300)

I can create the cross tab of x and y with the sum of z as its elements using the xtabs function as follows:

    # X vs. Y with sum of Z
    xtabs(z ~ x + y)
       y
    x    10  20  30
      1 200   0   0
      2   0 600   0
      3   0   0 900

How do I replace the sum with the average and the count so that I get the following outputs?

    # X vs. Y with average of Z
       y
    x    10  20  30
      1 100   0   0
      2   0 200   0
      3   0   0 300

    # X vs. Y with count of Z
       y
    x   10 20 30
      1  2  0  0
      2  0  3  0
      3  0  0  3

I would appreciate any help with this. Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Crosstab-with-Average-and-Count-tp4637180.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
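One possible approach, sketched with base R only: tapply() accepts any summary function, so it can stand in for xtabs() when each cell should hold a mean or a count. Replacing the empty cells with 0 is an assumption made to match the desired output in the question; tapply() itself returns NA for combinations that never occur.

```r
x <- as.factor(c(1,1,1,2,2,2,3,3,3))
y <- as.factor(c(10,10,10,20,20,20,30,30,30))
z <- c(100,100,NA,200,200,200,300,300,300)

# Mean of z per (x, y) cell, ignoring missing values
avg <- tapply(z, list(x = x, y = y), mean, na.rm = TRUE)
avg[is.na(avg)] <- 0   # assumption: show empty cells as 0

# Count of non-missing z per (x, y) cell
cnt <- tapply(!is.na(z), list(x = x, y = y), sum)
cnt[is.na(cnt)] <- 0   # assumption: show empty cells as 0

print(avg)
print(cnt)
```

Both results are plain matrices with the factor levels as dimnames, so a single cell can be read with, for example, avg["1", "10"].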
[R] Merging on Datetime Column
I have the following data frame, whose first column is of type datetime:

    dateTime <- c("10/01/2005 0:00", "10/01/2005 0:20", "10/01/2005 0:40", "10/01/2005 1:00", "10/01/2005 1:20")
    var1 <- c(1,2,3,4,5)
    var2 <- c(10,20,30,40,50)
    df <- data.frame(dateTime = dateTime, var1 = var1, var2 = var2)
    df$dateTime <- strptime(df$dateTime, "%m/%d/%Y %H:%M")

I want to create 10-minute interval data as follows:

    minTime <- min(df$dateTime)
    maxTime <- max(df$dateTime)
    newTime <- seq(minTime, maxTime, 600)
    newDf <- data.frame(newDateTime = newTime)
    newDf <- merge(newDf, df, by.x = "newDateTime", by.y = "dateTime", all.x = TRUE)

The objective is to create a data frame with the values from df for the datetimes present in df and NA for the missing ones. However, I am getting the following data frame, in which var1 and var2 are all NA:

    newDf
              newDateTime var1 var2
    1 2005-10-01 00:00:00   NA   NA
    2 2005-10-01 00:10:00   NA   NA
    3 2005-10-01 00:20:00   NA   NA
    4 2005-10-01 00:30:00   NA   NA
    5 2005-10-01 00:40:00   NA   NA
    6 2005-10-01 00:50:00   NA   NA
    7 2005-10-01 01:00:00   NA   NA
    8 2005-10-01 01:10:00   NA   NA
    9 2005-10-01 01:20:00   NA   NA

Can someone help me with how to do the merge based on the two datetime columns? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Merging-on-Datetime-Column-tp4636417.html
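A likely cause, offered as a guess rather than a confirmed diagnosis: strptime() returns a POSIXlt object, which is a list-like structure that merge() does not match on reliably. The sketch below stores both key columns as POSIXct instead. The tz = "UTC" argument is an assumption added here so that both sides share one time zone.

```r
dateTime <- c("10/01/2005 0:00", "10/01/2005 0:20", "10/01/2005 0:40",
              "10/01/2005 1:00", "10/01/2005 1:20")

# Parse straight to POSIXct (seconds since the epoch) instead of POSIXlt
df <- data.frame(
  dateTime = as.POSIXct(dateTime, format = "%m/%d/%Y %H:%M", tz = "UTC"),
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(10, 20, 30, 40, 50)
)

# 10-minute grid (600 seconds) between the first and last timestamp
newTime <- seq(min(df$dateTime), max(df$dateTime), by = 600)

# Left join: keep every grid point, fill unmatched rows with NA
newDf <- merge(data.frame(newDateTime = newTime), df,
               by.x = "newDateTime", by.y = "dateTime", all.x = TRUE)
print(newDf)
```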
Re: [R] Skipping lines and incomplete rows
Thanks a lot for the guidance. I have another text file with a time stamp and an empty column, as given below:

    First line: Skip this line
    Second line: skip this line
    Third line: skip this line
    variable1 Variable2 Variable3 Variable4
    Unit1 Unit2 Unit3
    11/1/2004 0:00 0.1 0.001
    11/1/2004 0:10 0.2 0.002
    11/1/2004 0:20 0.3 0.003
    11/1/2004 0:30 0.4 0.004

This is a space-separated text file. When I use the following code:

    head <- readLines("testInput.txt", n=4)[4]
    dat <- read.table("testInput.txt", skip=5, sep=" ", fill = TRUE, stringsAsFactors=FALSE)
    names(dat) <- unlist(strsplit(head, " "))

I get the following output:

    str(dat)
    'data.frame': 4 obs. of 4 variables:
     $ variable1: chr "11/1/2004" "11/1/2004" "11/1/2004" "11/1/2004"
     $ Variable2: chr "0:00" "0:10" "0:20" "0:30"
     $ Variable3: num 0.1 0.2 0.3 0.4
     $ Variable4: num 0.001 0.002 0.003 0.004

The date and time of variable1 get split across variable1 and Variable2, whereas they should both be part of variable1. Also, the empty column is missing from the data frame. Is there a way to handle these two cases? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830p4636129.html
Re: [R] Skipping lines and incomplete rows
Thanks a lot, Rui and Arun. The methods work fine with the data I gave, but when I tried the two methods on the following semicolon-separated data using sep = ";", only the first 3 columns are read properly; the rest of the columns are either empty or NA.

    **
    Remove this line
    Remove this line
    Remove this line
    Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
    ;[m/s];[°];°C;[hPa];[MWh];[MWh]
    1/1/2012;0.0;0;#N/A;#N/A;0.;0.
    1/2/2012;0.0;0;#N/A;#N/A;0.;0.
    1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112
    1/4/2012;0.0;0;#N/A;#N/A;1.;2.
    1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455
    ***

I used the following code:

    dat1 <- read.table("testInput.txt", sep=";", skip=3, fill=TRUE, header=TRUE)
    dat1 <- dat1[-1,]
    row.names(dat1) <- 1:nrow(dat1)

Could you please let me know what is wrong with this approach? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830p4635952.html
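A probable explanation, stated as a hypothesis: read.table() treats "#" as a comment character by default, so every data line is cut off at the first "#N/A", leaving only three fields. The sketch below turns comment handling off and declares "#N/A" as a missing-value string. The temporary file stands in for testInput.txt, and the units line uses ASCII stand-ins ("[deg]", "degC") for the degree symbols; both are assumptions made to keep the example self-contained.

```r
lines <- c(
  "Remove this line", "Remove this line", "Remove this line",
  "Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2",
  ";[m/s];[deg];degC;[hPa];[MWh];[MWh]",   # ASCII stand-in for the units line
  "1/1/2012;0.0;0;#N/A;#N/A;0.;0.",
  "1/2/2012;0.0;0;#N/A;#N/A;0.;0.",
  "1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112",
  "1/4/2012;0.0;0;#N/A;#N/A;1.;2.",
  "1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455"
)
tmp <- tempfile(fileext = ".txt")   # hypothetical stand-in for testInput.txt
writeLines(lines, tmp)

# Column names come from line 4; the units on line 5 are skipped
header <- strsplit(readLines(tmp, n = 4)[4], ";", fixed = TRUE)[[1]]

dat1 <- read.table(tmp, sep = ";", skip = 5,
                   col.names = make.names(header),
                   na.strings = "#N/A",    # read "#N/A" as NA
                   comment.char = "",      # do not treat "#" as a comment
                   stringsAsFactors = FALSE)
str(dat1)
```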
[R] Skipping lines and incomplete rows
I have a text file that has semicolon-separated values. The table is nearly 10,000 by 585. The file looks as follows:

    ***
    First line: Skip this line
    Second line: skip this line
    Third line: skip this line
    variable1 Variable2 Variable3 Variable4
    Unit1 Unit2 Unit3
    10 0.1 0.01 0.001
    20 0.2 0.02 0.002
    30 0.3 0.03 0.003
    40 0.4 0.04 0.004
    ***

The first three lines need to be skipped. Moreover, line 5 doesn't have units for all the variables and hence has to be skipped as well. Effectively, I want the following to be read into a data frame, skipping rows 1, 2, 3 and 5:

    ***
    variable1 Variable2 Variable3 Variable4
    10 0.1 0.01 0.001
    20 0.2 0.02 0.002
    30 0.3 0.03 0.003
    40 0.4 0.04 0.004
    ***

I tried using read.table with skip for lines 1-3 as follows:

    inputData <- read.table("test.txt", sep = ";", skip = 3)

but the read fails with the following error:

    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
      line 3 did not have 585 elements

Can someone help me with this? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830.html
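One way to do this, as a minimal sketch: read the header line on its own with readLines(), then let read.table() skip the first five lines and attach the names afterwards. The temporary file and its semicolon-separated contents are a small stand-in for test.txt, invented here for illustration.

```r
lines <- c(
  "First line: Skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1;Variable2;Variable3;Variable4",
  "Unit1;Unit2;Unit3",
  "10;0.1;0.01;0.001",
  "20;0.2;0.02;0.002",
  "30;0.3;0.03;0.003",
  "40;0.4;0.04;0.004"
)
tmp <- tempfile(fileext = ".txt")   # hypothetical stand-in for test.txt
writeLines(lines, tmp)

# Line 4 holds the column names
header <- strsplit(readLines(tmp, n = 4)[4], ";", fixed = TRUE)[[1]]

# Skip lines 1-5 (three junk lines, the header, the incomplete units line)
inputData <- read.table(tmp, sep = ";", skip = 5, col.names = header)
print(inputData)
```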
[R] Large Test Datasets in R
I am looking for some large datasets (10,000 rows by 100,000 columns or vice versa) to create some test sets. I am not concerned about the individual elements, since I will be converting them to binary (0/1) using arbitrary thresholds. Does any R package provide such big datasets?

Also, what is the biggest text document collection available in R? The tm package seems to provide only 20 records from the Reuters dataset. Is there any package that has 10,000+ documents? I would appreciate any help with this. Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Large-Test-Datasets-in-R-tp4634330.html
[R] htmlParse Error
I am trying to parse a webpage using the htmlParse command in the XML package as follows:

    library(XML)
    u = "http://en.wikipedia.org/wiki/World_population"
    doc = htmlParse(u)

I get the following error:

    Error in htmlParse(u) : error in creating parser for http://en.wikipedia.org/wiki/World_population

I am using R 2.13.1 (32-bit version) on 64-bit Windows. (I tried installing the package in the 64-bit version of R but got an error that the previous version cannot be removed.) Can someone please help with how to resolve this issue? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/htmlParse-Error-tp4630738.html
[R] Word Count
I have a sentence like the following:

    sentence <- "Part 1 is working, Part 2 is not working and Part 3 is working"

I would like to get the total count of "working" and "not working", i.e. Working = 2 and Not Working = 1. Can someone help with how this can be done in R? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Word-Count-tp4544970p4544970.html
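A minimal base-R sketch of one way to count these: gregexpr() returns the start position of every match, so the number of matches is the length of that result. Because every "not working" also contains "working", the second count is subtracted from the first. The helper name count_matches is made up for this example.

```r
sentence <- "Part 1 is working, Part 2 is not working and Part 3 is working"

# Hypothetical helper: number of literal occurrences of `pattern` in `text`
count_matches <- function(pattern, text) {
  m <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)   # gregexpr returns -1 when nothing matches
}

not_working <- count_matches("not working", sentence)
working <- count_matches("working", sentence) - not_working

cat("Working =", working, "\n")
cat("Not Working =", not_working, "\n")
```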
[R] Error in prune in the rEMM package
I am trying to use the rEMM package for Extensible Markov Models. I tried the following sequence of code:

    emmt = EMM(measure="euclidean", threshold=0.75, lambda=0.001)
    emmt = build(emmt, data)
    new_threshold = sum(cluster_counts(emmt))*0.002
    emmt_new = prune(emmt, new_threshold)

However, I get the following error when I run the last line of the code:

    Error in remove_clusters(x, rare_clusters(x, count_threshold = count_threshold)) :
      (subscript) logical subscript too long
    In addition: Warning message:
    In smc_removeState(x@tracds_d$mm, to_remove) :
      State 7210325432838344346362367369370376377390412425440445483489499 does not exist!

I am unable to provide the data that I used, since it is confidential. It would be great if someone could still help with the issue. Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Error-in-prune-in-the-rEMM-package-tp4474200p4474200.html
[R] Conditionally adding a constant
I am trying to add a constant to the previous value of a variable based on certain conditions. Maybe there is a simple way to do this that I am missing completely. I have given an example below:

    df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
    df
      x  y
    1 1 10
    2 2 20
    3 3 30
    4 4 NA
    5 5 NA

I want to add 2 to the previous value of y if x exceeds 3 (I will also have to handle NAs in the process). The resulting output would look like:

      x  y
    1 1 10
    2 2 20
    3 3 30
    4 4 32
    5 5 34

Can someone please explain how to do it? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Conditionally-adding-a-constant-tp4253049p4253049.html
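Since each new value depends on the one just computed, a plain loop is probably the simplest reading of the rule. This sketch assumes the rule is exactly "when x > 3, set y to the previous y plus 2"; rows with x <= 3 are left untouched.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 20, 30, NA, NA))

# Start at row 2 because row 1 has no previous value
for (i in seq_len(nrow(df))[-1]) {
  if (df$x[i] > 3) {
    df$y[i] <- df$y[i - 1] + 2   # uses the value filled in on the previous pass
  }
}
print(df)
```

If existing non-missing values of y should be preserved even when x > 3, the condition could be extended with is.na(df$y[i]).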
[R] HTML Forms to R
I currently have an R function that reads a csv file, does some computations, produces some plots and writes a csv file as output. I would like to use HTML forms to make a user interface for calling the appropriate parts of the function (reading the csv file, doing the computations, displaying the plots and writing the csv file). Are there any tutorials available that would help me get started? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/HTML-Forms-to-R-tp4164360p4164360.html
[R] Sequential Sum in R
I am trying to code the following Excel formula in R.

    a   b    c    Result  Formula
    1   10   0.1  #N/A    IF(B2<20,NA(),C2+IF(ISERROR(D1),0,D1))
    2   20   0.2  0.2     IF(B3<20,NA(),C3+IF(ISERROR(D2),0,D2))
    3   30   0.3  0.5     IF(B4<20,NA(),C4+IF(ISERROR(D3),0,D3))
    4   40   0.4  0.9     IF(B5<20,NA(),C5+IF(ISERROR(D4),0,D4))
    5   50   0.5  1.4     IF(B6<20,NA(),C6+IF(ISERROR(D5),0,D5))
    6   60   0.6  2       IF(B7<20,NA(),C7+IF(ISERROR(D6),0,D6))
    7   70   0.7  2.7     IF(B8<20,NA(),C8+IF(ISERROR(D7),0,D7))
    8   80   0.8  3.5     IF(B9<20,NA(),C9+IF(ISERROR(D8),0,D8))
    9   90   0.9  4.4     IF(B10<20,NA(),C10+IF(ISERROR(D9),0,D9))
    10  100  1    5.4     IF(B11<20,NA(),C11+IF(ISERROR(D10),0,D10))

The variable Result is obtained using the Excel formula shown next to it. Column D contains the Result.

    dataFrame <- data.frame(a = seq(1:10), b = seq(10,100,by = 10), c = seq(0.1,1,by = 0.1))

Can someone please help me calculate in R the sequential sum given by the Excel formula? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Sequential-Sum-in-R-tp4165916p4165916.html
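A direct translation of the spreadsheet logic, as a sketch: the comparison in the formula is read here as b < 20, which is an inference from the listed results (b = 10 gives #N/A, b = 20 gives 0.2). The loop carries the previous result forward and resets it to 0 after an NA row, which is what IF(ISERROR(D1),0,D1) does.

```r
dataFrame <- data.frame(a = 1:10,
                        b = seq(10, 100, by = 10),
                        c = seq(0.1, 1, by = 0.1))

n <- nrow(dataFrame)
result <- rep(NA_real_, n)
prev <- 0   # plays the role of IF(ISERROR(D_prev), 0, D_prev)

for (i in seq_len(n)) {
  if (dataFrame$b[i] < 20) {
    result[i] <- NA    # NA() branch of the formula
    prev <- 0          # an error in the previous cell counts as 0
  } else {
    result[i] <- dataFrame$c[i] + prev
    prev <- result[i]
  }
}

dataFrame$Result <- result
print(dataFrame)
```

When the NA rows only occur at the top, as in this data, a cumsum() over the rows with b >= 20 gives the same numbers; the loop also covers NA rows in the middle.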
Re: [R] Spatial Statistics using R
Thanks, Raphael. Just checked their website. It appears that they currently do not have any online courses planned.

--
View this message in context: http://r.789695.n4.nabble.com/Spatial-Statistics-using-R-tp4079092p4079574.html
Re: [R] Spatial Statistics using R
Thanks a lot for the guidance. I will take a look at these options.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Spatial-Statistics-using-R-tp4079092p4082354.html
[R] Spatial Statistics using R
I am looking for online courses to learn spatial statistics using R. Statistics.com is offering an online course on this topic in December, but that schedule doesn't suit mine. Are there any other similar ways of learning spatial statistics using R? Can someone please advise? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Spatial-Statistics-using-R-tp4079092p4079092.html
[R] Automatic Labeling of Document Clusters
I am performing document clustering on a set of documents using R. I performed hierarchical clustering using hclust and have identified the cluster corresponding to each data point. I would like to label each cluster automatically in order to identify the top keywords associated with each cluster. This would help me in validating the clusters. Are there any packages in R that help with automatic labelling of clusters?

A few cluster labeling methods are given here:
http://en.wikipedia.org/wiki/Cluster_labeling
http://erulemaking.ucsur.pitt.edu/doc/papers/dgo06-labeling.pdf

Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Automatic-Labeling-of-Document-Clusters-tp4038849p4038849.html
[R] Removing numbers from a list
I am using gsub to remove numbers from each element of a list. The code is given below.

    testList <- list("this contains a number 1000", "this does not contain")
    removeNumbers <- function(X) { gsub("\\d", "", X) }
    outputList <- lapply(testList, removeNumbers)

However, when I try to find the number of words in outputList as follows

    outLength <- lapply(strsplit(outputList, " "), length)

it throws the following error:

    Error in strsplit(outputList, " ") : non-character argument

Can someone help me with this? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Removing-numbers-from-a-list-tp4023074p4023074.html
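The error arises because strsplit() expects a character vector, while lapply() returns a list. A minimal sketch of one fix is to flatten the list with unlist() first. The trimws() call and the " +" split pattern are additions made here so that the space left behind by the removed number does not produce an empty word.

```r
testList <- list("this contains a number 1000", "this does not contain")
removeNumbers <- function(X) gsub("\\d", "", X)
outputList <- lapply(testList, removeNumbers)

# Flatten the list to a character vector before splitting
cleaned <- trimws(unlist(outputList))

# Split on runs of spaces and count the pieces per element
outLength <- lengths(strsplit(cleaned, " +"))
print(outLength)
```

Using sapply(testList, removeNumbers) instead of lapply() would return a character vector directly and make the unlist() step unnecessary.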
[R] Min Frequency in findFreqTerms
I am using the 'tm' package for text mining. I use the function findFreqTerms to obtain the frequent words based on their frequency in the term document matrix. The following is the example given in the help page of this function:

    library(tm)
    data("crude")
    tdm <- TermDocumentMatrix(crude)
    findFreqTerms(tdm, 2, 3)

The first three columns of the document term matrix are shown below:

    (bpd) (bpd). (gcc)
      0     0      0
      0     0      0
      0     0      0
      0     0      0
      0     0      0
      1     0      0
      0     0      0
      0     0      0
      0     0      0
      1     0      0
      1     0      0
      0     0      1
      0     0      0
      0     1      0
      0     0      0
      0     0      0
      0     0      0
      0     0      0
      0     0      0
      0     0      0

The first term, (bpd), has a frequency of 3, whereas the second and third terms have a frequency of 1, which is below the specified lowfreq = 2. Can someone confirm whether this is the right way of interpreting this function? If so, is there a bug in the package? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Min-Frequency-in-findFreqTerms-tp4019143p4019143.html
[R] Reading stopwords from a csv file
I am using the tm package to do text mining. I have a huge list of stopwords (2000+) in a csv file. I read it as follows:

    stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
    myStopwords <- as.character(stopwordlist$stopwords)

When I try removing the stopwords using

    tr1 = tm_map(tr1, removeWords, myStopwords)

I get the following error:

    Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "", :
      internal error in compiling regexp

However, this works fine when I define myStopwords = c() instead of reading from the csv file. Can someone please help me resolve this issue? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871697.html
Re: [R] Reading stopwords from a csv file
The following for loop does the work, but it takes a good 30 minutes to run:

    for (i in 1:length(myStopwords)) {
      currentWord <- myStopwords[i]
      tr1 = tm_map(tr1, removeWords, currentWord)
    }

Are there any faster alternatives? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871864.html
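A possible middle ground between one huge regular expression (which fails to compile) and one pass per word (which is slow) is to process the stopwords in chunks. The sketch below shows the idea on a plain character vector with base R; the function name remove_words_chunked, the chunk size and the sample texts are all assumptions for illustration, and the words are assumed to contain no regex metacharacters. With a tm corpus, the same chunks could presumably be passed to removeWords one chunk at a time.

```r
# Hypothetical helper: remove whole words from `text`, a few hundred at a time
remove_words_chunked <- function(text, words, chunk_size = 500) {
  # Assign each word to a chunk: 1,1,...,2,2,...
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    # Same pattern shape as in the error message above, but much shorter
    pattern <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  # Collapse the whitespace left behind by the removed words
  trimws(gsub("\\s+", " ", text))
}

texts <- c("the cat sat on the mat", "a dog and a cat")
cleaned <- remove_words_chunked(texts, c("the", "a", "on", "and"), chunk_size = 2)
print(cleaned)
```

With 2000+ stopwords and a chunk size of 500 this needs about five passes over the text instead of 2000+.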
[R] SVD Memory Issue
I am trying to perform Singular Value Decomposition (SVD) on a term document matrix I created using the 'tm' package. Eventually I want to do a Latent Semantic Analysis (LSA). There are 5677 documents with 771 terms (the TDM is 771 x 5677). When I try to do the SVD, it runs out of memory. I am using a 12GB dual-core machine with Windows XP and don't think I can increase the memory any further. Are there any other memory-efficient methods to find the SVD?

The term document matrix is obtained using:

    tdm2 <- TermDocumentMatrix(tr1, control=list(weighting=weightTf, minWordLength=3))
    str(tdm2)
    List of 6
     $ i       : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ...
     $ j       : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ...
     $ v       : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ...
     $ nrow    : int 771
     $ ncol    : int 5677
     $ dimnames:List of 2
      ..$ Terms: chr [1:771] "access" "accessori" "accumul" "acoust" ...
      ..$ Docs : chr [1:5677] "1" "2" "3" "4" ...
     - attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
     - attr(*, "Weighting")= chr [1:2] "term frequency" "tf"

The SVD is calculated using:

    tdm_matrix <- as.matrix(tdm2)
    svd_out <- svd(tdm_matrix)
    Error: cannot allocate vector of size 767.7 Mb
    In addition: Warning messages:
    1: In matrix(0, n, np) : Reached total allocation of 3583Mb: see help(memory.size)
    2: In matrix(0, n, np) : Reached total allocation of 3583Mb: see help(memory.size)
    3: In matrix(0, n, np) : Reached total allocation of 3583Mb: see help(memory.size)
    4: In matrix(0, n, np) : Reached total allocation of 3583Mb: see help(memory.size)

Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/SVD-Memory-Issue-tp3809667p3809667.html
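One memory-light route, offered as a sketch rather than a diagnosis of the error above: when the matrix has far fewer rows (terms) than columns (documents), the leading singular values and vectors can be recovered from the eigendecomposition of the small terms-by-terms Gram matrix, so only a 771 x 771 matrix has to be decomposed. The toy matrix, the seed and k = 5 are illustrative assumptions; squaring the matrix also loses precision for very small singular values, which matters less for LSA because only the largest ones are kept.

```r
set.seed(1)
# Toy stand-in for the term document matrix: 20 terms x 50 documents
m <- matrix(rpois(20 * 50, lambda = 0.3), nrow = 20)

# Gram matrix m %*% t(m) is only terms x terms
g <- tcrossprod(m)
e <- eigen(g, symmetric = TRUE)

# Singular values are the square roots of the eigenvalues
d <- sqrt(pmax(e$values, 0))

k <- 5                                   # number of LSA dimensions to keep
u <- e$vectors[, 1:k, drop = FALSE]      # left singular vectors (terms)
# Right singular vectors (documents): v = t(m) %*% u %*% diag(1/d)
v <- crossprod(m, u) %*% diag(1 / d[1:k], nrow = k)

print(round(d[1:k], 4))
```

The rank-k approximation used in LSA is then u %*% diag(d[1:k]) %*% t(v).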
[R] findFreqTerms vs minDocFreq in Package 'tm'
I am using the 'tm' package for text mining and am facing an issue with finding the frequently occurring terms. From the definitions it appears that findFreqTerms and minDocFreq are equivalent and that both try to identify the terms appearing more often than a specified threshold. However, I am getting drastically different results from the two. I have given the results from both below: findFreqTerms identifies 3140 words that appear at least 5 times, but minDocFreq identifies only 659 terms. Can someone please explain the reason for the difference, or whether I have misunderstood their definitions?

    tdm1 <- TermDocumentMatrix(tr1, control=list(weighting=weightBin))
    freq_terms <- findFreqTerms(tdm1, lowfreq = 5, highfreq = Inf)
    str(freq_terms)
     chr [1:3140] "abc" "abil" "abl" "abnorm" "abort" "absenc" ...

    tdm2 <- TermDocumentMatrix(tr1, control=list(minDocFreq=5, minWordLength=1))
    str(tdm2)
    List of 6
     $ i       : int [1:4703] 173 616 624 241 350 534 563 609 129 333 ...
     $ j       : int [1:4703] 1 2 3 7 7 7 7 8 10 10 ...
     $ v       : num [1:4703] 7 5 6 9 5 7 5 5 5 7 ...
     $ nrow    : int 659
     $ ncol    : int 5677
     $ dimnames:List of 2
      ..$ Terms: chr [1:659] "\024" "\026" "ac" "access" ...
      ..$ Docs : chr [1:5677] "1" "2" "3" "4" ...
     - attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
     - attr(*, "Weighting")= chr [1:2] "term frequency" "tf"

Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/findFreqTerms-vs-minDocFreq-in-Package-tm-tp3806644p3806644.html
Re: [R] findFreqTerms vs minDocFreq in Package 'tm'
Thanks, Bettina.

--
View this message in context: http://r.789695.n4.nabble.com/findFreqTerms-vs-minDocFreq-in-Package-tm-tp3806644p3808134.html
[R] Distance between a vector and matrix rows
I am trying to find the distance between a vector and each row of a data frame. I am using the function distancevector in the package hopach as follows:

    mydata <- as.data.frame(matrix(c(1,1,1,1,0,1,1,1,1,0), nrow=2))
      V1 V2 V3 V4 V5
    1  1  1  0  1  1
    2  1  1  1  1  0
    vec <- c(1,1,1,1,1)
    d2 <- distancevector(mydata, vec, d="euclid")

The Euclidean distance from each of the two rows of the data frame to the vector should be 1, but I am getting 0.4472136 for both. Can someone please point out the reason for the discrepancy?

Also, are there other packages for calculating the distance between a binary vector and all the rows of a data frame (which contains only binary values)? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Distance-between-a-vector-and-matrix-rows-tp3726268p3726268.html
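For the plain (unscaled) Euclidean distance no extra package is needed. As a minimal base-R sketch: subtract the vector from every row with sweep(), square, sum across each row and take the square root. The observed value 0.4472136 equals 1/sqrt(5), which suggests, though this is only a guess, that distancevector divides by the square root of the vector length.

```r
mydata <- as.data.frame(matrix(c(1,1,1,1,0,1,1,1,1,0), nrow = 2))
vec <- c(1, 1, 1, 1, 1)

# Subtract vec from each row (MARGIN = 2 aligns vec with the columns)
diffs <- sweep(as.matrix(mydata), 2, vec)

# Euclidean distance of every row to vec
d2 <- sqrt(rowSums(diffs^2))
print(d2)
```

For binary data, rowSums(diffs != 0) gives the Hamming distance in the same way.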
Re: [R] Distance between a vector and matrix rows
Thank you both for your reply. I went with the cosine function for similarity and used it with apply to get a measure of distance.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Distance-between-a-vector-and-matrix-rows-tp3726268p3726610.html
[R] Extracting certain text using tm package
I have used the tm package to import a set of text documents using the following command:

    text <- Corpus(DirSource("."), readerControl = list(language = "ansi"))

I would like to extract only a certain portion of the text in each document using certain keywords. For example, I would like to include all the text between the keywords "Start Text" and "End Text". All the remaining text should be discarded. Is there any way to accomplish this in the 'tm' package?

Also, is there a quick way to remove all the HTML tags from the text? Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Extracting-certain-text-using-tm-package-tp3627063p3627063.html
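Both tasks can be sketched with base R regular expressions on a plain character string; applying the same functions to each document of a tm corpus is left out here. The sample strings are invented for illustration. Note that sub() returns the input unchanged when the markers are absent, and that the tag pattern is a rough heuristic, not an HTML parser.

```r
doc <- "Preamble to drop. Start Text Keep only this part. End Text Trailer to drop."

# (?s) lets "." match newlines; the lazy group captures the text between the markers
between <- sub("(?s).*?Start Text(.*?)End Text.*", "\\1", doc, perl = TRUE)
between <- trimws(between)
print(between)

html <- "<p>Some <b>bold</b> text</p>"

# Drop anything that looks like <...>
no_tags <- gsub("<[^>]+>", "", html)
print(no_tags)
```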
Re: [R] R program writing standard/practices
Check this out: http://www1.maths.lth.se/help/R/RCC/

--
View this message in context: http://r.789695.n4.nabble.com/R-program-writing-standard-practices-tp3588716p3588911.html
[R] Checking for combination of words in a sentence
I am trying to implement some expert rules based on the presence or absence of words in a sentence. I have given a reproducible example below. In this, every time I come across the words "lunch" and "bag" in the same sentence, the outcome should be 1. If "lunch" and "pack" are in the same sentence, the outcome should be 2. If only "lunch" is present, the outcome should be 3. There is no guarantee that the two words (lunch/bag or lunch/pack) will be next to each other in the sentence.

I tried to implement this using regexpr, but the last rule (lunch -> 3) supersedes the other two rules. It works only if I put the "lunch"-only rule first, followed by the other two. Is there a way to make sure the outcome will be 3 only if "lunch" alone is present? (I have hundreds of rules; hence, finding the correct order manually is not possible.)

    keyWord <- c("lunch bag", "lunch pack", "lunch")
    outcome <- c(1,2,3)
    expertRules <- data.frame(keyWord = keyWord, outcome = outcome)
    testWords <- c("lunch pack", "lunch", "lunch", "lunch bag", "lunch pack")
    predictedOutcome <- c(NA,NA,NA,NA,NA)
    testDf <- data.frame(testWords = testWords, predictedOutcome = predictedOutcome)
    for (i in 1:nrow(expertRules)) {
      testDf$predictedOutcome <- ifelse((regexpr(expertRules[i,1], testDf$testWords) > 0),
                                        expertRules[i,2], testDf$predictedOutcome)
    }
    testDf
       testWords predictedOutcome
    1 lunch pack                3
    2      lunch                3
    3      lunch                3
    4  lunch bag                3
    5 lunch pack                3

Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Checking-for-combination-of-words-in-a-sentence-tp3570104p3570104.html
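One way to make the result independent of rule order, sketched under assumptions: treat each rule as a set of words, call a rule a match when all of its words occur in the sentence (in any position), and among the matching rules pick the one with the most words. The function name predict_one and the tie-breaking choice (first rule wins when two matching rules have the same number of words) are inventions of this example.

```r
keyWord <- c("lunch bag", "lunch pack", "lunch")
outcome <- c(1, 2, 3)
testWords <- c("lunch pack", "lunch", "lunch", "lunch bag", "lunch pack")

# Each rule becomes a vector of required words
ruleWords <- strsplit(keyWord, " ", fixed = TRUE)

# Hypothetical helper: outcome of the most specific matching rule, or NA
predict_one <- function(sentence) {
  tokens <- strsplit(tolower(sentence), "[^a-z]+")[[1]]
  hits <- vapply(ruleWords, function(w) all(w %in% tokens), logical(1))
  if (!any(hits)) return(NA_real_)
  # Among matching rules, prefer the one with the most words
  best <- which(hits)[which.max(lengths(ruleWords)[hits])]
  outcome[best]
}

predictedOutcome <- vapply(testWords, predict_one, numeric(1), USE.NAMES = FALSE)
testDf <- data.frame(testWords = testWords, predictedOutcome = predictedOutcome)
print(testDf)
```

Because matching is done on word sets, "bag for my lunch" also yields outcome 1 even though the words are not adjacent.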
Re: [R] append date to write csv filename
You could use the paste function to build the filename with the date appended to it. See the example below:

    currentDate <- Sys.Date()
    csvFileName <- paste("C:/R/Remake/XPX", currentDate, ".csv", sep="")
    write.csv(S1X.sub, file=csvFileName)

--
View this message in context: http://r.789695.n4.nabble.com/append-date-to-write-csv-filename-tp3570379p3570420.html
[R] Text Summarization
Is there a text mining/NLP package in R that can do text summarization, i.e. take a huge text as input and provide a summary of it? In the tm package, summarization is defined more in terms of high-frequency terms, which is not what I want. I actually want a summary of what is present in the huge volume of text. Any pointers to an R package would be helpful. Thank you.

Ravi

--
View this message in context: http://r.789695.n4.nabble.com/Text-Summarization-tp3562735p3562735.html
[R] Using read.xls
I am using the read.xls command from the gdata package. I get the following error when I try to read a worksheet from an Excel file:

    Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
      Intermediate file 'C:\Tmp\RtmpYvLnAu\file7f06650f.csv' missing!
    In addition: Warning message:
    running command 'C:\Apps\Perl\bin\perl.exe C:/Program Files/R/R-2.13.0/library/gdata/perl/xls2csv.pl excelFileName.xls C:\Tmp\RtmpYvLnAu\file7f06650f.csv Test Sheet' had status 5
    Error in file.exists(tfn) : invalid 'file' argument

However, the same command works fine with another Excel file stored in the same directory. Could you please let me know what is causing this problem? Thank you.

--
View this message in context: http://r.789695.n4.nabble.com/Using-read-xls-tp3552122p3552122.html
[R] Fortran DLL in Spotfire
I have R code that loads a FORTRAN DLL to do some calculations. The code works fine when I run it in R. But when I try it in Spotfire, it throws an error saying that it is unable to load the shared library and that the specified DLL cannot be found. I have used setwd to point to the location in the Spotfire Statistical Services server library. Is this the correct way to call the DLL in Spotfire? I would appreciate any input on this. Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Fortran-DLL-in-Spotfire-tp3547779p3547779.html
[R] Building Custom GUIs for R
I am looking to build simple GUIs on top of the R code I have. The main objective is to hide the scary R code from non-programmers and make it easier for them to try out different inputs. For example:

1. The GUI will have a way to upload a csv file, which will be read by the R code.
2. A button to preprocess the data (carried out by an R function behind the scenes).
3. A button to build some models and run simulations.
4. Space to display charts based on the simulation results.
5. An option to save the results to a csv file or something similar.

Are there any tools currently available that enable us to build such GUIs? (MATLAB has a GUI builder that lets users build custom GUIs.) Can we make an exe of such a GUI (with the R code) and let people use it without having to install R? Any help on this would be much appreciated. Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Building-Custom-GUIs-for-R-tp3537794p3537794.html
Re: [R] Building Custom GUIs for R
Thanks everyone. I will try out the packages you have mentioned. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Building-Custom-GUIs-for-R-tp3537794p3538539.html
Re: [R] Total effect of X on Y under presence of interaction effects
I believe this is what is referred to as suppression in regression, where the correlation between the independent and the dependent variable has one sign whereas the regression coefficient has the opposite sign. Read about suppression here: http://www.uvm.edu/~dhowell/gradstat/psych341/lectures/MultipleRegression/multreg3.html HTH

-- View this message in context: http://r.789695.n4.nabble.com/Total-effect-of-X-on-Y-under-presence-of-interaction-effects-tp3514137p3516446.html
[R] Fortran Symbol Name not in Load Table
I am trying to call a FORTRAN subroutine from R. is.loaded returns TRUE. However, when I run my .Fortran command I get the following error:

Error in .Fortran("VALUEAHROPTIMIZE", as.double(ahrArray), as.double(kwArray), :
  Fortran symbol name "valueahroptimize" not in load table

The FORTRAN declaration is given below:

subroutine VALUEAHROPTIMIZE(AHR, KW, min_IHR_delta, max_AHR_error, &
                            AHR_out,    & !! Output AHE array
                            IHR_out,    & !! Output IHR array
                            Errors_out)
! Expose subroutine my_function to users of this DLL
!DEC$ ATTRIBUTES DLLEXPORT,C,REFERENCE,ALIAS:'VALUEAHROPTIMIZE_'::VALUEAHROPTIMIZE
! Body of my_function
Implicit None
Integer *4 IERR, iSum
DOUBLE PRECISION min_IHR_delta, max_AHR_error
logical switch_AHR_tuner
character * 512 AHR_tuner_FilePath
!!DOUBLE PRECISION AHR(500), kW(500) !! Initial Array for reading Namelist
DOUBLE PRECISION AHR(*), kW(*) !! Initial Array for reading Namelist
DOUBLE PRECISION AHR_out(*), IHR_out(*)
integer Errors_out(*)

The R code I tried is given below:

ahrArray <- runif(147)
kwArray <- runif(147)
outputAHR <- c(rep(0, 11*11))
outputIHR <- c(rep(0, 11*11))
outputError <- c(rep(NA, 11))
dyn.load("my_function.dll")
is.loaded("VALUEAHROPTIMIZE")
[1] TRUE
.Fortran("VALUEAHROPTIMIZE", as.double(ahrArray), as.double(kwArray), as.double(0.0005), as.double(5), as.double(outputAHR), as.double(outputIHR), as.integer(outputError))

Can someone please help me with how to fix this issue? Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Fortran-Symbol-Name-not-in-Load-Table-tp3509852p3509852.html
Re: [R] Fortran Symbol Name not in Load Table
I used the DLL export viewer to check which symbol name is being exported. It shows VALUEAHROPTIMIZE_, which is the name of the subroutine we used plus a trailing underscore. Is there any other reason for the function not being recognized? Thanks.

-- View this message in context: http://r.789695.n4.nabble.com/Fortran-Symbol-Name-not-in-Load-Table-tp3509852p3510761.html
[R] SQP with Constraints
I am trying to optimize a function similar to the following:

Minimize x1^2 - x2^2 - x3^2
s.t. x1 < x2
     x2 < x3

The constraint is that the variables should be monotonically increasing. Is there any package that implements Sequential Quadratic Programming with the ability to include these constraints? Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/SQP-with-Constraints-tp3510857p3510857.html
Re: [R] histograms and for loops
This should work:

for (i in 1:12) {
  xLabel <- paste("Graph", i)
  plotTitle <- paste("Graph", i, ".jpg")
  jpeg(plotTitle)
  # axes = FALSE belongs to hist(), so that the coloured axes below are the only ones drawn
  hist(zNort1[,i], freq = FALSE, xlab = xLabel, col = "blue",
       main = "Standardized Residuals Histogram",
       ylim = c(0,1), xlim = c(-3.0,3.0), axes = FALSE)
  axis(1, col = "blue", col.axis = "blue")
  axis(2, col = "red", col.axis = "red")
  zNortmin <- min(zNort1[,1])
  zNortmax <- max(zNort1[,1])
  zNortmean <- mean(zNort1[,1])
  zNortsd <- sd(zNort1[,1])
  X1 <- seq(-3.0, 3.0, by = .01)
  lines(X1, dnorm(X1, zNortmean, zNortsd), col = "black")
  dev.off()
}

-- View this message in context: http://r.789695.n4.nabble.com/histograms-and-for-loops-tp3503648p3503758.html
[R] Loading a FORTRAN DLL
I have a FORTRAN DLL built with Compaq Visual Fortran, and when I try to load the DLL into R I get an error.

dyn.load("my_function.dll")

This application has failed to start because MSCVRTD.dll was not found. Re-installing this application may fix the problem.

When I tried it again, the above error no longer appeared. Instead, I get the following error:

Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library 'D://my_function.dll':
  LoadLibrary failure: The specified module could not be found.

Do I need to have FORTRAN installed to be able to load the DLL file? Can someone please help me with what is causing this error? Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Loading-a-FORTRAN-DLL-tp3493263p3493263.html
Re: [R] Fitting gamma and exponential Distributions with fitdist
Joshua, thanks for your reply. I have tried the following scaling and it seems to work fine:

scaledVariable <- (test - min(test) + 0.001) / (max(test) - min(test) + 0.002)

The gamma distribution parameters are estimated from the scaled variable, and samples drawn from this distribution are scaled back using:

scaled <- (randomSamples * (max(test) - min(test) + 0.002)) + min(test) - 0.001

Is there a better way to scale the variable? I would prefer to fit the distribution without scaling. Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3480265.html
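The scaling in the message above and its inverse can be written as a pair of helper functions, which makes it easier to check that the round trip returns the original values. This is only a sketch: the names scale01 and unscale01 and the eps argument are made up for illustration, and the 0.001 offset is simply the one used in the message.

```r
# Hypothetical helpers (names are assumptions, not from any package).
# Map x into the open interval (0, 1), leaving a small gap eps at each end.
scale01 <- function(x, lo = min(x), hi = max(x), eps = 0.001) {
  (x - lo + eps) / (hi - lo + 2 * eps)
}

# Inverse transform: map values in (0, 1) back to the original units.
unscale01 <- function(u, lo, hi, eps = 0.001) {
  u * (hi - lo + 2 * eps) + lo - eps
}

# A few values taken from the data posted in this thread.
x <- c(895.1358, 2915.7447, 335.5472, 1470.4022, 194.5461)
u <- scale01(x)
back <- unscale01(u, lo = min(x), hi = max(x))

range(u)            # strictly inside (0, 1)
all.equal(back, x)  # the round trip recovers the original values
```

Note that the minimum and maximum of the original data have to be kept around, since they are needed again when scaling the simulated samples back.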
Re: [R] Fitting gamma and exponential Distributions with fitdist
I tried JMP on the same data and get two distinct recommendations depending on whether the values are scaled. With the unscaled values, LogNormal appears to be the best fit. fitdist in R is unable to provide a fit in this case.

Compare Distributions (unscaled values)

Show  Distribution       Number of Parameters  -2*LogLikelihood  AICc
X     LogNormal          2                     1016.29587        1020.50639
      Johnson Sl         3                     1015.21183        1021.6404
      GLog               3                     1016.29587        1022.72444
      Exponential        1                     1021.58662        1023.65559
      Johnson Su         4                     1015.21183        1023.9391
      Gamma              2                     1021.02475        1025.23528
      Weibull            2                     1021.50762        1025.71815
      Extreme Value      2                     1021.50762        1025.71815
      Normal 2 Mixture   5                     1042.55455        1053.66566
      Normal 3 Mixture   8                     1042.74433        1061.56786
      Normal             2                     1082.36992        1086.58045

However, with the scaled values, Gamma appears to be the best fit. I get the same result in R as well.

Compare Distributions (scaled values)

Show  Distribution       Number of Parameters  -2*LogLikelihood  AICc
X     Gamma              2                     -114.92911        -110.71858
      Weibull            2                     -113.54302        -109.3325
      Extreme Value      2                     -113.54302        -109.3325
      Exponential        1                     -108.01019        -105.94122
      Johnson Sl         3                     -104.69191        -98.263335
      Johnson Su         4                     -104.69191        -95.964634
      GLog               3                     -102.35037        -95.921798
      LogNormal          2                     -70.727608        -66.517082
      Normal 2 Mixture   5                     -77.349192        -66.238081
      Normal 3 Mixture   8                     -77.159407        -58.335878
      Normal             2                     -37.533813        -33.323287

What is the difference between the MLE methods in JMP and R? Is it advisable to go with the scaled values in R? Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3480422.html
[R] Fitting gamma and exponential Distributions with fitdist
I am trying to fit gamma and exponential distributions to my data using the fitdist function in the fitdistrplus package, and to obtain the parameters along with the AIC values of the fit. However, I am getting errors with both distributions. I have given a reproducible example below, with the errors I am getting. Can someone please let me know how to overcome this issue?

library(fitdistrplus)
test <- (895.13582915.7447,335.5472,1470.4022,194.5461,1814.2328,
1056.3067,3110.0783,11441.8656,142.1714,2136.0964,1958.9022,
891.89,352.6939,1341.7042,167.4883,2502.0528,1742.1306,
837.1481,867.8533,3590.4308,1125.9889,1200.605,4321.0011,
1873.9706,323.6633,1912.3147,865.6058,2870.8592,236.7214,
580.2861,350.9269,6842.4969,1886.2403,265.5094,199.9825,
1215.6197,7241.8075,2381.9517,3078.1331,5461.3703,2051.3997,
751.6575,714.3536,598.4539,425.6656,215.2103,608.785,
369.4744,2398.6506,918.6844,525.6925,2549.3694,4108.8983,
2824.0758,1068.7508,249.995,3863.9839,1152.1506,531.6844)

fitdist(test, "gamma", method = "mle")
Error in fitdist(test, "gamma", method = "mle") :
  the function mle failed to estimate the parameters, with the error code 100
In addition: Warning messages:
1: In dgamma(x, shape, scale, log) : NaNs produced
2: In dgamma(x, shape, scale, log) : NaNs produced
3: In dgamma(x, shape, scale, log) : NaNs produced
4: In dgamma(x, shape, scale, log) : NaNs produced
5: In dgamma(x, shape, scale, log) : NaNs produced
6: In dgamma(x, shape, scale, log) : NaNs produced
7: In dgamma(x, shape, scale, log) : NaNs produced
8: In dgamma(x, shape, scale, log) : NaNs produced
9: In dgamma(x, shape, scale, log) : NaNs produced

fitdist(test, "exp", method = "mle")
Error in fitdist(test, "exp", method = "mle") :
  the function mle failed to estimate the parameters, with the error code 100
In addition: Warning message:
In dexp(x, 1/rate, log) : NaNs produced

Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3477391.html
Re: [R] Fitting gamma and exponential Distributions with fitdist
There was a small error in the data creation step, which I have fixed below:

test <- c(895.1358,2915.7447,335.5472,1470.4022,194.5461,1814.2328,
1056.3067,3110.0783,11441.8656,142.1714,2136.0964,1958.9022,
891.89,352.6939,1341.7042,167.4883,2502.0528,1742.1306,
837.1481,867.8533,3590.4308,1125.9889,1200.605,4321.0011,
1873.9706,323.6633,1912.3147,865.6058,2870.8592,236.7214,
580.2861,350.9269,6842.4969,1886.2403,265.5094,199.9825,
1215.6197,7241.8075,2381.9517,3078.1331,5461.3703,2051.3997,
751.6575,714.3536,598.4539,425.6656,215.2103,608.785,
369.4744,2398.6506,918.6844,525.6925,2549.3694,4108.8983,
2824.0758,1068.7508,249.995,3863.9839,1152.1506,531.6844)

Any help would be appreciated. Thank you.

-- View this message in context: http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3480133.html
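The NaN warnings from dgamma suggest that the optimizer is trying non-positive parameter values along the way. One way around that, without rescaling the data, is to optimize over the logarithms of the parameters so that shape and rate stay positive by construction. The following is a minimal base-R sketch of that idea, not what fitdist does internally; the function name fit_gamma_mle is made up for illustration, and only the first twelve values of the posted data are used to keep it short.

```r
# Hypothetical helper: gamma MLE via optim() on log(shape) and log(rate).
fit_gamma_mle <- function(x) {
  # Method-of-moments estimates as starting values.
  m <- mean(x)
  v <- var(x)
  start <- log(c(shape = m^2 / v, rate = m / v))

  # Negative log-likelihood as a function of the log parameters.
  nll <- function(p) {
    -sum(dgamma(x, shape = exp(p[1]), rate = exp(p[2]), log = TRUE))
  }

  opt <- optim(start, nll)
  list(estimate = exp(opt$par),          # back on the natural scale
       convergence = opt$convergence,    # 0 means optim reported success
       aic = 2 * opt$value + 2 * length(opt$par))
}

# First twelve values of the data posted above.
x <- c(895.1358, 2915.7447, 335.5472, 1470.4022, 194.5461, 1814.2328,
       1056.3067, 3110.0783, 11441.8656, 142.1714, 2136.0964, 1958.9022)
fit <- fit_gamma_mle(x)
fit$estimate
fit$aic
```

If the fit is needed through fitdist itself, supplying explicit starting values is another thing to try, but that is not shown here.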
Re: [R] Integrate na.rm in own defined functions
This should work:

rmse <- function(x) {
  # sum of squares, ignoring missing values
  sum1 <- sum(x^2, na.rm = TRUE)
  # divide by the number of non-missing values, not length(x)
  rmse <- sqrt(sum1 / sum(!is.na(x)))
  rmse
}

-- View this message in context: http://r.789695.n4.nabble.com/Integrate-na-rm-in-own-defined-functions-tp3462492p3462615.html
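If the goal is to let the caller decide how missing values are handled, as with mean() or sum(), the function can take its own na.rm argument and pass it on. A minimal sketch of that pattern (the default of FALSE mirrors the base R functions; the example values are made up):

```r
# RMSE with an na.rm argument that is simply handed to mean().
rmse <- function(x, na.rm = FALSE) {
  sqrt(mean(x^2, na.rm = na.rm))
}

errors <- c(1, -2, NA, 2)
rmse(errors)                # NA, because one value is missing
rmse(errors, na.rm = TRUE)  # sqrt((1 + 4 + 4) / 3) = sqrt(3)
```

Using mean() rather than sum()/length() means the divisor is automatically the number of non-missing values when na.rm = TRUE.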
[R] Optimizing a nested function
I am trying to optimize a nested function using nlminb. It throws an error saying that y is missing. Can someone help me with the correct syntax? Thank you.

test1 <- function(x, y) {
  sum <- x + y
  return(sum)
}

test2 <- function(x, y) {
  sum <- test1(x, y)
  sumSq <- sum * sum
  return(sumSq)
}

nlminb(start = c(1,1), test2, lower = c(0,0), upper = c(5,5))

-- View this message in context: http://r.789695.n4.nabble.com/Optimzing-a-nested-function-tp3443825p3443825.html
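nlminb passes the whole parameter vector as the first argument of the objective, so a function written as function(x, y) receives c(1, 1) as x and nothing as y. A small wrapper that unpacks the vector is one way to handle this. The sketch below reuses test1 and test2 from the question in condensed form; the name objective is just illustrative.

```r
test1 <- function(x, y) x + y

test2 <- function(x, y) {
  s <- test1(x, y)
  s * s
}

# Wrapper: nlminb supplies one vector p; unpack it into x and y.
objective <- function(p) test2(p[1], p[2])

fit <- nlminb(start = c(1, 1), objective = objective,
              lower = c(0, 0), upper = c(5, 5))
fit$par        # expected to end up at the lower bounds
fit$objective  # expected to be close to 0
```

With the bounds given, (x + y)^2 is smallest at x = y = 0, so the result is easy to sanity-check by hand.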
Re: [R] flow map lines between point pairs (latitude/longitude)
I am working on a similar problem. I have to add two columns: one containing the US state to which the origin belongs and another containing the state to which the destination belongs. All I have is the latitude and longitude of the origin and the destination. Are there any packages in R that can do this? Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/flow-map-lines-between-point-pairs-latitude-longitude-tp860009p3383842.html
[R] Seasonality in STL Decomposition
I am having issues with interpreting the results of an STL decomposition. The data used are given below, along with the decomposed seasonal, trend and remainder components. The data are weekly. The original data don't appear to be seasonal, but there seems to be a periodic peak in the seasonal component. Can someone please let me know how to interpret the seasonality plot here? Also, what do the negative values in the seasonal and remainder components mean? Thank you. Ravi

input <- c(2152.787,1686.4335,1856.3705,1719.872,1755.1917,1705.3385,1701.6683,1610.2746,1554.6005,1463.85,
1429.2155,1453.1788,1385.1424,1346.8169,1425.896,1388.6765,1349.8714,1293.0909,1279.1281,1264.8579,
1270.9211,1278.4987,1271.3811,1270.448,1243.3399,1279.2172,1337.455,1316.6042,1288.9275,1287.403,
1290.0595,1317.4027,1365.4406,1302.6746,1254.4507,1274.5924,1281.1415,1282.0631,1249.1892,1312.9578,
1253.104,1256.7432,1253.802,1294.004,1252.422,1259.6932,1281.6017,1274.6644,1267.9903,1229.1065,
1259.2091,1275.5516,1220.9909,1217.682,1252.5307,1251.7036,1237.4322,1269.2433,1253.5857,1209.1975,
1188.7189,1203.7423,1267.6006,1208.4465,1219.0118,1214.3023,1209.4117,1203.1483,1214.807,1165.5629,
1189.984,1230.2438,1265.8107,1206.4665,1197.2052,1189.0741,1191.9008,1209.7856,1203.7599,1206.3015,
1164.3207,1197.648,1175.555,1188.6788,1206.7764,1210.7119,1209.0285,1205.9406,1202.4434,1222.4249,
1233.8498,1244.0629,1228.476,1235.1948,1264.1828,1242.6138,1241.0844,1313.3505,1317.6045,1260.7927,
1255.9414,1303.2353,1329.5308,1322.976,1324.9601,1366.406,1319.9496,1357.3928,1334.4911,1302.5362,
1336.5733,1293.3127,1366.4449,1364.4946,1339.0855,1371.0803,1337.266,1349.3965,1308.6867,1402.4016)
timeSeries <- ts(input, start = c(2004, 20), frequency = 52)
stlDecompose <- stl(timeSeries, s.window = "periodic", s.degree = 1)
plot(stlDecompose)

Decomposition Results:

time       seasonal     trend      remainder
2004.365   215.915141   1585.077    3.517953e+02
2004.385    75.846356   1576.654    3.393327e+01
2004.404   131.292835   1568.231    1.568465e+02
2004.423   100.668748   1559.809    5.939475e+01
2004.442   102.720994   1551.386    1.010849e+02
2004.462    88.423662   1542.963    7.395170e+01
2004.481    95.695097   1534.540    7.143275e+01
2004.500    38.382598   1526.118    4.577424e+01
2004.519    39.744099   1517.695   -2.838678e+00
2004.538    16.208476   1509.559   -6.191717e+01
2004.558    19.836887   1501.422   -9.204370e+01
2004.577    21.128396   1493.286   -6.123552e+01
2004.596    -6.943227   1485.150   -9.306392e+01
2004.615   -15.429798   1477.013   -1.147665e+02
2004.635    -2.455302   1468.877   -4.052547e+01
2004.654    16.103626   1460.740   -8.816752e+01
2004.673   -43.781184   1452.604   -5.895142e+01
2004.692   -93.469553   1444.576   -5.801584e+01
2004.712   -84.916472   1436.549   -7.250401e+01
2004.731   -68.597741   1428.521   -9.506523e+01
2004.750   -44.458760   1420.493   -1.051133e+02
2004.769   -67.619804   1412.465   -6.634695e+01
2004.788   -73.086998   1404.438   -5.996964e+01
2004.808   -74.896842   1396.410   -5.106519e+01
2004.827   -84.315286   1388.382   -6.072713e+01
2004.846   -54.710480   1380.497   -4.656974e+01
2004.865   -25.880674   1372.613   -9.276842e+00
2004.885   -32.311518   1364.728   -1.581190e+01
2004.904   -64.416512   1356.843   -3.498703e+00
2004.923   -45.672884   1348.958   -1.588193e+01
2004.942   -52.548907   1341.073    1.535493e+00
2004.962   -29.473180   1333.188    1.368787e+01
2004.981     6.436797   1325.303    3.370069e+01
2005.000   -20.697564   1318.761    4.611184e+00
2005.019   -43.370325   1312.219   -1.439782e+01
2005.038   -32.562537   1305.677    1.478221e+00
2005.058   -28.755698   1299.135    1.076261e+01
2005.077   -16.133184   1292.592    5.603832e+00
2005.096   -24.686719   1286.050   -1.217440e+01
2005.115    14.475095   1279.508    1.897452e+01
2005.135   -21.074291   1272.966    1.212235e+00
2005.154   -15.042155   1269.978    1.807456e+00
2005.173    -1.165620   1266.990   -1.202212e+01
2005.192     9.004016   1264.002    2.099840e+01
2005.212   -11.698548   1261.013    3.107118e+00
2005.231    28.962729   1258.025   -2.729480e+01
2005.250    42.936605   1255.037   -1.637202e+01
2005.269    11.954682   1252.049    1.066076e+01
2005.288     7.084609   1249.061    1.184488e+01
2005.308    11.654557   1247.216   -2.976370e+01
2005.327    40.218506   1245.370   -2.637988e+01
2005.346    45.477254   1243.525   -1.345096e+01
2005.365   215.915141   1241.680   -2.366044e+02
2005.385    75.846356   1239.835   -9.799933e+01
2005.404   131.292835   1237.990   -1.167519e+02
2005.423   100.668748   1236.145   -8.510978e+01
2005.442   102.720994   1234.299   -9.958826e+01
2005.462    88.423662   1233.257   -5.243698e+01
2005.481    95.695097   1232.214   -7.432316e+01
2005.500    38.382598   1231.171   -6.035601e+01
2005.519    39.744099   1230.128   -8.115327e+01
2005.538    16.208476   1229.085   -4.155139e+01
2005.558    19.836887   1228.042    1.972135e+01
2005.577    21.128396   1227.000   -3.968141e+01
2005.596    -6.943227   1225.957   -1.638219e-03
2005.615   -15.429798   1226.144    3.587887e+00
2005.635    -2.455302   1226.332   -1.446475e+01
2005.654    16.103626   1226.519   -3.947463e+01
2005.673   -43.781184   1226.707    3.188134e+01
2005.692   -93.469553   1226.894    3.213806e+01
2005.712   -84.916472   1227.082    4.781853e+01
2005.731   -68.597741   1227.269
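Two properties of stl may help in reading the output above. The decomposition is additive, so seasonal + trend + remainder adds back up to the data, and the seasonal and remainder values are deviations around the trend; a negative value just means "below the trend" at that point. Also, with s.window = "periodic" the seasonal pattern is forced to repeat identically in every cycle, which matches the table above, where the seasonal value at 2004.365 reappears at 2005.365. The sketch below illustrates both points on nottem, a monthly series that ships with R, rather than on the posted data.

```r
# nottem: monthly air temperatures at Nottingham, from the datasets package.
fit <- stl(nottem, s.window = "periodic")
parts <- fit$time.series   # columns: seasonal, trend, remainder

# Additive decomposition: the three components sum back to the data.
recon <- rowSums(parts)
max(abs(recon - as.numeric(nottem)))   # effectively zero

# With a periodic window the seasonal component repeats every 12 months.
seasonal <- as.numeric(parts[, "seasonal"])
all.equal(seasonal[1:12], seasonal[13:24])

# The seasonal values are deviations, so both signs appear.
range(seasonal)
```

With only a little over two years of weekly data, a large value in one particular week of both years can end up in the periodic seasonal component even if the series does not look seasonal by eye, so the peak is worth checking against the raw data.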
Re: [R] Zero Inflated Distributions
Any help on this would be appreciated. Thank you.

-- View this message in context: http://r.789695.n4.nabble.com/Zero-Inflated-Distributions-tp3334861p3338344.html
[R] Zero Inflated Distributions
I am currently fitting the following distributions using JMP and am looking for ways to fit the same distributions in R:

Zero Inflated Lognormal
Zero Inflated Loglogistic
Zero Inflated Frechet
Zero Inflated Weibull
Threshold Frechet
Threshold Loglogistic
Threshold Lognormal
Log Generalized Gamma
Threshold Weibull
LEV
Logistic
Normal
SEV

Are there any packages that contain these distributions? I am specifically interested in the zero inflated distributions since the data I have contain quite a few zeros. Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Zero-Inflated-Distributions-tp3334861p3334861.html
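For a continuous distribution with extra zeros, the likelihood separates into two pieces: the probability of a zero, and the continuous distribution fitted to the strictly positive values. For the lognormal case both pieces have closed-form maximum likelihood estimates, so no special package is needed. The following is a minimal sketch under that assumption; the function name fit_zi_lognormal and the simulated data are made up for illustration, and this is not a reproduction of what JMP does.

```r
# Hypothetical helper: zero-inflated lognormal fitted in two parts.
fit_zi_lognormal <- function(x) {
  stopifnot(all(x >= 0))
  logs <- log(x[x > 0])                 # positive part only
  list(
    p_zero  = mean(x == 0),             # MLE of the zero probability
    meanlog = mean(logs),               # MLE of the lognormal location
    sdlog   = sqrt(mean((logs - mean(logs))^2))  # MLE uses divisor n
  )
}

# Simulated example: 30 exact zeros plus 70 lognormal draws.
set.seed(1)
x <- c(rep(0, 30), rlnorm(70, meanlog = 2, sdlog = 0.5))
fit <- fit_zi_lognormal(x)
fit$p_zero    # 0.3 by construction
fit$meanlog
fit$sdlog
```

The same split applies to the other zero-inflated continuous families in the list: estimate the zero proportion separately and fit the chosen distribution to the positive values with whatever fitting function is preferred.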
Re: [R] Zero Inflated Distributions
Thanks, Thierry. Has anyone used the bayescount package for estimating zero inflated distributions? Its documentation states that it is a crude function. Does that mean the estimates are only approximate? The example given seems to work only with Gamma Poisson.

data <- rpois(100, rgamma(100, shape=1, scale=8))
data[1:15] <- 0
maximise.likelihood(data, "ZIGP")

However, when I tried fitting Gamma/LogNormal/Weibull (assuming that the data are continuous), it throws the following error:

shape scale zi 9.532 4 21
Error in optim(c(shape, scale, zi), f6, control = list(fnscale = -1)) :
  function cannot be evaluated at initial parameters

What is this error about? Moreover, the function seems extremely slow. For the 100 data point example above, the estimation takes around 8 seconds. Please let me know your opinions on this package and on alternative packages, if any. Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Zero-Inflated-Distributions-tp3334861p3335122.html
[R] Problem with loading the Snowball package
I tried using the Snowball package to perform stemming for text mining. But when I try to load the package, the following error is thrown:

Error : .onLoad failed in loadNamespace() for 'Snowball', details:
  call: NULL
  error: .onLoad failed in loadNamespace() for 'rJava', details:
  call: hive[[hive$CurrentVersion]]
  error: attempt to select less than one element
Error: package/namespace load failed for 'Snowball'

The latest version of Java is installed on my system. I am not sure where the problem is. Can someone help me with this? Thank you. Ravi

-- View this message in context: http://r.789695.n4.nabble.com/Problem-with-loading-the-Snowball-package-tp3248487p3248487.html
Re: [R] 3D Binning
This worked fine. Thanks.

-- View this message in context: http://r.789695.n4.nabble.com/3D-Binning-tp3236223p3248489.html
[R] 3D Binning
I am trying to do binning on three variables (3D binning). The bin boundaries are specified by the user separately for each variable. I used the bin2 function in the 'ash' package for 2D binning, which involves only two variables, but didn't find any package for similar binning with three variables. Are there any packages or code available for 3D binning? Thank you.

-- View this message in context: http://r.789695.n4.nabble.com/3D-Binning-tp3236223p3236223.html
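One base-R way to approach this, offered only as a sketch and not as the answer that was given on the list, is to cut each variable at its own user-supplied boundaries and cross-tabulate the three resulting factors. The function name bin3 and the example boundaries below are made up for illustration.

```r
# Hypothetical helper: 3D bin counts from user-specified break points.
bin3 <- function(x, y, z, xbreaks, ybreaks, zbreaks) {
  table(x = cut(x, xbreaks, include.lowest = TRUE),
        y = cut(y, ybreaks, include.lowest = TRUE),
        z = cut(z, zbreaks, include.lowest = TRUE))
}

set.seed(42)
x <- runif(200)
y <- runif(200)
z <- runif(200)

counts <- bin3(x, y, z,
               xbreaks = c(0, 0.5, 1),
               ybreaks = c(0, 0.25, 0.5, 1),
               zbreaks = c(0, 0.1, 1))
dim(counts)   # 2 3 2: one dimension per variable
sum(counts)   # 200: every point falls into exactly one cell
```

The result is an ordinary three-way table, so as.data.frame(counts) gives one row per cell with its count. Points outside the supplied boundaries become NA in cut() and are dropped from the table.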
[R] Which - value not present
I am trying to use the which function to obtain the index of a value in a data frame. Depending on whether the value is present in the data frame or not, I perform further operations on the data frame. However, if the value is not present, I get integer(0). How do I check for integer(0)? Is there something like is.na for this? Thank you. Ravishankar

-- View this message in context: http://r.789695.n4.nabble.com/Which-value-not-present-tp3035455p3035455.html
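integer(0) is simply a vector of length zero, so the usual check is on its length. A minimal sketch with a made-up data frame:

```r
df <- data.frame(id = c(10, 20, 30))

idx <- which(df$id == 20)          # value present: position 2
missing_idx <- which(df$id == 99)  # value absent: integer(0)

length(idx) == 0          # FALSE
length(missing_idx) == 0  # TRUE

if (length(missing_idx) == 0) {
  message("value not found")
}

# If only presence matters and the index is not needed:
any(df$id == 99)   # FALSE
99 %in% df$id      # FALSE
```

Testing with is.na does not help here, because is.na(integer(0)) is itself a zero-length logical vector and cannot be used as an if() condition.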
Re: [R] Which - value not present
Thank you. It works fine.

-- View this message in context: http://r.789695.n4.nabble.com/Which-value-not-present-tp3035455p3035575.html
Re: [R] Random Forest AUC
Thanks Max and Andy. If the random forest always gives an AUC of 1 on the training set, isn't it overfitting? If not, how do you tell this apart from overfitting? I believe random forests are claimed never to overfit (according to the following link). http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#features Ravishankar R

-- View this message in context: http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html
[R] Random Forest AUC
Guys, I used random forests on a couple of data sets to predict a binary response. In all cases, the AUC on the training set comes out as 1. Is this always the case with random forests? Can someone please clarify this? I have given a simple example below, first using logistic regression and then using random forests, to illustrate the problem. The AUC of the random forest comes out as 1.

data(iris)
iris <- iris[(iris$Species != "setosa"),]
iris$Species <- factor(iris$Species)

fit <- glm(Species~., iris, family=binomial)
train.predict <- predict(fit, newdata = iris, type="response")
library(ROCR)
plot(performance(prediction(train.predict, iris$Species), "tpr", "fpr"), col = "red")
auc1 <- performance(prediction(train.predict, iris$Species), "auc")@y.values[[1]]
legend("bottomright", legend=c(paste("Logistic Regression (AUC=", formatC(auc1, digits=4, format="f"), ")", sep="")), col=c("red"), lty=1)

library(randomForest)
fit <- randomForest(Species ~ ., data=iris, ntree=50)
train.predict <- predict(fit, iris, type="prob")[,2]
plot(performance(prediction(train.predict, iris$Species), "tpr", "fpr"), col = "red")
auc1 <- performance(prediction(train.predict, iris$Species), "auc")@y.values[[1]]
legend("bottomright", legend=c(paste("Random Forests (AUC=", formatC(auc1, digits=4, format="f"), ")", sep="")), col=c("red"), lty=1)

Thank you. Regards, Ravishankar R

-- View this message in context: http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3006649.html
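Passing the training data back into predict() scores each row with trees that were grown on that same row, so a training AUC at or near 1 is not very informative by itself. Calling predict() on a randomForest fit without new data returns the out-of-bag predictions instead, where each row is scored only by trees that did not see it. The sketch below compares the two on the same iris subset. It assumes the randomForest package is installed (that part is skipped otherwise), and auc_rank is a made-up helper that computes the AUC from ranks so that ROCR is not needed.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC.
# score: numeric scores; positive: logical, TRUE for the positive class.
auc_rank <- function(score, positive) {
  r <- rank(score)
  n_pos <- sum(positive)
  n_neg <- sum(!positive)
  (sum(r[positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Small hand-checkable case: 3 of the 4 positive/negative pairs are ordered correctly.
auc_rank(c(0.1, 0.4, 0.35, 0.8), c(FALSE, FALSE, TRUE, TRUE))  # 0.75

if (requireNamespace("randomForest", quietly = TRUE)) {
  d <- iris[iris$Species != "setosa", ]
  d$Species <- factor(d$Species)
  is_virginica <- d$Species == "virginica"

  set.seed(1)
  fit <- randomForest::randomForest(Species ~ ., data = d, ntree = 50)

  # Scores from trees that have seen these rows.
  resub <- predict(fit, d, type = "prob")[, "virginica"]
  # Out-of-bag scores: each row scored only by trees that did not use it.
  oob <- predict(fit, type = "prob")[, "virginica"]

  print(c(resubstitution = auc_rank(resub, is_virginica),
          out_of_bag     = auc_rank(oob, is_virginica)))
}
```

The out-of-bag figure, or an AUC computed on a held-out test set, is the one to compare against the logistic regression result.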