[R] Crosstab with Average and Count

2012-07-20 Thread vioravis
I have the following data:

x <- as.factor(c(1,1,1,2,2,2,3,3,3))
y <- as.factor(c(10,10,10,20,20,20,30,30,30))
z <- c(100,100,NA,200,200,200,300,300,300)

I could create the cross tab of x and y with Sum of z as its elements using
the xtabs function as follows:

# X Vs. Y with Sum Z

xtabs(z ~ x + y)

   y
x    10  20  30
  1 200   0   0
  2   0 600   0
  3   0   0 900

How do I replace the sum with the average and the count so that I get the
following outputs?

# X Vs. Y with Average of Z
   y
x    10  20  30
  1 100   0   0
  2   0 200   0
  3   0   0 300

# X Vs. Y with Count Z
   y
x    10  20  30
  1   2   0   0
  2   0   3   0
  3   0   0   3

I would appreciate any help with these. Thank you.

Ravi
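
One possible approach, offered as a sketch rather than the list's answer: tapply() can apply any summary function per (x, y) cell. The names avgTab and cntTab are illustrative, and replacing empty cells with 0 is an assumption made only to match the layout requested above.

```r
x <- as.factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3))
y <- as.factor(c(10, 10, 10, 20, 20, 20, 30, 30, 30))
z <- c(100, 100, NA, 200, 200, 200, 300, 300, 300)

# Mean of z per (x, y) cell, ignoring missing values.
avgTab <- tapply(z, list(x = x, y = y), mean, na.rm = TRUE)
avgTab[is.na(avgTab)] <- 0   # empty cells come back as NA; show them as 0

# Number of non-missing z values per (x, y) cell.
cntTab <- tapply(z, list(x = x, y = y), function(v) sum(!is.na(v)))
cntTab[is.na(cntTab)] <- 0

print(avgTab)
print(cntTab)
```

Note that a 0 in avgTab then means "no observations", which cannot be told apart from a true mean of 0; keeping the NA may be safer in real use.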





--
View this message in context: 
http://r.789695.n4.nabble.com/Crosstab-with-Average-and-Count-tp4637180.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Merging on Datetime Column

2012-07-13 Thread vioravis
I have the following dataframe with the first column being of type datetime:

dateTime <- c("10/01/2005 0:00",
              "10/01/2005 0:20",
              "10/01/2005 0:40",
              "10/01/2005 1:00",
              "10/01/2005 1:20")
var1 <- c(1,2,3,4,5)
var2 <- c(10,20,30,40,50)
df <- data.frame(dateTime = dateTime, var1 = var1, var2 = var2)
df$dateTime <- strptime(df$dateTime, "%m/%d/%Y %H:%M")

I want to create 10 minute interval data as follows:

minTime <- min(df$dateTime)
maxTime <- max(df$dateTime)
newTime <- seq(minTime,maxTime,600)
newDf <- data.frame(newDateTime = newTime)
newDf <- merge(newDf,df,by.x = "newDateTime",by.y = "dateTime",all.x = TRUE)

The objective here is to create a data frame with values from df for the
datetime in df and NA for the missing ones. However, I am getting the
following data frame with both Var1 and Var2 having all NAs.

> newDf
          newDateTime var1 var2
1 2005-10-01 00:00:00   NA   NA
2 2005-10-01 00:10:00   NA   NA
3 2005-10-01 00:20:00   NA   NA
4 2005-10-01 00:30:00   NA   NA
5 2005-10-01 00:40:00   NA   NA
6 2005-10-01 00:50:00   NA   NA
7 2005-10-01 01:00:00   NA   NA
8 2005-10-01 01:10:00   NA   NA
9 2005-10-01 01:20:00   NA   NA

Can someone help me on how to do the merge based on the two datetime
columns?

Thank you.

Ravi
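
A hedged sketch of one way this might be approached: strptime() returns a POSIXlt object, and storing both key columns as POSIXct instead gives merge() plain numeric time stamps to match on. The fixed time zone "UTC" and the name merged are assumptions for illustration, not part of the original post.

```r
dateTime <- c("10/01/2005 0:00", "10/01/2005 0:20", "10/01/2005 0:40",
              "10/01/2005 1:00", "10/01/2005 1:20")

# Parse straight to POSIXct (assumed time zone: UTC).
df <- data.frame(
  dateTime = as.POSIXct(dateTime, format = "%m/%d/%Y %H:%M", tz = "UTC"),
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(10, 20, 30, 40, 50)
)

# 10-minute grid (600 seconds) spanning the observed range.
newDf <- data.frame(newDateTime = seq(min(df$dateTime), max(df$dateTime), by = 600))

# Left join: every grid point is kept, unmatched ones get NA.
merged <- merge(newDf, df, by.x = "newDateTime", by.y = "dateTime", all.x = TRUE)
print(merged)
```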






--
View this message in context: 
http://r.789695.n4.nabble.com/Merging-on-Datetime-Column-tp4636417.html


Re: [R] Skipping lines and incomplete rows

2012-07-11 Thread vioravis
Thanks a lot for the guidance. I have another text file with a time stamp and
an empty column as given below:


First line: Skip this line 
Second line: skip this line 
Third line: skip this line 
variable1 Variable2 Variable3 Variable4 
Unit1 Unit2 Unit3 
11/1/2004 0:00  0.1 0.001 
11/1/2004 0:10  0.2 0.002 
11/1/2004 0:20  0.3 0.003 
11/1/2004 0:30  0.4 0.004 


This is a space-separated text file. When I use the following code:

head <- readLines("testInput.txt", n=4)[4]
dat <- read.table("testInput.txt", skip=5, sep="", fill = TRUE,
                  stringsAsFactors=FALSE)
names(dat) <- unlist(strsplit(head, " "))

I get the following output:

> str(dat)
'data.frame':   4 obs. of  4 variables:
 $ variable1: chr  "11/1/2004" "11/1/2004" "11/1/2004" "11/1/2004"
 $ Variable2: chr  "0:00" "0:10" "0:20" "0:30"
 $ Variable3: num  0.1 0.2 0.3 0.4
 $ Variable4: num  0.001 0.002 0.003 0.004

Variable1's date and time gets split as Variable1 and Variable2 whereas they
should both be part of Variable1.

Also, the empty column is missing from the data frame.

Is there a way to handle these two cases? 

Thank you.

Ravi
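
A possible sketch, under stated assumptions: with whitespace as the separator, an empty column leaves no trace in the file, so it has to be re-created by hand. The code below assumes that Variable2 is the empty column and that the date and time belong together in variable1; the temporary file and the names raw and dat are illustrative only.

```r
sample_lines <- c(
  "First line: Skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1 Variable2 Variable3 Variable4",
  "Unit1 Unit2 Unit3",
  "11/1/2004 0:00  0.1 0.001",
  "11/1/2004 0:10  0.2 0.002",
  "11/1/2004 0:20  0.3 0.003",
  "11/1/2004 0:30  0.4 0.004"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Column names come from line 4 of the file.
hdr <- strsplit(readLines(path, n = 4)[4], " +")[[1]]

# Data start on line 6; date and time arrive as two separate fields (V1, V2).
raw <- read.table(path, skip = 5, sep = "", stringsAsFactors = FALSE)

dat <- data.frame(
  as.POSIXct(paste(raw$V1, raw$V2), format = "%m/%d/%Y %H:%M", tz = "UTC"),
  NA,        # assumed empty column (Variable2)
  raw$V3,
  raw$V4
)
names(dat) <- hdr
str(dat)
```

If the real file is tab-separated rather than space-separated, reading it with sep = "\t" would preserve both the time stamp and the empty column without this manual step.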


--
View this message in context: 
http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830p4636129.html


Re: [R] Skipping lines and incomplete rows

2012-07-10 Thread vioravis
Thanks a lot Rui and Arun.

The methods work fine with the data I gave, but when I tried the two methods
on the following semicolon-separated data using sep = ";", only the first
3 columns are read properly; the rest of the columns are either empty or NA.


**
Remove this line
Remove this line
Remove this line
Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2
;[m/s];[°];°C;[hPa];[MWh];[MWh]
1/1/2012;0.0;0;#N/A;#N/A;0.;0.
1/2/2012;0.0;0;#N/A;#N/A;0.;0.
1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112
1/4/2012;0.0;0;#N/A;#N/A;1.;2.
1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455
***

I used the following code:
dat1 <- read.table("testInput.txt",sep=";",skip=3,fill=TRUE,header=TRUE)
dat1 <- dat1[-1,]
row.names(dat1) <- 1:nrow(dat1)

Could you please let me know what is wrong with this approach? 

Thank you.

Ravi
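
A likely cause, sketched here with assumptions: read.table() treats "#" as a comment character by default, so each data line is cut off at the first "#N/A" and fill = TRUE pads the rest with NA. Setting comment.char = "" and declaring "#N/A" as a missing-value string may fix this. The temporary file is illustrative, and the degree signs in the units row are written as "deg" only to keep the sample ASCII.

```r
sample_lines <- c(
  "Remove this line",
  "Remove this line",
  "Remove this line",
  "Time;Actual Speed;Actual Direction;Temp;Press;Value1;Value2",
  ";[m/s];[deg];degC;[hPa];[MWh];[MWh]",
  "1/1/2012;0.0;0;#N/A;#N/A;0.;0.",
  "1/2/2012;0.0;0;#N/A;#N/A;0.;0.",
  "1/3/2012;0.0;0;#N/A;#N/A;1.5651;2.2112",
  "1/4/2012;0.0;0;#N/A;#N/A;1.;2.",
  "1/5/2012;0.0;0;#N/A;#N/A;3.2578;7.5455"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Header names from line 4; the units row (line 5) is skipped entirely.
hdr <- strsplit(readLines(path, n = 4)[4], ";")[[1]]

dat1 <- read.table(path, sep = ";", skip = 5, col.names = hdr,
                   comment.char = "",     # do not treat '#' as a comment
                   na.strings = "#N/A",   # Excel-style missing values
                   stringsAsFactors = FALSE)
str(dat1)
```

With the default check.names = TRUE, a header such as "Actual Speed" becomes Actual.Speed.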

--
View this message in context: 
http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830p4635952.html


[R] Skipping lines and incomplete rows

2012-07-09 Thread vioravis
I have a text file that has semicolon-separated values. The table is nearly
10,000 by 585. The file looks as follows:

***
First line: Skip this line
Second line: skip this line
Third line: skip this line
variable1 Variable2 Variable3 Variable4
   Unit1 Unit2 Unit3
10  0.1   0.01   0.001
20  0.2   0.02   0.002 
30  0.3   0.03   0.003
40  0.4   0.04   0.004
***

The first three lines need to be skipped. Moreover, line 5 doesn't have
units for all the variables and hence, has to be skipped as well.
Effectively, I want the following to be read to a dataframe skipping rows 1,
2, 3 and 5.

***
variable1 Variable2 Variable3 Variable4
10  0.1   0.01   0.001
20  0.2   0.02   0.002 
30  0.3   0.03   0.003
40  0.4   0.04   0.004
***

I tried using read.table, skipping lines 1-3, as follows:

inputData <- read.table("test.txt", sep = ";", skip = 3)

but line 4 causes a problem, with the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 3 did not have 585 elements

Can someone help me with this?

Thank you.

Ravi
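
One way this could be handled, as a sketch only: read the header line on its own with scan(), then read the data with skip = 5 so that the incomplete units row is never parsed. The four-column semicolon-separated sample below is an assumed stand-in for the real 585-column file.

```r
sample_lines <- c(
  "First line: Skip this line",
  "Second line: skip this line",
  "Third line: skip this line",
  "variable1;Variable2;Variable3;Variable4",
  ";Unit1;Unit2;Unit3",
  "10;0.1;0.01;0.001",
  "20;0.2;0.02;0.002",
  "30;0.3;0.03;0.003",
  "40;0.4;0.04;0.004"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# Line 4 only: skip 3 lines, read 1 line, as character fields.
hdr <- scan(path, what = "", sep = ";", skip = 3, nlines = 1, quiet = TRUE)

# Lines 6 onward: skip the 3 comment lines, the header and the units row.
inputData <- read.table(path, sep = ";", skip = 5, col.names = hdr)
print(inputData)
```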

--
View this message in context: 
http://r.789695.n4.nabble.com/Skipping-lines-and-incomplete-rows-tp4635830.html


[R] Large Test Datasets in R

2012-06-24 Thread vioravis
I am looking for some large datasets (10,000 rows x 100,000 columns or vice
versa) to create some test sets.  I am not concerned about the individual
elements since I will be converting them to binary (0/1) by using arbitrary
thresholds.

Does any R package provide such big datasets?

Also, what is the biggest text document collection available in R? The tm
package seems to provide only 20 records from the Reuters dataset. Is there
any package that has 10,000+ documents?

Would appreciate any help on these.

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Large-Test-Datasets-in-R-tp4634330.html


[R] htmlParse Error

2012-05-21 Thread vioravis
I am trying to parse a webpage using the htmlParse command in XML package as
follows:

library(XML)
u = "http://en.wikipedia.org/wiki/World_population"
doc = htmlParse(u)

I get the following error:

Error in htmlParse(u) : 
  error in creating parser for http://en.wikipedia.org/wiki/World_population

I am using R 2.13.1 (32-bit version) on 64-bit Windows. (I tried
installing the 64-bit version of R but got an error that the previous
version cannot be removed.)

Can someone please help with how to resolve this issue? 

Thank you.

Ravi


--
View this message in context: 
http://r.789695.n4.nabble.com/htmlParse-Error-tp4630738.html


[R] Word Count

2012-04-10 Thread vioravis
I have a sentence like the following:

sentence <- "Part 1 is working, Part 2 is not working and Part 3 is working"

I would like to get the total count of "working" and "not working", i.e.
Working = 2 and Not Working = 1.

Can someone help with how this can be done in R? Thank you.

Ravi
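
A minimal base-R sketch of one way to count these phrases: gregexpr() finds every occurrence of a pattern, and since each "not working" also contains "working", the former is subtracted from the latter. The helper count_matches is a hypothetical name introduced for illustration.

```r
sentence <- "Part 1 is working, Part 2 is not working and Part 3 is working"

# Count non-overlapping occurrences of a fixed phrase in one string.
count_matches <- function(phrase, text) {
  hits <- gregexpr(phrase, text, fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

notWorking <- count_matches("not working", sentence)
working <- count_matches("working", sentence) - notWorking

c(Working = working, NotWorking = notWorking)
```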



--
View this message in context: 
http://r.789695.n4.nabble.com/Word-Count-tp4544970p4544970.html


[R] Error in prune in the rEMM package

2012-03-15 Thread vioravis
I am trying to use rEMM package for the Extensible Markov Models. I tried the
following sequence of code:

emmt=EMM(measure="euclidean",threshold=0.75,lambda=0.001)
emmt=build(emmt,data)
new_threshold=sum(cluster_counts(emmt))*0.002
emmt_new=prune(emmt,new_threshold)

However, I get the following error when I run the last line of the code:

Error in remove_clusters(x, rare_clusters(x, count_threshold =
count_threshold)) : 
  (subscript) logical subscript too long
In addition: Warning message:
In smc_removeState(x@tracds_d$mm, to_remove) :
  State 7210325432838344346362367369370376377390412425440445483489499
does not exist!
 
I am unable to provide the data that I used since it is confidential. It
would be great if someone could still help with the issue.

Thank you.

Ravi



--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-prune-in-the-rEMM-package-tp4474200p4474200.html


[R] Conditionally adding a constant

2012-01-02 Thread vioravis
I am trying to add a constant to the previous value of a variable based on
certain conditions. Maybe there is a simple way to do this that I am missing
completely. I have given an example below:

df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))

> df
  x  y
1 1 10
2 2 20
3 3 30
4 4 NA
5 5 NA

I want to add 2 to the previous value of y, if x exceeds 3 (also will have
to handle NAs in the process). The resulting output would look like:

  x  y
1 1 10
2 2 20
3 3 30
4 4 32
5 5 34

Can someone please explain how to do it? Thank you.

Ravi
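
Because each new value depends on the one just computed, a plain loop is one straightforward option. The sketch below assumes that only missing y values with x > 3 should be filled in; that reading of "handle NAs" is an interpretation, not something the post states.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 20, 30, NA, NA))

# Walk down the rows; each filled value builds on the row before it.
for (i in seq_len(nrow(df))[-1]) {
  if (df$x[i] > 3 && is.na(df$y[i])) {
    df$y[i] <- df$y[i - 1] + 2
  }
}
print(df)
```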










--
View this message in context: 
http://r.789695.n4.nabble.com/Conditionally-adding-a-constant-tp4253049p4253049.html


[R] HTML Forms to R

2011-12-06 Thread vioravis
I currently have an R function that reads a csv file, does some computations,
produces some plots and writes a csv file as output. I would like to use
HTML forms to make a user interface for calling the appropriate parts of the
function (reading the csv file, doing computations, displaying plots and
writing csv files).

Are there any tutorials available that would help me get started?

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/HTML-Forms-to-R-tp4164360p4164360.html


[R] Sequential Sum in R

2011-12-06 Thread vioravis
I am trying to code the following excel formula in R. 

a    b    c    Result  Formula
1    10   0.1  #N/A    IF(B2<20,NA(),C2+IF(ISERROR(D1),0,D1))
2    20   0.2  0.2     IF(B3<20,NA(),C3+IF(ISERROR(D2),0,D2))
3    30   0.3  0.5     IF(B4<20,NA(),C4+IF(ISERROR(D3),0,D3))
4    40   0.4  0.9     IF(B5<20,NA(),C5+IF(ISERROR(D4),0,D4))
5    50   0.5  1.4     IF(B6<20,NA(),C6+IF(ISERROR(D5),0,D5))
6    60   0.6  2       IF(B7<20,NA(),C7+IF(ISERROR(D6),0,D6))
7    70   0.7  2.7     IF(B8<20,NA(),C8+IF(ISERROR(D7),0,D7))
8    80   0.8  3.5     IF(B9<20,NA(),C9+IF(ISERROR(D8),0,D8))
9    90   0.9  4.4     IF(B10<20,NA(),C10+IF(ISERROR(D9),0,D9))
10   100  1    5.4     IF(B11<20,NA(),C11+IF(ISERROR(D10),0,D10))


The variable Result is obtained using the excel formula shown next to it.
Column D contains the Result.

dataFrame <- data.frame(a = 1:10, b = seq(10,100,by = 10),
                        c = seq(0.1,1,by = 0.1))

Can someone please help me calculate in R the sequential sum given by the
Excel formula?

Thank you.

Ravi
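
A sketch of how the spreadsheet logic might be transcribed, assuming the threshold in the formula is b < 20 (which is what the Result column implies): the running sum is NA while b is below 20 and restarts from c whenever the previous result is an error. The vector name result is illustrative.

```r
dataFrame <- data.frame(a = 1:10, b = seq(10, 100, by = 10),
                        c = seq(0.1, 1, by = 0.1))

result <- rep(NA_real_, nrow(dataFrame))
for (i in seq_len(nrow(dataFrame))) {
  if (dataFrame$b[i] >= 20) {
    # IF(ISERROR(previous), 0, previous)
    previous <- if (i > 1 && !is.na(result[i - 1])) result[i - 1] else 0
    result[i] <- dataFrame$c[i] + previous
  }
}
dataFrame$Result <- result
print(dataFrame)
```

When every row from some point onward satisfies the condition, cumsum() over those rows would give the same numbers without a loop; the loop is kept here because it mirrors the formula cell by cell.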

--
View this message in context: 
http://r.789695.n4.nabble.com/Sequential-Sum-in-R-tp4165916p4165916.html


Re: [R] Spatial Statistics using R

2011-11-17 Thread vioravis
Thanks, Raphael. Just checked their website. It appears that they currently
do not have any online courses planned. 

--
View this message in context: 
http://r.789695.n4.nabble.com/Spatial-Statistics-using-R-tp4079092p4079574.html


Re: [R] Spatial Statistics using R

2011-11-17 Thread vioravis
Thanks a lot for the guidance. I will take a look at these options.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Spatial-Statistics-using-R-tp4079092p4082354.html


[R] Spatial Statistics using R

2011-11-16 Thread vioravis
I am looking for online courses to learn Spatial Statistics using R.
Statistics.com is offering an online course in December on the same topic,
but that schedule doesn't suit mine. Are there any other similar options for
learning spatial statistics using R? Can someone please advise?

Thank you. 

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Spatial-Statistics-using-R-tp4079092p4079092.html


[R] Automatic Labeling of Document Clusters

2011-11-14 Thread vioravis
I am performing document clustering on a set of documents using R. I
performed hierarchical clustering using hclust and have identified the
cluster corresponding to each data point. I would like to label each
cluster automatically in order to identify the top keywords associated with
each cluster. This would help me in validating the clusters.

Are there any packages in R that help with automatic labelling of
clusters?

A few clustering labeling methods are given here:

http://en.wikipedia.org/wiki/Cluster_labeling
http://erulemaking.ucsur.pitt.edu/doc/papers/dgo06-labeling.pdf


Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Automatic-Labeling-of-Document-Clusters-tp4038849p4038849.html


[R] Removing numbers from a list

2011-11-10 Thread vioravis
I am using gsub to remove numbers for each element of a list. Code is given
below.

  testList <- list("this contains a number 1000","this does not contain")
  removeNumbers <- function(X)
  {
    gsub("\\d","",X)
  }
  outputList <- lapply(testList,removeNumbers)

However, when I try to find the number of words in outputList as follows

  outLength <- lapply(strsplit(outputList," "),length)

it throws the following error:

  Error in strsplit(outputList, " ") : non-character argument


Can someone help me with this? 

Thank you.

Ravi
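
The error arises because strsplit() expects a character vector, while lapply() returns a list. A minimal sketch of one way around it is to split each element inside its own call; trimming and the " +" pattern are additions here so that the blank left behind by the removed number is not counted as a word.

```r
testList <- list("this contains a number 1000", "this does not contain")

# Strip digits from every element; the result is still a list.
outputList <- lapply(testList, function(s) gsub("\\d", "", s))

# Count words per element: trim, split on runs of spaces, take the length.
outLength <- sapply(outputList, function(s) {
  length(strsplit(trimws(s), " +")[[1]])
})
print(outLength)
```

Calling strsplit(unlist(outputList), " ") would also avoid the error, since unlist() turns the list into a character vector.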

--
View this message in context: 
http://r.789695.n4.nabble.com/Removing-numbers-from-a-list-tp4023074p4023074.html


[R] Min Frequency in findFreqTerms

2011-11-09 Thread vioravis
I am using 'tm' package for text mining. I use the function findFreqTerms to
obtain the frequent words based on their frequency in the term document
matrix.

The following is the example given in the help page of this function:

library(tm)
data(crude)
tdm <- TermDocumentMatrix(crude)
findFreqTerms(tdm, 2, 3)

The first three columns of the document term matrix are shown below:

(bpd)   (bpd).  (gcc)
0   0   0
0   0   0
0   0   0
0   0   0
0   0   0
1   0   0
0   0   0
0   0   0
0   0   0
1   0   0
1   0   0
0   0   1
0   0   0
0   1   0
0   0   0
0   0   0
0   0   0
0   0   0
0   0   0
0   0   0


The first term (bpd) has a frequency of 3 whereas the second and third
terms have a frequency of 1 which is below the lowfreq = 2 specified. 

Can someone tell me whether this is the right way of interpreting this
function? If so, is there a bug in the package?

Thank you.

Ravi





--
View this message in context: 
http://r.789695.n4.nabble.com/Min-Frequency-in-findFreqTerms-tp4019143p4019143.html


[R] Reading stopwords from a csv file

2011-10-04 Thread vioravis
I am using the tm package to do text mining.

I have a huge list of stopwords (2000+) in a csv file. I read it as
follows:

stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
myStopwords <- as.character(stopwordlist$stopwords)

When I try removing the stopwords using

tr1=tm_map(tr1,removeWords,myStopwords)

I am getting the following error:

Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "",  :
  internal error in compiling regexp

However, this works fine when I define myStopwords = c() instead of
reading from the csv file.

Can someone please help me to resolve this issue?

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871697.html


Re: [R] Reading stopwords from a csv file

2011-10-04 Thread vioravis
The following for loop does the work, but it takes a good 30 minutes to run:

for(i in 1:length(myStopwords))
{
  currentWord <- myStopwords[i]
  tr1=tm_map(tr1,removeWords,currentWord)
}

Are there any faster alternatives? Thank you.

Ravi
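
A middle ground between one huge regular expression and one pass per word might be to remove the stopwords in chunks. The sketch below works on a plain character vector in base R rather than on a tm corpus; remove_words_chunked and chunk_size are hypothetical names, and the stopwords are assumed to contain no regex metacharacters.

```r
# Remove whole-word matches of `words` from `text`, a few hundred words
# per regular expression so that each pattern stays small enough to compile.
remove_words_chunked <- function(text, words, chunk_size = 500) {
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    pattern <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  text
}

docs <- c("the cat sat on the mat", "a dog and a cat")
cleaned <- remove_words_chunked(docs, c("the", "a", "on", "and"), chunk_size = 2)
print(cleaned)
```

With 2000+ stopwords and a chunk size of 500 this needs four or five passes over the text instead of 2000.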



--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871864.html


[R] SVD Memory Issue

2011-09-13 Thread vioravis
I am trying to perform Singular Value Decomposition (SVD) on a Term Document
Matrix  I created using the 'tm' package. Eventually I want to do a Latent
Semantic Analysis (LSA).

There are 5677 documents with 771 terms (the TDM is 771 x 5677). When I try
to do the SVD, it runs out of memory. I am using a 12GB Dual core Machine
with Windows XP and don't think I can increase the memory anymore. Are there
any other memory efficient methods to find the SVD?

The term document is obtained using:

tdm2 <-
  TermDocumentMatrix(tr1,control=list(weighting=weightTf,minWordLength=3))
str(tdm2)

List of 6
 $ i   : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ...
 $ j   : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ...
 $ v   : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ...
 $ nrow: int 771
 $ ncol: int 5677
 $ dimnames:List of 2
  ..$ Terms: chr [1:771] "access" "accessori" "accumul" "acoust" ...
  ..$ Docs : chr [1:5677] "1" "2" "3" "4" ...
 - attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
 - attr(*, "Weighting")= chr [1:2] "term frequency" "tf"

SVD is calculated using:

> tdm_matrix <- as.matrix(tdm2)
> svd_out <- svd(tdm_matrix)

Error: cannot allocate vector of size 767.7 Mb
In addition: Warning messages:
1: In matrix(0, n, np) :
  Reached total allocation of 3583Mb: see help(memory.size)
2: In matrix(0, n, np) :
  Reached total allocation of 3583Mb: see help(memory.size)
3: In matrix(0, n, np) :
  Reached total allocation of 3583Mb: see help(memory.size)
4: In matrix(0, n, np) :
  Reached total allocation of 3583Mb: see help(memory.size)


Thank you.

Ravi
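
When only the leading singular vectors are needed (as in LSA) and the matrix has far fewer rows than columns, one memory-light alternative is to eigen-decompose the small Gram matrix A A' instead of calling svd() on A. The sketch below uses a random 20 x 200 stand-in for the 771 x 5677 matrix; the names A, k, U, d and V are illustrative, and squaring the matrix does lose some numerical accuracy for small singular values.

```r
set.seed(1)
A <- matrix(rpois(20 * 200, lambda = 0.3), nrow = 20)  # stand-in term-document matrix
k <- 5                                                 # number of latent dimensions kept

G <- tcrossprod(A)                    # A %*% t(A): only nrow(A) x nrow(A)
e <- eigen(G, symmetric = TRUE)

d <- sqrt(pmax(e$values[1:k], 0))     # leading singular values of A
U <- e$vectors[, 1:k, drop = FALSE]   # left singular vectors (terms)
V <- crossprod(A, U) %*% diag(1 / d, nrow = k)  # right singular vectors (documents)

# Rank-k approximation of A, as used in LSA.
A_k <- U %*% diag(d, nrow = k) %*% t(V)
```

For the matrix in the post, the Gram matrix would be 771 x 771, so the largest objects are A itself and the 5677 x k matrix V.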



--
View this message in context: 
http://r.789695.n4.nabble.com/SVD-Memory-Issue-tp3809667p3809667.html


[R] findFreqTerms vs minDocFreq in Package 'tm'

2011-09-12 Thread vioravis
I am using the 'tm' package for text mining and am having trouble finding
the frequently occurring terms. From their definitions, it appears that
findFreqTerms and minDocFreq are equivalent and that both try to identify
the documents with terms appearing more often than a specified threshold.
However, I am getting drastically different results from the two. I have
given the results from both below.

findFreqTerms identifies 3140 words that appear more than 5 times, but
minDocFreq identifies only 659 terms. Can someone please explain the reason
for the difference, or whether I have misunderstood their definitions?


> tdm1 <- TermDocumentMatrix(tr1,control=list(weighting=weightBin))
> freq_terms <- findFreqTerms(tdm1, lowfreq = 5, highfreq = Inf)
> str(freq_terms)
 chr [1:3140] "abc" "abil" "abl" "abnorm" "abort" "absenc" ...


> tdm2 <- TermDocumentMatrix(tr1,control=list(minDocFreq=5,minWordLength=1))
> str(tdm2)
List of 6
 $ i   : int [1:4703] 173 616 624 241 350 534 563 609 129 333 ...
 $ j   : int [1:4703] 1 2 3 7 7 7 7 8 10 10 ...
 $ v   : num [1:4703] 7 5 6 9 5 7 5 5 5 7 ...
 $ nrow: int 659
 $ ncol: int 5677
 $ dimnames:List of 2
  ..$ Terms: chr [1:659] "\024" "\026" "ac" "access" ...
  ..$ Docs : chr [1:5677] "1" "2" "3" "4" ...
 - attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
 - attr(*, "Weighting")= chr [1:2] "term frequency" "tf"


Thank you.

Ravi



--
View this message in context: 
http://r.789695.n4.nabble.com/findFreqTerms-vs-minDocFreq-in-Package-tm-tp3806644p3806644.html


Re: [R] findFreqTerms vs minDocFreq in Package 'tm'

2011-09-12 Thread vioravis
Thanks, Bettina.

--
View this message in context: 
http://r.789695.n4.nabble.com/findFreqTerms-vs-minDocFreq-in-Package-tm-tp3806644p3808134.html


[R] Distance between a vector and matrix rows

2011-08-08 Thread vioravis
I am trying to find the distance between a vector and each row of a
data frame. I am using the function distancevector in the package hopach
as follows:

mydata <- as.data.frame(matrix(c(1,1,1,1,0,1,1,1,1,0),nrow=2))

  V1 V2 V3 V4 V5
1  1  1  0  1  1
2  1  1  1  1  0
vec <- c(1,1,1,1,1)
d2 <- distancevector(mydata,vec,d="euclid")

The Euclidean distance between the two rows of the data frame to the vector
should be 1. But I am getting 0.4472136 for both.

Can someone please point out the reason for the discrepancy?

Also, are there other packages for calculating the distance between a binary
vector and all the rows of a data frame (contains only binary values)? 

Thank you.

Ravi
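
For comparison, the plain Euclidean distance can be computed in base R with apply(). The reported value 0.4472136 equals 1/sqrt(5), which would be consistent with the distance being scaled by the square root of the number of columns; that explanation of hopach's behaviour is an inference from the numbers, not something confirmed here. The names d and d_scaled are illustrative.

```r
mydata <- as.data.frame(matrix(c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0), nrow = 2))
vec <- c(1, 1, 1, 1, 1)

# Unscaled Euclidean distance from each row of mydata to vec.
d <- apply(as.matrix(mydata), 1, function(row) sqrt(sum((row - vec)^2)))

# The same distance divided by sqrt(number of columns),
# i.e. the root of the mean squared difference.
d_scaled <- d / sqrt(length(vec))

print(d)
print(d_scaled)
```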

--
View this message in context: 
http://r.789695.n4.nabble.com/Distance-between-a-vector-and-matrix-rows-tp3726268p3726268.html


Re: [R] Distance between a vector and matrix rows

2011-08-08 Thread vioravis
Thank you both for your reply. I went with the cosine function for similarity
and used it with apply to get a measure of distance.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Distance-between-a-vector-and-matrix-rows-tp3726268p3726610.html


[R] Extracting certain text using tm package

2011-06-27 Thread vioravis
I have used the tm package to import a set of text documents using the
following command:

text <- Corpus(DirSource("."),readerControl = list(language ="ansi"))

I would like to extract only a certain portion of the text in each document
using certain keywords. For example, I would like to include all the text
between the keywords "Start Text" and "End Text". All the remaining text
should be discarded. Is there any way to accomplish this in the 'tm' package?

Also, is there a quick way to remove all the HTML tags from the text?

Thank you.

Ravi
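
On a plain character string, both steps can be sketched with base-R regular expressions; applying this across a tm corpus is left out because it depends on the tm version. The sample string doc is made up for illustration, and the tag pattern is a rough heuristic rather than a real HTML parser.

```r
doc <- "Header junk Start Text <b>keep</b> this part End Text footer junk"

# Keep only what lies between the two markers.
# Note: sub() returns the input unchanged when the markers are absent.
body <- sub(".*Start Text(.*)End Text.*", "\\1", doc)

# Drop anything that looks like an HTML tag, then trim the ends.
cleaned <- trimws(gsub("<[^>]+>", "", body))
print(cleaned)
```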





--
View this message in context: 
http://r.789695.n4.nabble.com/Extracting-certain-text-using-tm-package-tp3627063p3627063.html


Re: [R] R program writing standard/practices

2011-06-10 Thread vioravis
Check this out:

http://www1.maths.lth.se/help/R/RCC/

--
View this message in context: 
http://r.789695.n4.nabble.com/R-program-writing-standard-practices-tp3588716p3588911.html


[R] Checking for combination of words in a sentence

2011-06-03 Thread vioravis
I am trying to implement some expert rules based on the presence or absence
of words in a sentence. I have given a reproducible example below. In this,
every time I come across the words "lunch" and "bag" in the same sentence,
the outcome would be 1. If "lunch" and "pack" are in the same sentence, then
the outcome would be 2. If only "lunch" is present, the outcome would be 3.
There is no guarantee that these two words (lunch/bag or lunch/pack) will be
next to each other in the sentence. I tried to implement this using regexpr,
but the last rule (lunch -> 3) supersedes the other two rules. This works
fine only if I have "lunch" alone as the first rule, followed by the other
two.

Is there a way to make sure the outcome will be 3 only if "lunch" alone is
present? (I have hundreds of rules; hence, finding the correct order
manually is not possible.)



keyWord <- c("lunch bag","lunch pack","lunch")
outcome <- c(1,2,3)

expertRules <- data.frame(keyWord = keyWord, outcome = outcome)


testWords <- c("lunch pack","lunch","lunch","lunch bag","lunch pack")
predictedOutcome <- c(NA,NA,NA,NA,NA)

testDf <- data.frame(testWords = testWords,
                     predictedOutcome = predictedOutcome)

for(i in 1:nrow(expertRules))
{

  testDf$predictedOutcome <-
    ifelse((regexpr(expertRules[i,1],testDf$testWords)>0),
           expertRules[i,2],
           testDf$predictedOutcome)
}


> testDf
   testWords predictedOutcome
1 lunch pack                3
2      lunch                3
3      lunch                3
4  lunch bag                3
5 lunch pack                3



Thank you.

Ravi
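
One way to avoid ordering the rules by hand, sketched under assumptions: treat each rule as a set of words that must all appear somewhere in the sentence, try the rules with the most words first, and stop at the first match. The helper predict_one and the test sentences are made up for illustration.

```r
expertRules <- data.frame(keyWord = c("lunch bag", "lunch pack", "lunch"),
                          outcome = c(1, 2, 3),
                          stringsAsFactors = FALSE)
testWords <- c("pack my lunch", "lunch", "lunch bag", "dinner")

# Each rule becomes a set of required words; more specific rules go first.
ruleWords <- strsplit(expertRules$keyWord, " +")
ruleOrder <- order(-lengths(ruleWords))

# Return the outcome of the first (most specific) rule whose words all occur.
predict_one <- function(sentence) {
  tokens <- strsplit(tolower(sentence), "[^a-z]+")[[1]]
  for (i in ruleOrder) {
    if (all(ruleWords[[i]] %in% tokens)) return(expertRules$outcome[i])
  }
  NA_real_
}

predictedOutcome <- vapply(testWords, predict_one, numeric(1), USE.NAMES = FALSE)
data.frame(testWords, predictedOutcome)
```

Because the words are matched as a set, "pack my lunch" triggers the lunch/pack rule even though the two words are not adjacent.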

--
View this message in context: 
http://r.789695.n4.nabble.com/Checking-for-combination-of-words-in-a-sentence-tp3570104p3570104.html


Re: [R] append date to write csv filename

2011-06-03 Thread vioravis
You could use the paste function to define the filename with date appended to
it. See the example below:

currentDate <- Sys.Date()
csvFileName <- paste("C:/R/Remake/XPX",currentDate,".csv",sep="")
write.csv(S1X.sub, file=csvFileName)


--
View this message in context: 
http://r.789695.n4.nabble.com/append-date-to-write-csv-filename-tp3570379p3570420.html


[R] Text Summarization

2011-05-31 Thread vioravis
Is there a text mining/ NLP package in R that could do text summarization?
For example, take a huge text as input and provide a summary of the text. 

In package tm, summarization is defined more as high frequency terms which
is not what I want. I actually want a summary of what is present in the huge
volume of text.

Any pointers to an R package would be helpful. Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Text-Summarization-tp3562735p3562735.html
Sent from the R help mailing list archive at Nabble.com.



[R] Using read.xls

2011-05-26 Thread vioravis
I am using the read.xls command from the gdata package. I get the following
error when I try to read a worksheet from an Excel file.

Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  : 
  Intermediate file 'C:\Tmp\RtmpYvLnAu\file7f06650f.csv' missing!
In addition: Warning message:
running command 'C:\Apps\Perl\bin\perl.exe C:/Program
Files/R/R-2.13.0/library/gdata/perl/xls2csv.pl excelFileName.xls
C:\Tmp\RtmpYvLnAu\file7f06650f.csv Test Sheet' had status 5 
Error in file.exists(tfn) : invalid 'file' argument

However, the same command works fine with another excel file stored in the
same directory. 

Could you please let me know what is causing this problem??

Thank you.

--
View this message in context: 
http://r.789695.n4.nabble.com/Using-read-xls-tp3552122p3552122.html
Sent from the R help mailing list archive at Nabble.com.



[R] Fortran DLL in Spotfire

2011-05-24 Thread vioravis
I have R code that loads a FORTRAN DLL to do some calculations. The code
works fine when I use it in R. But when I try it in Spotfire, it throws an
error saying that it is unable to load the shared library and that the
specified DLL cannot be found. I have used setwd to point to the location in
the Spotfire Statistical Services server library. Is this the correct way to
call the DLL in Spotfire?

I would appreciate any inputs on this.

Thank you.

Ravi



--
View this message in context: 
http://r.789695.n4.nabble.com/Fortran-DLL-in-Spotfire-tp3547779p3547779.html
Sent from the R help mailing list archive at Nabble.com.



[R] Building Custom GUIs for R

2011-05-20 Thread vioravis
I am looking to build simple GUIs based on the R code I have. The main
objective is to hide the scary R code from non-programmers and make
it easier for them to try out different inputs.

For example, 

1. The GUI will have means to upload a csv file which will be read by the R
code. 

2. A button to preprocess data (carried out by an R function behind the scenes)

3. A button to build some models and run simulations

4. Space to display visual charts based on the simulations results

5. Option to save the results to a csv file or something similar.

Are there any tools currently available that enable us to build GUIs? (MATLAB
has a GUI builder that enables users to build custom GUIs.)

Can we make an exe of such a GUI (with the R code) and let people use it
without having to install R?

Any help on this would be much appreciated.

Thank you.

Ravi







--
View this message in context: 
http://r.789695.n4.nabble.com/Building-Custom-GUIs-for-R-tp3537794p3537794.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Building Custom GUIs for R

2011-05-20 Thread vioravis
Thanks everyone. I will try out the packages you have mentioned.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Building-Custom-GUIs-for-R-tp3537794p3538539.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Total effect of X on Y under presence of interaction effects

2011-05-12 Thread vioravis
This is what I believe is referred to as suppression in regression, where
the correlation between the independent and the dependent
variable turns out to be of one sign whereas the regression coefficient
turns out to be of the opposite sign.

Read here about suppression:

http://www.uvm.edu/~dhowell/gradstat/psych341/lectures/MultipleRegression/multreg3.html

HTH



--
View this message in context: 
http://r.789695.n4.nabble.com/Total-effect-of-X-on-Y-under-presence-of-interaction-effects-tp3514137p3516446.html
Sent from the R help mailing list archive at Nabble.com.



[R] Fortran Symbol Name not in Load Table

2011-05-09 Thread vioravis
I am trying to call a FORTRAN subroutine from R. is.loaded is turning out to
be TRUE. However when I run my .Fortran command I get the following error: 

Error in .Fortran("VALUEAHROPTIMIZE", as.double(ahrArray),
as.double(kwArray),  :
  Fortran symbol name "valueahroptimize" not in load table


I have given the FORTRAN declaration below: 

subroutine VALUEAHROPTIMIZE(AHR, &
                            KW, &
                            min_IHR_delta, &
                            max_AHR_error, &
                            AHR_out, &   !! Output AHE array
                            IHR_out, &   !! Output IHR array
                            Errors_out)
  ! Expose subroutine my_function to users of this DLL
  !DEC$ ATTRIBUTES DLLEXPORT,C,REFERENCE,ALIAS:'VALUEAHROPTIMIZE_'::VALUEAHROPTIMIZE

  ! Body of my_function

  Implicit None

  Integer *4  IERR, iSum
  DOUBLE PRECISION min_IHR_delta, max_AHR_error
  logical switch_AHR_tuner
  character * 512 AHR_tuner_FilePath

  !!DOUBLE PRECISION  AHR(500), kW(500)  !! Initial Array for reading Namelist

  DOUBLE PRECISION  AHR(*), kW(*)  !! Initial Array for reading Namelist
  DOUBLE PRECISION  AHR_out(*), IHR_out(*)
  integer Errors_out(*)


The R code I tried using is given below: 

ahrArray <- runif(147)
kwArray <- runif(147)
outputAHR <- c(rep(0,11*11))
outputIHR <- c(rep(0,11*11))
outputError <- c(rep(NA,11))

dyn.load("my_function.dll")
is.loaded("VALUEAHROPTIMIZE")

[1] TRUE

.Fortran("VALUEAHROPTIMIZE",
  as.double(ahrArray),
  as.double(kwArray),
  as.double(0.0005),
  as.double(5),
  as.double(outputAHR),
  as.double(outputIHR),
  as.integer(outputError))

Can someone please help me with how to fix this issue? Thank you. 

Ravi  



--
View this message in context: 
http://r.789695.n4.nabble.com/Fortran-Symbol-Name-not-in-Load-Table-tp3509852p3509852.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Fortran Symbol Name not in Load Table

2011-05-09 Thread vioravis
I used the DLL export viewer to see what name is being exported. It is
showing as VALUEAHROPTIMIZE_. This is the name of the function we have used
plus the underscore.

Is there any other reason for the function not getting recognized? Thanks.
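
One thing that may be worth checking: the error message in the original post reports the name in lower case (valueahroptimize), which suggests that .Fortran looks for a lower-cased symbol. The sketch below only compares strings; guessFortranSymbol is a hypothetical helper, and the trailing underscore it appends is an assumption about the platform convention, not something confirmed in this thread.

```r
# Hypothetical helper: lower-case the routine name (as the error message hints)
# and append an underscore (an assumption about the compiler/platform convention)
guessFortranSymbol <- function(name) paste(tolower(name), "_", sep = "")

exported <- "VALUEAHROPTIMIZE_"                      # name shown by the DLL export viewer
expected <- guessFortranSymbol("VALUEAHROPTIMIZE")   # name R may be looking for

# FALSE here would mean the exported alias differs from the guessed lookup name
namesMatch <- identical(exported, expected)
```

If the guess is right, the upper-case alias in the DLL would not match the lookup, and exporting the routine under a lower-case alias might be worth a try.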

--
View this message in context: 
http://r.789695.n4.nabble.com/Fortran-Symbol-Name-not-in-Load-Table-tp3509852p3510761.html
Sent from the R help mailing list archive at Nabble.com.



[R] SQP with Constraints

2011-05-09 Thread vioravis
I am trying to optimize a function similar to the following:

Minimize x1^2 - x2^2 - x3^2

s.t. x1 < x2
     x2 < x3

The constraint is that the variables should be monotonically increasing. Is
there any package that implements Sequential Quadratic Programming with
the ability to include these constraints?
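
As a rough illustration of how the ordering constraints can be written down, here is a sketch using constrOptim from the base stats package. Note that constrOptim uses an adaptive barrier method, not SQP, so this is a swapped-in technique. Also, the objective as written above is unbounded below under these two constraints alone (x2 and x3 can grow without limit), so the sketch uses a made-up bounded objective; target and objective are assumptions for illustration only.

```r
# Made-up bounded objective: squared distance to a target that violates the ordering
target <- c(3, 1, 2)
objective <- function(x) sum((x - target)^2)

# Linear inequality constraints in the form ui %*% x - ci >= 0:
#   x2 - x1 >= 0  and  x3 - x2 >= 0
ui <- rbind(c(-1,  1, 0),
            c( 0, -1, 1))
ci <- c(0, 0)

# The starting value must lie strictly inside the feasible region
startValue <- c(0, 1, 2)

# grad = NULL makes constrOptim fall back to Nelder-Mead for the inner optimisation
fit <- constrOptim(theta = startValue, f = objective, grad = NULL,
                   ui = ui, ci = ci)

fit$par    # estimated x1, x2, x3 (kept strictly increasing by the barrier)
fit$value  # objective value at the solution
```

The same ui and ci pattern extends to more variables by adding one row per adjacent pair.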

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/SQP-with-Constraints-tp3510857p3510857.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] histograms and for loops

2011-05-06 Thread vioravis
This should work!!

for(i in 1:12){
  xLabel <- paste("Graph",i)
  plotTitle <- paste("Graph",i,".jpg")
  jpeg(plotTitle)
  # axes = FALSE must go inside hist() so that only the custom axes below are drawn
  hist(zNort1[,i], freq=FALSE, xlab=xLabel, col="blue",
       main="Standardized Residuals Histogram", ylim=c(0,1), xlim=c(-3.0,3.0),
       axes = FALSE)
  axis(1, col = "blue",col.axis = "blue")
  axis(2, col= "red",col.axis = "red")
  # Use column i (not column 1) so the normal curve matches the plotted histogram
  zNortmin <- min(zNort1[,i])
  zNortmax <- max(zNort1[,i])
  zNortmean <- mean(zNort1[,i])
  zNortsd <- sd(zNort1[,i])
  X1 <- seq(-3.0, 3.0, by=.01)
  lines(X1, dnorm(X1, zNortmean, zNortsd) , col="black")
  dev.off()
}
 

--
View this message in context: 
http://r.789695.n4.nabble.com/histograms-and-for-loops-tp3503648p3503758.html
Sent from the R help mailing list archive at Nabble.com.



[R] Loading a FORTRAN DLL

2011-05-03 Thread vioravis
I have a FORTRAN DLL file obtained from Compaq Visual Fortran and when I try
to load the DLL into the R environment I get an error. 

> dyn.load("my_function.dll")

This application has failed to start because MSCVRTD.dll was not found.
Re-installing this application may fix the problem.

When I tried it again, the above error doesn't appear anymore. Instead, I
get the following error:

Error in inDL(x, as.logical(local), as.logical(now), ...) : 
  unable to load shared library 'D://my_function.dll':
  LoadLibrary failure:  The specified module could not be found.

Do I need to have FORTRAN installed to be able to run the DLL file??? Can
someone please help me with what is causing this error???

Thank you.

Ravi





--
View this message in context: 
http://r.789695.n4.nabble.com/Loading-a-FORTRAN-DLL-tp3493263p3493263.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Fitting gamma and exponential Distributions with fitdist

2011-04-28 Thread vioravis
Joshua, thanks for your reply.

I have tried out the following scaling and it seems to work fine:

scaledVariable <- (test-min(test)+0.001)/(max(test)-min(test)+0.002)

The gamma distribution parameters are obtained using the scaled variable, and
samples drawn from this distribution are scaled back using:

scaled <- (randomSamples*(max(test) - min(test) + 0.002)) + min(test) - 0.001
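
As a quick sanity check of the two formulas above, the sketch below wraps them in two helper functions and confirms that scaling followed by unscaling returns the original values. The names scaleToUnit and unscale are made up here, and only the first five data values are used to keep it short.

```r
# First five values of the data vector, used only to illustrate the round trip
test <- c(895.1358, 2915.7447, 335.5472, 1470.4022, 194.5461)

lowest  <- min(test)
highest <- max(test)

# Forward transform: maps the data strictly inside (0, 1)
scaleToUnit <- function(x) (x - lowest + 0.001) / (highest - lowest + 0.002)

# Inverse transform: maps values on the (0, 1) scale back to the original units
unscale <- function(u) u * (highest - lowest + 0.002) + lowest - 0.001

scaledVariable <- scaleToUnit(test)
roundTrip <- unscale(scaledVariable)
```

Because the minimum is subtracted, this is a shift as well as a rescaling, so parameters fitted on the scaled variable describe a shifted distribution rather than the original one.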

Is there a better way to scale the variable???  I would prefer fitting a
distribution without scaling it.

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3480265.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Fitting gamma and exponential Distributions with fitdist

2011-04-28 Thread vioravis
I tried using JMP for the same data and get two distinct recommendations,
depending on whether the unscaled or the scaled values are used.

When using the unscaled values, LogNormal appears to be the best fit. fitdist
in R is unable to provide a fit in this case.

Compare Distributions

Show  Distribution      Number of Parameters  -2*LogLikelihood  AICc
X     LogNormal         2                     1016.29587        1020.50639
      Johnson Sl        3                     1015.21183        1021.6404
      GLog              3                     1016.29587        1022.72444
      Exponential       1                     1021.58662        1023.65559
      Johnson Su        4                     1015.21183        1023.9391
      Gamma             2                     1021.02475        1025.23528
      Weibull           2                     1021.50762        1025.71815
      Extreme Value     2                     1021.50762        1025.71815
      Normal 2 Mixture  5                     1042.55455        1053.66566
      Normal 3 Mixture  8                     1042.74433        1061.56786
      Normal            2                     1082.36992        1086.58045


However, when using the scaled values, Gamma appears to be best fit. I am
getting the same using R as well.

Compare Distributions

Show  Distribution      Number of Parameters  -2*LogLikelihood  AICc
X     Gamma             2                     -114.92911        -110.71858
      Weibull           2                     -113.54302        -109.3325
      Extreme Value     2                     -113.54302        -109.3325
      Exponential       1                     -108.01019        -105.94122
      Johnson Sl        3                     -104.69191        -98.263335
      Johnson Su        4                     -104.69191        -95.964634
      GLog              3                     -102.35037        -95.921798
      LogNormal         2                     -70.727608        -66.517082
      Normal 2 Mixture  5                     -77.349192        -66.238081
      Normal 3 Mixture  8                     -77.159407        -58.335878
      Normal            2                     -37.533813        -33.323287
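
For reference, the AICc column in both tables appears consistent with the usual small-sample correction, AICc = -2*LogLikelihood + 2k + 2k(k+1)/(n-k-1), assuming n = 60 observations (the length of the data vector posted in this thread). The helper aicc below is written for this sketch and is not a JMP or fitdistrplus function.

```r
# Small-sample corrected AIC from -2*log-likelihood, parameter count k, sample size n
aicc <- function(minus2LogLik, k, n) {
  minus2LogLik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

n <- 60  # assumed sample size (length of the posted data vector)

logNormalAICc   <- aicc(1016.29587, k = 2, n = n)  # unscaled table, LogNormal row
exponentialAICc <- aicc(1021.58662, k = 1, n = n)  # unscaled table, Exponential row
gammaScaledAICc <- aicc(-114.92911, k = 2, n = n)  # scaled table, Gamma row
```

These agree with the tabulated AICc values to the printed precision, so the ranking differences between the two tables come from the likelihoods themselves rather than from the penalty term.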


What is the difference between the MLE methods in JMP and R??? Is it
advisable to go with the scaled values in R???

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3480422.html
Sent from the R help mailing list archive at Nabble.com.



[R] Fitting gamma and exponential Distributions with fitdist

2011-04-27 Thread vioravis
I am trying to fit gamma and exponential distributions to my data using the
fitdist function in the fitdistrplus package, and to obtain the parameters
along with the AIC values of the fit. However, I am getting errors with both
distributions. I have given a reproducible example with the errors I am
getting below. Can someone please let me know how to overcome this issue?

library(fitdistrplus)
test <- (895.13582915.7447,335.5472,1470.4022,194.5461,1814.2328,
1056.3067,3110.0783,11441.8656,142.1714,2136.0964,1958.9022,
891.89,352.6939,1341.7042,167.4883,2502.0528,1742.1306,
837.1481,867.8533,3590.4308,1125.9889,1200.605,4321.0011,
1873.9706,323.6633,1912.3147,865.6058,2870.8592,236.7214,
580.2861,350.9269,6842.4969,1886.2403,265.5094,199.9825,
1215.6197,7241.8075,2381.9517,3078.1331,5461.3703,2051.3997,
751.6575,714.3536,598.4539,425.6656,215.2103,608.785,
369.4744,2398.6506,918.6844,525.6925,2549.3694,4108.8983,
2824.0758,1068.7508,249.995,3863.9839,1152.1506,531.6844)

fitdist(test,"gamma",method ="mle")

Error in fitdist(test, "gamma", method = "mle") :
  the function mle failed to estimate the parameters, 
with the error code 100
In addition: Warning messages:
1: In dgamma(x, shape, scale, log) : NaNs produced
2: In dgamma(x, shape, scale, log) : NaNs produced
3: In dgamma(x, shape, scale, log) : NaNs produced
4: In dgamma(x, shape, scale, log) : NaNs produced
5: In dgamma(x, shape, scale, log) : NaNs produced
6: In dgamma(x, shape, scale, log) : NaNs produced
7: In dgamma(x, shape, scale, log) : NaNs produced
8: In dgamma(x, shape, scale, log) : NaNs produced
9: In dgamma(x, shape, scale, log) : NaNs produced


fitdist(test,"exp",method ="mle")

Error in fitdist(test, "exp", method = "mle") :
  the function mle failed to estimate the parameters, 
with the error code 100
In addition: Warning message:
In dexp(x, 1/rate, log) : NaNs produced
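
One commonly suggested workaround for this kind of failure is to rescale the data before optimising, since values in the thousands can push the optimiser into regions where the density returns NaN. The sketch below does the gamma fit by hand with optim from base R instead of fitdist, using the data vector with the missing c( and comma restored. The division by 1000, the log-parameterisation, and the names negLogLik, startPar, shapeHat and rateHat are all choices made for this sketch.

```r
test <- c(895.1358,2915.7447,335.5472,1470.4022,194.5461,1814.2328,
          1056.3067,3110.0783,11441.8656,142.1714,2136.0964,1958.9022,
          891.89,352.6939,1341.7042,167.4883,2502.0528,1742.1306,
          837.1481,867.8533,3590.4308,1125.9889,1200.605,4321.0011,
          1873.9706,323.6633,1912.3147,865.6058,2870.8592,236.7214,
          580.2861,350.9269,6842.4969,1886.2403,265.5094,199.9825,
          1215.6197,7241.8075,2381.9517,3078.1331,5461.3703,2051.3997,
          751.6575,714.3536,598.4539,425.6656,215.2103,608.785,
          369.4744,2398.6506,918.6844,525.6925,2549.3694,4108.8983,
          2824.0758,1068.7508,249.995,3863.9839,1152.1506,531.6844)

# Rescale so the values are of order 1; only the rate is affected by this
x <- test / 1000

# Negative log-likelihood on the log scale, so shape and rate stay positive
negLogLik <- function(logPar) {
  -sum(dgamma(x, shape = exp(logPar[1]), rate = exp(logPar[2]), log = TRUE))
}

# Method-of-moments starting values: shape = mean^2 / var, rate = mean / var
startPar <- log(c(mean(x)^2 / var(x), mean(x) / var(x)))

fit <- optim(startPar, negLogLik)

shapeHat <- exp(fit$par[1])
rateHat  <- exp(fit$par[2]) / 1000   # convert the rate back to the original units
```

The same rescaled vector x could also be passed to fitdist directly; whether that avoids error code 100 in a given fitdistrplus version is something to verify rather than assume.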

Thank you.
Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3477391.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Fitting gamma and exponential Distributions with fitdist

2011-04-27 Thread vioravis
There was a small error in the data creation step; I have fixed it below:

test <- c(895.1358,2915.7447,335.5472,1470.4022,194.5461,1814.2328,
1056.3067,3110.0783,11441.8656,142.1714,2136.0964,1958.9022, 
891.89,352.6939,1341.7042,167.4883,2502.0528,1742.1306, 
837.1481,867.8533,3590.4308,1125.9889,1200.605,4321.0011, 
1873.9706,323.6633,1912.3147,865.6058,2870.8592,236.7214, 
580.2861,350.9269,6842.4969,1886.2403,265.5094,199.9825, 
1215.6197,7241.8075,2381.9517,3078.1331,5461.3703,2051.3997, 
751.6575,714.3536,598.4539,425.6656,215.2103,608.785, 
369.4744,2398.6506,918.6844,525.6925,2549.3694,4108.8983, 
2824.0758,1068.7508,249.995,3863.9839,1152.1506,531.6844) 

Any help would be appreciated. Thank you.

--
View this message in context: 
http://r.789695.n4.nabble.com/Fitting-gamma-and-exponential-Distributions-with-fitdist-tp3477391p3480133.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Integrate na.rm in own defined functions

2011-04-20 Thread vioravis
This should work!!

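rmse <- function(x, na.rm = TRUE) {
  squared <- x^2
  # Divide by the number of values actually summed, so that NAs do not
  # shrink the result when na.rm = TRUE
  n <- if (na.rm) sum(!is.na(x)) else length(x)
  sqrt(sum(squared, na.rm = na.rm) / n)
}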
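
A short usage sketch, with the function redefined under a different (made-up) name so that it runs on its own. The point it illustrates is that dividing by the count of non-missing values makes the result with an NA equal to the result without it.

```r
# Self-contained variant for illustration; rmseNaRm is a hypothetical name
rmseNaRm <- function(x, na.rm = TRUE) {
  n <- if (na.rm) sum(!is.na(x)) else length(x)
  sqrt(sum(x^2, na.rm = na.rm) / n)
}

withNA    <- rmseNaRm(c(3, 4, NA))                 # NA dropped, divides by 2
withoutNA <- rmseNaRm(c(3, 4))                     # same two values, no NA
keptNA    <- rmseNaRm(c(3, 4, NA), na.rm = FALSE)  # NA propagates
```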



--
View this message in context: 
http://r.789695.n4.nabble.com/Integrate-na-rm-in-own-defined-functions-tp3462492p3462615.html
Sent from the R help mailing list archive at Nabble.com.



[R] Optimzing a nested function

2011-04-12 Thread vioravis
I am trying to optimize a nested function using nlminb. This throws out an
error that y is missing. Can someone help me with the correct syntax?? Thank
you.

test1 <- function(x,y)
{
  sum <- x + y
  return(sum)
}

test2 <- function(x,y)
{
  sum <- test1(x,y)
  sumSq <- sum*sum
  return(sumSq)
}

nlminb(start = c(1,1), test2,lower = c(0,0), upper = c(5,5))
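
The likely cause of the "y is missing" error is that nlminb passes all parameters to the objective as a single vector in its first argument, so test2 only ever receives x. A minimal sketch of one way to restructure it follows; the argument name par is a choice made here.

```r
test1 <- function(x, y) {
  x + y
}

# The objective takes one parameter vector and unpacks it for the inner function
test2 <- function(par) {
  total <- test1(par[1], par[2])
  total * total
}

fit <- nlminb(start = c(1, 1), objective = test2,
              lower = c(0, 0), upper = c(5, 5))

fit$par        # estimated x and y
fit$objective  # value of test2 at the solution
```

Any extra fixed arguments can still be forwarded through the ... argument of nlminb, but the parameters being optimised have to arrive as one vector.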



--
View this message in context: 
http://r.789695.n4.nabble.com/Optimzing-a-nested-function-tp3443825p3443825.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] flow map lines between point pairs (latitude/longitude)

2011-03-17 Thread vioravis
I am working on a similar problem. I have to add two columns: one containing
the US state to which the origin belongs and another containing the state
to which the destination belongs. All I have is the latitude and the longitude
of the origin and destination. Are there any packages in R that can do
this? Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/flow-map-lines-between-point-pairs-latitude-longitude-tp860009p3383842.html
Sent from the R help mailing list archive at Nabble.com.



[R] Seasonality in STL Decomposition

2011-03-10 Thread vioravis
I am having issues with interpreting the results of an STL decomposition. The
following is the data used, as well as the decomposed seasonal, trend and
remainder components. It is weekly data.

The original data doesn't appear to be seasonal. But there seems to be a
periodic peak in the seasonal component. Can someone please let me know how
to interpret the seasonality plot in this case?

Also, what do the negative values mean in the seasonal and the
remainder components?

Thank you.

Ravi

input <-
c(2152.787,1686.4335,1856.3705,1719.872,1755.1917,1705.3385,1701.6683,1610.2746,1554.6005,1463.85,
1429.2155,1453.1788,1385.1424,1346.8169,1425.896,1388.6765,1349.8714,1293.0909,1279.1281,1264.8579,
1270.9211,1278.4987,1271.3811,1270.448,1243.3399,1279.2172,1337.455,1316.6042,1288.9275,1287.403,
1290.0595,1317.4027,1365.4406,1302.6746,1254.4507,1274.5924,1281.1415,1282.0631,1249.1892,1312.9578,
1253.104,1256.7432,1253.802,1294.004,1252.422,1259.6932,1281.6017,1274.6644,1267.9903,1229.1065,
1259.2091,1275.5516,1220.9909,1217.682,1252.5307,1251.7036,1237.4322,1269.2433,1253.5857,1209.1975,
1188.7189,1203.7423,1267.6006,1208.4465,1219.0118,1214.3023,1209.4117,1203.1483,1214.807,1165.5629,
1189.984,1230.2438,1265.8107,1206.4665,1197.2052,1189.0741,1191.9008,1209.7856,1203.7599,1206.3015,
1164.3207,1197.648,1175.555,1188.6788,1206.7764,1210.7119,1209.0285,1205.9406,1202.4434,1222.4249,
1233.8498,1244.0629,1228.476,1235.1948,1264.1828,1242.6138,1241.0844,1313.3505,1317.6045,1260.7927,
1255.9414,1303.2353,1329.5308,1322.976,1324.9601,1366.406,1319.9496,1357.3928,1334.4911,1302.5362,
1336.5733,1293.3127,1366.4449,1364.4946,1339.0855,1371.0803,1337.266,1349.3965,1308.6867,1402.4016)

timeSeries <- ts(input, start = c(2004, 20),frequency = 52)
stlDecompose <- stl(timeSeries, s.window = "periodic",s.degree = 1)
plot(stlDecompose)
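
To help read the output, here is a small sketch on the built-in nottem series (monthly temperatures, chosen only because it ships with R; it is not the data above). It checks three properties of stl with s.window = "periodic": the seasonal pattern is identical in every cycle, the three components add back up to the data, and the seasonal component takes negative values because it is a deviation from the trend, not a level.

```r
fit <- stl(nottem, s.window = "periodic")

seasonal  <- fit$time.series[, "seasonal"]
trend     <- fit$time.series[, "trend"]
remainder <- fit$time.series[, "remainder"]

# With s.window = "periodic" the seasonal values repeat exactly from cycle to cycle
sameEveryYear <- isTRUE(all.equal(as.numeric(seasonal[1:12]),
                                  as.numeric(seasonal[13:24])))

# The decomposition is additive: data = seasonal + trend + remainder
addsUp <- isTRUE(all.equal(as.numeric(seasonal + trend + remainder),
                           as.numeric(nottem)))

# Negative seasonal values just mean "below the trend at this point in the cycle"
hasNegative <- any(seasonal < 0)
```

With 120 weekly points and frequency = 52 there are only a little over two cycles, so a periodic seasonal component is essentially an average of two detrended years; a one-off feature such as the initial decline can then show up as an apparent seasonal peak.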


Decomposition Results:

           seasonal    trend     remainder
2004.365 215.915141 1585.077  3.517953e+02
2004.385  75.846356 1576.654  3.393327e+01
2004.404 131.292835 1568.231  1.568465e+02
2004.423 100.668748 1559.809  5.939475e+01
2004.442 102.720994 1551.386  1.010849e+02
2004.462  88.423662 1542.963  7.395170e+01
2004.481  95.695097 1534.540  7.143275e+01
2004.500  38.382598 1526.118  4.577424e+01
2004.519  39.744099 1517.695 -2.838678e+00
2004.538  16.208476 1509.559 -6.191717e+01
2004.558  19.836887 1501.422 -9.204370e+01
2004.577  21.128396 1493.286 -6.123552e+01
2004.596  -6.943227 1485.150 -9.306392e+01
2004.615 -15.429798 1477.013 -1.147665e+02
2004.635  -2.455302 1468.877 -4.052547e+01
2004.654  16.103626 1460.740 -8.816752e+01
2004.673 -43.781184 1452.604 -5.895142e+01
2004.692 -93.469553 1444.576 -5.801584e+01
2004.712 -84.916472 1436.549 -7.250401e+01
2004.731 -68.597741 1428.521 -9.506523e+01
2004.750 -44.458760 1420.493 -1.051133e+02
2004.769 -67.619804 1412.465 -6.634695e+01
2004.788 -73.086998 1404.438 -5.996964e+01
2004.808 -74.896842 1396.410 -5.106519e+01
2004.827 -84.315286 1388.382 -6.072713e+01
2004.846 -54.710480 1380.497 -4.656974e+01
2004.865 -25.880674 1372.613 -9.276842e+00
2004.885 -32.311518 1364.728 -1.581190e+01
2004.904 -64.416512 1356.843 -3.498703e+00
2004.923 -45.672884 1348.958 -1.588193e+01
2004.942 -52.548907 1341.073  1.535493e+00
2004.962 -29.473180 1333.188  1.368787e+01
2004.981   6.436797 1325.303  3.370069e+01
2005.000 -20.697564 1318.761  4.611184e+00
2005.019 -43.370325 1312.219 -1.439782e+01
2005.038 -32.562537 1305.677  1.478221e+00
2005.058 -28.755698 1299.135  1.076261e+01
2005.077 -16.133184 1292.592  5.603832e+00
2005.096 -24.686719 1286.050 -1.217440e+01
2005.115  14.475095 1279.508  1.897452e+01
2005.135 -21.074291 1272.966  1.212235e+00
2005.154 -15.042155 1269.978  1.807456e+00
2005.173  -1.165620 1266.990 -1.202212e+01
2005.192   9.004016 1264.002  2.099840e+01
2005.212 -11.698548 1261.013  3.107118e+00
2005.231  28.962729 1258.025 -2.729480e+01
2005.250  42.936605 1255.037 -1.637202e+01
2005.269  11.954682 1252.049  1.066076e+01
2005.288   7.084609 1249.061  1.184488e+01
2005.308  11.654557 1247.216 -2.976370e+01
2005.327  40.218506 1245.370 -2.637988e+01
2005.346  45.477254 1243.525 -1.345096e+01
2005.365 215.915141 1241.680 -2.366044e+02
2005.385  75.846356 1239.835 -9.799933e+01
2005.404 131.292835 1237.990 -1.167519e+02
2005.423 100.668748 1236.145 -8.510978e+01
2005.442 102.720994 1234.299 -9.958826e+01
2005.462  88.423662 1233.257 -5.243698e+01
2005.481  95.695097 1232.214 -7.432316e+01
2005.500  38.382598 1231.171 -6.035601e+01
2005.519  39.744099 1230.128 -8.115327e+01
2005.538  16.208476 1229.085 -4.155139e+01
2005.558  19.836887 1228.042  1.972135e+01
2005.577  21.128396 1227.000 -3.968141e+01
2005.596  -6.943227 1225.957 -1.638219e-03
2005.615 -15.429798 1226.144  3.587887e+00
2005.635  -2.455302 1226.332 -1.446475e+01
2005.654  16.103626 1226.519 -3.947463e+01
2005.673 -43.781184 1226.707  3.188134e+01
2005.692 -93.469553 1226.894  3.213806e+01
2005.712 -84.916472 1227.082  4.781853e+01
2005.731 -68.597741 1227.269 

Re: [R] Zero Inflated Distributions

2011-03-07 Thread vioravis
Any help on this would be appreciated. Thank you.

--
View this message in context: 
http://r.789695.n4.nabble.com/Zero-Inflated-Distributions-tp3334861p3338344.html
Sent from the R help mailing list archive at Nabble.com.



[R] Zero Inflated Distributions

2011-03-04 Thread vioravis
I am currently fitting the following distributions using JMP and looking for
ways to fit the same distributions in R:

Zero Inflated Lognormal
Zero Inflated Loglogistic
Zero Inflated Frechet
Zero Inflated Weibull
Threshold Frechet
Threshold Loglogistic
Threshold Lognormal
Log Generalized Gamma
Threshold Weibull
LEV
Logistic
Normal
SEV

Are there any packages that contain these distributions? I am specifically
interested in the zero inflated distributions since the data I have contains
quite a few zeros.
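
For a continuous distribution with extra zeros, the likelihood separates into a part for the proportion of zeros and a part for the positive values, so a zero inflated lognormal can be fitted by hand in base R. The sketch below uses artificial data (30 zeros plus 70 values placed at lognormal quantiles), not the poster's data, and all variable names are made up for illustration.

```r
# Artificial data: 30 exact zeros and 70 positive values on lognormal quantiles
positivePart <- exp(qnorm(ppoints(70), mean = 1, sd = 0.5))
x <- c(rep(0, 30), positivePart)

isZero <- x == 0

# Part 1: maximum likelihood estimate of the zero probability is the observed share
zeroProbHat <- mean(isZero)

# Part 2: lognormal maximum likelihood estimates from the positive values only
logPos     <- log(x[!isZero])
meanlogHat <- mean(logPos)
sdlogHat   <- sqrt(mean((logPos - meanlogHat)^2))

# Combined log-likelihood of the two-part model
logLik <- sum(isZero) * log(zeroProbHat) +
  sum(!isZero) * log(1 - zeroProbHat) +
  sum(dlnorm(x[!isZero], meanlogHat, sdlogHat, log = TRUE))
```

The same two-part idea should carry over to other positive distributions by swapping the second part for a numerical fit on the positive values.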

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Zero-Inflated-Distributions-tp3334861p3334861.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Zero Inflated Distributions

2011-03-04 Thread vioravis
Thanks, Thierry.  

Has anyone used the bayescount package for estimating zero inflated
distributions? It states that it is a "crude" function. Does that mean the
estimates are only approximate?

The example they have given seems to work only with Gamma Poisson.

data <- rpois(100, rgamma(100, shape=1, scale=8))
data[1:15] <- 0
maximise.likelihood(data, "ZIGP")

However, when I tried fitting Gamma/LogNormal/Weibull (assuming that data is
continuous), it throws out the following error:


   shape   scale   zi
9.532   4   21  
 
Error in optim(c(shape, scale, zi), f6, control = list(fnscale = -1)) : 
  function cannot be evaluated at initial parameters

What is this error about???

Moreover, the function seems extremely slow. For the 100 data point example
considered, it takes around 8 seconds for the estimation. 

Please let me know your opinions on this package and alternative packages,
if any.

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Zero-Inflated-Distributions-tp3334861p3335122.html
Sent from the R help mailing list archive at Nabble.com.



[R] Problem with loading the Snowball package

2011-01-31 Thread vioravis

I tried using the Snowball package for performing stemming in text mining.
But when I tried to load the package the following error is thrown:

Error : .onLoad failed in loadNamespace() for 'Snowball', details:
  call: NULL
  error: .onLoad failed in loadNamespace() for 'rJava', details:
  call: hive[[hive$CurrentVersion]]
  error: attempt to select less than one element
Error: package/namespace load failed for 'Snowball'


The latest version of Java is installed on my system. I am not sure where the
problem is. Can someone help me with this?

Thank you.

Ravi
 


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-loading-the-Snowball-package-tp3248487p3248487.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] 3D Binning

2011-01-31 Thread vioravis

This worked fine. Thanks.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/3D-Binning-tp3236223p3248489.html
Sent from the R help mailing list archive at Nabble.com.



[R] 3D Binning

2011-01-25 Thread vioravis

I am trying to do binning on three variables (3d binning). The bin boundaries
are specified by the user separately for each variable. I used the bin2
function in the 'ash' package for 2d binning that involves only two
variables but didn't find any package for similar binning with three variables.
Are there any packages or code available for 3d binning? Thank you.
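
One base-R possibility, sketched with made-up data and break points: cut() assigns each variable to its user-defined bins, and table() on the three results gives a 3d array of counts. This is not the bin2 approach from 'ash', just a plain alternative.

```r
# Made-up example data for three variables
x <- c(0.1, 0.4, 0.6, 0.9, 0.2, 0.7)
y <- c(1, 3, 5, 7, 9, 2)
z <- c(10, 20, 30, 40, 50, 60)

# User-specified bin boundaries, one set per variable
xBreaks <- c(0, 0.5, 1)
yBreaks <- c(0, 5, 10)
zBreaks <- c(0, 25, 50, 75)

# cut() uses intervals of the form (a, b] by default
counts <- table(x = cut(x, xBreaks),
                y = cut(y, yBreaks),
                z = cut(z, zBreaks))

dim(counts)        # number of bins along x, y and z
counts[1, 1, 1]    # count in the first bin of each variable
```

Values outside the break range become NA and are dropped from the table, so the outer breaks should cover the data.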
 


-- 
View this message in context: 
http://r.789695.n4.nabble.com/3D-Binning-tp3236223p3236223.html
Sent from the R help mailing list archive at Nabble.com.



[R] 3D Binning

2011-01-21 Thread vioravis

I am trying to do binning on three variables (3d binning). The bin boundaries
are specified by the user separately for each variable. I used the bin2
function in the 'ash' package for 2d binning that involves only two
variables but didn't find any package for similar binning with three variables.
Are there any packages or code available for 3d binning? Thank you.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/3D-Binning-tp3229137p3229137.html
Sent from the R help mailing list archive at Nabble.com.



[R] Which - value not present

2010-11-09 Thread vioravis

I am trying to use the which function to obtain the index of a value in a
dataframe. Depending on whether the value is present in the dataframe or not,
I am performing further operations on the dataframe.

However, if the value is not present in the dataframe, I am getting
integer(0).

How do I check for integer(0)? Is there something like is.na?
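
A minimal sketch of one way to test for this: integer(0) is a vector of length zero, so checking length() is enough. The data frame and the column name id are made up for illustration.

```r
df <- data.frame(id = c(10, 20, 30))

idxPresent <- which(df$id == 20)  # value exists: returns its row index
idxMissing <- which(df$id == 40)  # value absent: returns integer(0)

# A zero-length result means the value was not found
foundPresent <- length(idxPresent) > 0
foundMissing <- length(idxMissing) > 0

# any() or %in% answer the yes/no question directly, without which()
alsoMissing <- 40 %in% df$id
```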

Thank you.

Ravishankar






-- 
View this message in context: 
http://r.789695.n4.nabble.com/Which-value-not-present-tp3035455p3035455.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Which - value not present

2010-11-09 Thread vioravis

Thank you. It works fine.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Which-value-not-present-tp3035455p3035575.html


Re: [R] Random Forest AUC

2010-10-22 Thread vioravis

Thanks Max and Andy. If the random forest always gives an AUC of 1 on the
training set, isn't it overfitting? If not, how do you differentiate this from
overfitting? I believe random forests are claimed never to overfit (see the
following link).

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#features


Ravishankar R
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html


[R] Random Forest AUC

2010-10-21 Thread vioravis

Guys,

I used Random Forest on a couple of data sets to predict a binary response.
In all cases, the AUC on the training set comes out to be 1. Is this always
the case with random forests? Can someone please clarify this?

I have given a simple example, first using logistic regression and then
using random forests, to illustrate the problem. The AUC of the random forest
comes out to be 1.

data(iris)
iris <- iris[(iris$Species != "setosa"),]
iris$Species <- factor(iris$Species)
fit <- glm(Species~.,iris,family=binomial)
train.predict <- predict(fit,newdata = iris,type="response")
library(ROCR)
plot(performance(prediction(train.predict,iris$Species),"tpr","fpr"),
     col = "red")
auc1 <- performance(prediction(train.predict,iris$Species),"auc")@y.values[[1]]
legend("bottomright",
       legend=c(paste("Logistic Regression (AUC=",
                      formatC(auc1,digits=4,format="f"),")",sep="")),
       col=c("red"), lty=1)


library(randomForest)
fit <- randomForest(Species ~ ., data=iris, ntree=50)
train.predict <- predict(fit,iris,type="prob")[,2]
plot(performance(prediction(train.predict,iris$Species),"tpr","fpr"),
     col = "red")
auc1 <- performance(prediction(train.predict,iris$Species),"auc")@y.values[[1]]
legend("bottomright",
       legend=c(paste("Random Forests (AUC=",
                      formatC(auc1,digits=4,format="f"),")",sep="")),
       col=c("red"), lty=1)

Thank you.

Regards,
Ravishankar R
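
A possible way to look at this, sketched under the assumption that the
randomForest and ROCR packages are installed: predicting on the training rows
reuses data the trees were grown on, whereas calling predict() without newdata
returns the out-of-bag (OOB) predictions, where each row is scored only by
trees that did not see it. Comparing the two AUCs gives a rough idea of how
optimistic the training figure is. The object names (dat, train.auc, oob.auc)
are made up for the example.

```r
library(randomForest)
library(ROCR)

# Two-class subset of iris, kept in a separate object (dat is a made-up name)
data(iris)
dat <- iris[iris$Species != "setosa", ]
dat$Species <- factor(dat$Species)

set.seed(1)
fit <- randomForest(Species ~ ., data = dat)

# Helper: AUC from predicted probabilities and true labels, via ROCR
auc <- function(prob, labels) {
  performance(prediction(prob, labels), "auc")@y.values[[1]]
}

# Training-set AUC: each row is scored by trees that were grown on it
train.prob <- predict(fit, newdata = dat, type = "prob")[, 2]
train.auc <- auc(train.prob, dat$Species)

# OOB AUC: with newdata omitted, predict() returns out-of-bag predictions
oob.prob <- predict(fit, type = "prob")[, 2]
oob.auc <- auc(oob.prob, dat$Species)

c(train = train.auc, oob = oob.auc)
```

A training AUC near 1 alongside a clearly lower OOB (or held-out) AUC would
point to an optimistic training estimate; if the two are close, the training
figure is less of a concern.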
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3006649.html