[R] DTM Package removeSparseTerms function question

2014-01-16 Thread ramoss
In inspect(removeSparseTerms(dtm, 0.4)), does anyone know how the sparse term (a numeric for the maximal allowed sparsity) works? I.e., what is the difference between, say, 0.2, 0.4 and 0.6? Thanks for your help.
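A minimal base R sketch of how the threshold behaves, assuming the rule described in the tm documentation (a term is kept only if its share of empty documents is below `sparse`). The toy matrix and the helper `keep_terms` are hypothetical and do not call tm itself.

```r
# Toy document-term matrix: rows are documents, columns are terms.
# matrix() fills column by column, so each line below is one term.
dtm <- matrix(
  c(1, 0, 0, 0, 0,   # "rare":   in 1 of 5 documents, sparsity 0.8
    1, 1, 0, 1, 0,   # "medium": in 3 of 5 documents, sparsity 0.4
    1, 1, 1, 1, 1),  # "common": in 5 of 5 documents, sparsity 0.0
  nrow = 5,
  dimnames = list(paste0("doc", 1:5), c("rare", "medium", "common"))
)

# Sparsity of a term = share of documents in which it does not occur.
term_sparsity <- colMeans(dtm == 0)

# Hypothetical helper: keep terms whose sparsity is below the threshold.
keep_terms <- function(m, sparse) {
  colnames(m)[colMeans(m == 0) < sparse]
}

low  <- keep_terms(dtm, 0.2)  # strict: only terms present almost everywhere
mid  <- keep_terms(dtm, 0.5)
high <- keep_terms(dtm, 0.9)  # lenient: rare terms survive as well
```

Under this reading, a smaller value removes more terms and a value close to 1 removes only the very rare ones.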

[R] Package TM dataframes

2014-01-10 Thread ramoss
Hi, I am trying to use the package tm on a data frame and get the following error: complaints <- tm_map(complaints, tolower) Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "data.frame". Does tm not work on data frames? My data frame consists of 1 text

[R] How do you transform a dataframe to a corpus?

2014-01-10 Thread ramoss
Hi; I have a data frame complaints w/ dimensions 1133529 x 1 (1.13m obs and 1 col). I am trying to transform it into a corpus using the following code: myCorpus <- Corpus(VectorSource(complaints$text)) Error in .Source(readPlain, encoding, length(x), FALSE, names(x), 0, TRUE, : vectorized sources

Re: [R] How do you transform a dataframe to a corpus?

2014-01-10 Thread ramoss
The column length is 4000 bytes long if that helps. -- View this message in context: http://r.789695.n4.nabble.com/How-do-you-transform-a-dataframe-to-a-corpus-tp4683396p4683402.html Sent from the R help mailing list archive at Nabble.com. __

[R] Removing rows w/ smaller value from data frame

2013-05-23 Thread ramoss
Hello, I have a column called max_date in my data frame and I only want to keep the bigger values for the same activity. How can I do that? Data frame:
activity  max_dt
A         2013-03-05
B         2013-03-28
A         2013-03-28
C         2013-03-28
B         2013-03-01
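One possible base R approach, sketched on the rows quoted in the post: compare each date with the maximum of its activity group via `ave()`. The object names `df` and `latest` are made up for the illustration.

```r
df <- data.frame(
  activity = c("A", "B", "A", "C", "B"),
  max_dt   = as.Date(c("2013-03-05", "2013-03-28", "2013-03-28",
                       "2013-03-28", "2013-03-01"))
)

# ave() returns, for every row, the maximum date (as a number) of its group.
dt_num <- as.numeric(df$max_dt)
latest <- df[dt_num == ave(dt_num, df$activity, FUN = max), ]

# Sort by activity for readability.
latest <- latest[order(latest$activity), ]
```

If two rows of the same activity share the latest date, both are kept; add a `!duplicated()` step if only one is wanted.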

[R] how to merge 2 data frame if you want to exclude mutual obs

2013-05-13 Thread ramoss
In the example below, I am merging 2 data frames and I want everything in the first one (all): all2 <- merge(all, spets, by.x=c("tdate","symbol"), by.y=c("tdate","symbol"), all.x=TRUE) What if I want to exclude everything in y? I tried the below but it doesn't seem to work. all2 <- merge(all, spets, by.x=c("tdate","symbol"),
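A hedged sketch of an "anti-join" in base R, which may not be the solution that was eventually accepted in the thread: build a combined key from both columns and keep the rows of the first data frame whose key is absent from the second. The toy data and the names `all_df`, `key` and `only_in_all` are hypothetical.

```r
all_df <- data.frame(tdate  = c("12/12/12", "12/11/12", "12/12/12"),
                     symbol = c("AX", "ZZ", "WQ"),
                     TA     = c("123", "A4R", "B8R"),
                     stringsAsFactors = FALSE)
spets  <- data.frame(tdate  = c("12/12/12", "12/11/12"),
                     symbol = c("AX", "ZZ"),
                     stringsAsFactors = FALSE)

# Combine the two join columns into one key; "|" is assumed not to occur
# inside the values themselves.
key <- function(d) paste(d$tdate, d$symbol, sep = "|")

# Keep only rows of all_df that have no match in spets.
only_in_all <- all_df[!(key(all_df) %in% key(spets)), ]
```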

Re: [R] how to merge 2 data frame if you want to exclude mutual obs

2013-05-13 Thread ramoss
To clarify: So if in data frame A you have
Tdate     symbol  TA
12/12/12  AX      123
12/11/12  ZZ      A4R
12/12/12  WQ      B8R
Data frame B
Tdate     symbol  TA
12/12/12  AX      123
12/11/12  ZZ

Re: [R] how to merge 2 data frame if you want to exclude mutual obs

2013-05-13 Thread ramoss
Thanks Adam, your solution worked perfectly. Thank you all for your responses.

[R] subsetting by is not

2013-05-09 Thread ramoss
Hello, I have a simple question: I know how to subset by "is": buy1 <- subset(buy, buybdge==badge) How do I subset if I don't want buybdge to equal badge? Thanks ahead for your help.
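The negated comparison can be sketched as follows, assuming (as the follow-up post says) that buybdge and badge are both columns of the data frame. The toy values are invented.

```r
buy <- data.frame(buybdge = c(1, 2, 3, NA),
                  badge   = c(1, 5, 3, 4))

# Rows where the two columns agree.
buy_same <- subset(buy, buybdge == badge)

# Rows where they differ: use != (or wrap the condition in !( ... )).
buy_diff <- subset(buy, buybdge != badge)
```

Note that subset() drops rows where the comparison is NA, so the row with a missing buybdge appears in neither result.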

Re: [R] subsetting by is not

2013-05-09 Thread ramoss
I want to clarify: we are talking about 2 variables in a data frame here.

[R] Stat question: How to deal w/ negative outliers?

2013-04-12 Thread ramoss
Hello all, I have a question: I am using the interquartile method to spot outliers and it gives me values of, say, 234 and -120 for the higher and lower benchmarks. I don't have any issues w/ the higher end. However, I don't have any negative values. My lowest possible value is 0. Should I consider 0

[R] Using PLYR to apply a custom function to a data frame

2013-04-10 Thread ramoss
Hello, I am still struggling w/ the plyr syntax. I am trying to build a customized function to detect outliers in a data frame based on the interquartile method. My data frame is called ALL. I am trying to create two new variables in my data frame: upper=q3+ 1.5*(q3-q1)

Re: [R] Using PLYR to apply a custom function to a data frame

2013-04-10 Thread ramoss
Thanks everyone. The mutate function worked great: all2 <- mutate(all1, upper=p75+1.5*(p75-p25), lower=p25-1.5*(p75-p25))

[R] How to perform a grouped shapiro wilk test on dataframe

2013-04-05 Thread ramoss
Hello, I was wondering if it is possible to perform, on a data frame called 'all', a Shapiro-Wilk normality test for COUNTS by the grouping variable ACTIVITY? Could it be done using plyr? I saw an e.g. that applies to an array but not to a data frame:
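One way this could be done without plyr is to split the column by group and apply `shapiro.test()` to each piece. The sketch below uses simulated data; `all_df`, `sw` and `sw_table` are hypothetical names, and each group needs between 3 and 5000 observations for the test to run.

```r
set.seed(1)
all_df <- data.frame(
  ACTIVITY = rep(c("XX", "ZZ"), each = 30),
  COUNTS   = c(rnorm(30, mean = 50, sd = 5), rexp(30, rate = 0.1))
)

# One Shapiro-Wilk test per ACTIVITY group.
sw <- lapply(split(all_df$COUNTS, all_df$ACTIVITY), shapiro.test)

# Collect the W statistic and p-value of each group in a small table.
sw_table <- data.frame(
  ACTIVITY = names(sw),
  W        = sapply(sw, function(res) unname(res$statistic)),
  p_value  = sapply(sw, function(res) res$p.value),
  row.names = NULL
)
```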

Re: [R] Can package plyr also calculate the mode?

2013-04-04 Thread ramoss
When I put in mode=mode(COUNTS) I get the value "numeric" as an answer. I think it's giving me the data type, not the mode.
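Base R's `mode()` does report the storage mode of an object, and base R has no built-in function for the statistical mode. A small helper can be sketched as below; `stat_mode` is a made-up name and the data are invented.

```r
# Hypothetical helper: most frequent value of a numeric vector.
stat_mode <- function(x) {
  counts <- table(x)
  # which.max() returns the first maximum, so ties resolve to the
  # smallest value (table() sorts its levels).
  as.numeric(names(counts)[which.max(counts)])
}

df <- data.frame(ACTIVIT = c("XX", "XX", "XX", "ZZ", "ZZ", "ZZ"),
                 COUNTS  = c(5, 5, 7, 2, 9, 9))

# Statistical mode of COUNTS within each ACTIVIT group.
by_group <- tapply(df$COUNTS, df$ACTIVIT, stat_mode)
```

The same helper can be passed to summarise inside ddply in place of mode().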

Re: [R] Can package plyr also calculate the mode?

2013-04-04 Thread ramoss
When I run yy <- ddply(all, "ACTIVIT", summarise, mode=mode(COUNTS)) I get: ACTIVIT mode XX numeric ZZ numeric and so on.

[R] Can package plyr also calculate the mode?

2013-04-03 Thread ramoss
I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using: all1 <- ddply(all, "ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25), median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75),

[R] What is SAS options missing=0 equivalent in R?

2013-04-02 Thread ramoss
I have a data frame and wish to convert the NAs (missing values) to zero. In SAS I would use options missing=0 to convert all my obs in a dataset. How can I accomplish the same thing in R? Can it be done? Thanks for any thoughts on this.
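A minimal sketch for a data frame whose columns are all numeric (the toy data are invented). Unlike the SAS option, which only affects how missing values are printed, this replaces the values themselves.

```r
df <- data.frame(a = c(1, NA, 3),
                 b = c(NA, 2, NA))

# is.na(df) is a logical matrix marking the missing cells;
# assigning through it overwrites exactly those cells.
df[is.na(df)] <- 0
```

Factor columns would need separate handling, since 0 is usually not one of their levels.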

[R] Data frame question

2013-04-01 Thread ramoss
Hello, I have 2 data frames: activity and dates. Activity contains 1 variable listing all activities: activityA, activityB etc. Dates contains all the valid business dates. I need to combine the 2 so that I get a single data frame activitydat that contains the activity name along w/

[R] Left join in R

2013-04-01 Thread ramoss
I have never used the data.table package. I am trying to do the following SQL left join in R: create table all as select a.* from dates b left outer join activitycount a on a.tdate=b.tdate and a.activity=b.activity
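In base R a left join can be sketched with `merge()` and `all.x = TRUE`, no data.table needed. The toy tables and the `cnt` column are assumptions made for the illustration.

```r
dates <- data.frame(tdate    = c("2013-03-01", "2013-03-01", "2013-03-04"),
                    activity = c("A", "B", "A"),
                    stringsAsFactors = FALSE)
activitycount <- data.frame(tdate    = c("2013-03-01", "2013-03-04"),
                            activity = c("A", "A"),
                            cnt      = c(10, 7),
                            stringsAsFactors = FALSE)

# Keep every row of dates (the left table); unmatched rows get NA in cnt.
joined <- merge(dates, activitycount,
                by = c("tdate", "activity"), all.x = TRUE)
```

One difference from the SQL above: merge() keeps the key columns from the left table, so tdate and activity are filled even for unmatched rows.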

[R] Subset in, not in

2013-01-10 Thread ramoss
Hello, I need to subset my data frame into 2 parts; in: mm <- subset(agr1, subset=lmpcrd %in% c(11697,149823,7654)) not in: but where do I stick the ! in the above? I've tried every position. Thanks for your help.
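The negation goes in front of the whole `%in%` expression, most safely wrapped in parentheses. A short sketch with invented rows:

```r
agr1 <- data.frame(lmpcrd = c(11697, 5, 149823, 42, 7654))
ids  <- c(11697, 149823, 7654)

# Rows whose lmpcrd is in the list.
in_set <- subset(agr1, lmpcrd %in% ids)

# Rows whose lmpcrd is NOT in the list: negate the complete test.
not_in_set <- subset(agr1, !(lmpcrd %in% ids))
```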

[R] Inserting percentile values in a data frame

2013-01-03 Thread ramoss
Hello, I need to calculate and insert the values for the 50, 75, 90, 95 and 99 percentiles in a data frame for each row. I used agr1$quantile <- quantile(agr1$cnt, probs=c(.50, .75, .90, .95, .99)) but that didn't work. How can I calculate the percentiles for my variable cnt, insert and name the percentile
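The assignment in the post fails because `quantile()` returns five values while a data frame column needs one value per row. One reading of the request, sketched below, is to store each percentile in its own column, repeated on every row; the column names `p50` to `p99` are assumptions.

```r
agr1 <- data.frame(cnt = 1:100)

probs <- c(.50, .75, .90, .95, .99)
q     <- quantile(agr1$cnt, probs = probs)

# One new column per percentile; the single value is recycled down the rows.
col_names <- c("p50", "p75", "p90", "p95", "p99")
for (i in seq_along(col_names)) {
  agr1[[col_names[i]]] <- q[[i]]
}
```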

[R] Help w/ FF package to upload large file.

2012-12-31 Thread ramoss
Hello, Does anyone here know how to use this package? The documentation is most confusing. I have a large CSV file w/ 6.8M obs and 19 variables. I am having memory issues trying to upload it to Greenplum using: sqlSave(chann, rave, tablename="mossader_dev.rave", rownames=F, colnames=T) How can I write

[R] subset data frame by variable with missing value

2012-11-30 Thread ramoss
Hello, I have a variable in a data frame that contains NA values. I just want to subset so that I get the obs where that variable is missing. In SAS I would do: data missing; set test; if myvalue=' '; run; How can I perform this simple task in R? Thanks in advance for your help.

Re: [R] subset data frame by variable with missing value

2012-11-30 Thread ramoss
I found the answer; it's mymissing <- subset(mydata, is.na(myvar))

Re: [R] Can you have a by variable in Lag function as in SAS

2012-11-16 Thread ramoss
Thank you again, all responders. Dan, your solution was both easy and miraculous.

[R] Can you have a by variable in Lag function as in SAS

2012-11-15 Thread ramoss
Hello, I want to use lag on a time variable but I have to take date into consideration, i.e. I don't want days to overlap: I don't want my first time of today to match my last time of yesterday. In SAS I would use: data x; set y; by date tim; previous=lag(tim); if first.date then
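A base R sketch of a lag that restarts on every date, which may differ from the solution praised in the reply: `ave()` applies a shift within each date group, so the first time of a day gets NA instead of the last time of the previous day. The toy values are invented.

```r
y <- data.frame(date = c("d1", "d1", "d1", "d2", "d2"),
                tim  = c(900, 930, 1000, 905, 915))

# Sort first, as the SAS BY statement requires sorted input.
y <- y[order(y$date, y$tim), ]

# Within each date, shift tim down by one position and pad with NA.
y$previous <- ave(y$tim, y$date,
                  FUN = function(x) c(NA, head(x, -1)))
```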

[R] Using lubridate to increment date by business days only

2012-11-13 Thread ramoss
Hello, I know how to increment a date by calendar days: ticker$ldate <- ticker$tdate + days(5) How do I increment it by business days only, so that weekends are not counted? So for example Friday November 2 + 5 days becomes Friday November 9, not Wednesday Nov 7. Thanks for your help.
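A simple base R sketch that steps forward one day at a time and counts only Monday to Friday. The function `add_business_days` is hypothetical, works on a single date, and ignores public holidays.

```r
add_business_days <- function(d, n) {
  d <- as.Date(d)
  while (n > 0) {
    d <- d + 1
    # wday is 0 for Sunday and 6 for Saturday, independent of locale.
    wd <- as.POSIXlt(d)$wday
    if (wd != 0 && wd != 6) n <- n - 1
  }
  d
}

# The example from the post: Friday 2 November 2012 plus 5 business days.
ldate <- add_business_days("2012-11-02", 5)
```

For a whole column, the function could be applied row by row, or replaced by a dedicated business-calendar package if holidays matter.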

[R] Creating a new by variable in a dataframe

2012-10-19 Thread ramoss
Hello, I have a data frame w/ 3 variables of interest: transaction, date (tdate) and time (event_tim). How could I create a 4th variable (last_trans) that would flag the last transaction of the day for each day? In SAS I use: proc sort data=all6; by tdate event_tim; run; /*Create last
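A base R alternative to the plyr solution posted in the reply, sketched on invented rows: after sorting, `duplicated(..., fromLast = TRUE)` is FALSE exactly for the last row of each date.

```r
all6 <- data.frame(transaction = 1:5,
                   tdate       = c("d1", "d1", "d2", "d2", "d2"),
                   event_tim   = c(930, 900, 1000, 1100, 1030))

# Sort by date and time, as in the SAS proc sort.
all6 <- all6[order(all6$tdate, all6$event_tim), ]

# Scanning from the bottom, the first row seen for each date is its last
# transaction; flag it with "Y" and leave the others missing.
all6$last_trans <- ifelse(!duplicated(all6$tdate, fromLast = TRUE), "Y", NA)
```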

Re: [R] Creating a new by variable in a dataframe

2012-10-19 Thread ramoss
Thanks for all the help guys. This worked for me:
all6 <- arrange(all6, tdate, event_tim)
lt <- ddply(all6, .(tdate), tail, 1)
lt$last_trans <- 'Y'
all6 <- merge(all6, lt, by.x=c("tdate","event_tim"), by.y=c("tdate","event_tim"), all.x=TRUE)

[R] How to replicate SAS by group processing in R

2012-10-10 Thread ramoss
Hello, I am trying to re-code all my programs from SAS into R. In SAS I use the following code: proc sort data=upper; by tdate stock_symbol expire strike; run; data upper1; set upper; by tdate stock_symbol expire strike; if first.expire then output; rename strike=astrike; run; on the
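The SAS step shown above (sort, keep the first row of each expire group, rename strike) could be sketched in base R as follows. The toy rows are invented, and the rest of the original program is not covered because the post is cut off.

```r
upper <- data.frame(tdate        = c("t1", "t1", "t1", "t1"),
                    stock_symbol = "IBM",
                    expire       = c("e1", "e1", "e2", "e2"),
                    strike       = c(110, 100, 95, 90))

# proc sort ... by tdate stock_symbol expire strike
upper <- upper[order(upper$tdate, upper$stock_symbol,
                     upper$expire, upper$strike), ]

# first.expire: the first row of each tdate/stock_symbol/expire combination.
upper1 <- upper[!duplicated(upper[c("tdate", "stock_symbol", "expire")]), ]

# rename strike=astrike
names(upper1)[names(upper1) == "strike"] <- "astrike"
```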

[R] Conditional operations in R

2012-09-18 Thread ramoss
Hello, I am a newbie to R coming from a SAS background. I am trying to program the following: I have a monthly data frame with 2 variables:
client  pct_total
A       15%
B       10%
C       10%
D       9%
E       8%
F       6%
G       4%
I need to come up w/ a monthly list

Re: [R] Conditional operations in R

2012-09-18 Thread ramoss
Thanks to all who responded, particularly to Michael. Your solution was the easiest to understand and to implement. This worked beautifully:
cmtot <- arrange(cmtot, -PCTTOT) # sort by descending
top <- with(cmtot, which.max(cumsum(PCTTOT) >= 50))
topcm <- cmtot[seq(1,top),]

Re: [R] Cannot install package xlsx

2012-09-14 Thread ramoss
It looks like they are all corrupted. I tried several other CRAN sites across the world. How can we notify the package owner?

[R] Paasing values to sqlQuery like SAS macro

2012-09-13 Thread ramoss
Hello, We lost our SAS licence and I am busy transferring my old SAS programs to the R environment. I am very new to R. In 1 program I was creating SAS macro vars and passing them into a SQL query to run against the server. There are 3 variables: firm, begindt, enddt. The # of values for each varies month to

[R] Cannot install package xlsx

2012-09-13 Thread ramoss
I get the following error message: trying URL 'http://cran.stat.ucla.edu/bin/windows/contrib/2.15/xlsx_0.4.2.zip' Content type 'application/zip' length 365611 bytes (357 Kb) opened URL downloaded 357 Kb Error in read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) : cannot open the

Re: [R] Paasing values to sqlQuery like SAS macro

2012-09-13 Thread ramoss
Thanks, I was doing something similar in SAS. I was looping a macro based on a dataset containing the values: data _null_; set summary2; mindat=put(datepart(mindate),date9.); min_date='mindat_'|| trim(left(_n_)); put mindate= mindat= min_date=; /*check values in log*/ call symput

[R] FF package downloading a large file using sqlQuery

2012-09-06 Thread ramoss
I am new to R and am encountering memory issues while trying to download a large table from Greenplum using sqlQuery. Is there any way the ff package can help me create a large data frame in R while downloading from the server? The ff documentation is very confusing. Thanks for any

[R] help w/ uploading table frm R to green plump

2012-09-05 Thread ramoss
Hi, Does anyone know how to upload a table to Greenplum and have it be distributed? I know how to upload using sqlSave(chann, d, tablename="castaneg.wh_d", rownames=F, colnames=T) but how can I make my table be distributed randomly on the server? In SAS you can use the option distribute_on=random

[R] Conditional merging in R if then statement

2012-08-31 Thread ramoss
1) I am wondering how the following SQL statement can be written in R language w/o using sqldf: create table detail2 as select a.* from detail a, pdetail b where a.TDATE=b.TDATE and(a.STIM = b.STIM and a.STIM =b.MAXTIM) 2) when I try if then in R it only applies to the 1st row, not to

Re: [R] Deduping in R by multiple variables

2012-08-30 Thread ramoss
Thanks for your help guys. I was referring to the variables the wrong way. This worked for me:
idx <- !duplicated(detail2[, c("TDATE","FIRM","CM","BRANCH","BEGTIME","ENDTIME","OTYPE","OCOND","ACCTYP","OSIDE","SHARES","STOCKS","STKFUL")])
detail3 <-

[R] Deduping in R by multiple variables

2012-08-29 Thread ramoss
I have a dataset w/ 184K obs and 16 variables. In SAS I proc sort nodupkey it in seconds by 11 variables. I tried to do the same thing in R using both the unique and then the !duplicated functions, but it just hangs there and I get no output. Does anyone know how to solve this? This is how I tried to do

Re: [R] Concatenating data frames in R versus SAS

2012-08-24 Thread ramoss
I used summary <- rbind.fill(agency, prop) and it worked like a charm. Thanks everyone.

[R] if then in R versus SAS

2012-08-24 Thread ramoss
I am new to R and I have the following SAS statements: if otype='M' and ocond='1' and entry='a.Prop' then MOC=1; else MOC=0; How would I translate that into R code? Thanks in advance.
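A vectorised translation can be sketched with `ifelse()`, which evaluates the condition for every row at once. The data frame `df` and its rows are invented; the column names follow the SAS statement.

```r
df <- data.frame(otype = c("M", "M", "L"),
                 ocond = c("1", "2", "1"),
                 entry = c("a.Prop", "a.Prop", "a.Prop"),
                 stringsAsFactors = FALSE)

# & is the element-wise "and"; a plain if () would only test one value.
df$MOC <- ifelse(df$otype == "M" & df$ocond == "1" & df$entry == "a.Prop",
                 1, 0)
```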

[R] Concatenating data frames in R versus SAS

2012-08-23 Thread ramoss
I am trying to concatenate 2 datasets that don't have exactly the same columns. In SAS I did: data summary; set agency prop; run; No problem. In R I get an error message: summary <- rbind(agency, prop) Error in match.names(clabs, names(xi)) : names do not match previous names But when I use
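Base `rbind()` requires identical column names, which is why the call fails. A base R sketch of the SAS behaviour is to add the missing columns as NA on each side before binding; `bind_fill` is a hypothetical helper and the two small tables are invented.

```r
# Hypothetical helper: stack two data frames, filling absent columns with NA.
bind_fill <- function(a, b) {
  for (nm in setdiff(names(b), names(a))) a[[nm]] <- NA
  for (nm in setdiff(names(a), names(b))) b[[nm]] <- NA
  rbind(a, b[names(a)])
}

agency <- data.frame(id = 1:2, fee = c(10, 20))
prop   <- data.frame(id = 3, gain = 5)

combined <- bind_fill(agency, prop)
```

The result is called combined rather than summary here to avoid masking the base function summary().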

[R] Merging data in R compared to SAS

2012-08-22 Thread ramoss
Hello, I am a SAS user new to R. What is the R equivalent to following SAS statements: 1) data all; merge test1(in=a) test2(in=b) ; by account_id; if a; run; 2) proc sort data=all nodupkey; by account_id; run; 3) data all test1onnly test2only; merge test1(in=a)
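A rough base R sketch of statements 1 and 2 only, on invented data; the third statement is cut off in the post and is not covered. Note that `merge()` follows SQL join semantics, which can differ from a SAS data step merge when both tables repeat a key.

```r
test1 <- data.frame(account_id = c(1, 2, 2, 3),
                    x = c("a", "b", "c", "d"))
test2 <- data.frame(account_id = c(2, 4),
                    y = c("p", "q"))

# 1) merge ... by account_id; if a;  -> keep every row of test1
all_df <- merge(test1, test2, by = "account_id", all.x = TRUE)

# 2) proc sort nodupkey by account_id -> first row per account_id
all_df <- all_df[order(all_df$account_id), ]
all_df <- all_df[!duplicated(all_df$account_id), ]
```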