[R] tidyquant error downloading symbols for Index

2017-08-06 Thread Sparks, John James
Hi R Helpers,

I recently tried to take advantage of the ability to download all the
tickers in the S&P 500 using the functionality of tidyquant, but it threw
an error.

In summary, the set of commands that I ran was

library(tidyquant)
tq_index_options()
tq_index("SP500")
sessionInfo()


The R output, including the error message and sessionInfo, is provided below.

Guidance would be appreciated.

--John J. Sparks, Ph.D.
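
The error reported in the transcript is raised while loading rJava, and the message itself points at Java's registry entry and at a possible 32-bit/64-bit mismatch. A minimal diagnostic sketch, assuming only base R, that reports the architecture of the running R session, the value of JAVA_HOME, and whether rJava can be loaded at all. It does not repair the Java installation.

```r
# Architecture of the running R session: 8-byte pointers mean 64-bit R.
r_bits <- 8 * .Machine$sizeof.pointer
cat("R is", r_bits, "bit on", R.version$arch, "\n")

# JAVA_HOME as seen by R; an empty string means it is not set.
java_home <- Sys.getenv("JAVA_HOME")
cat("JAVA_HOME:", if (nzchar(java_home)) java_home else "<not set>", "\n")

# Try to load rJava without attaching it. FALSE means the package is
# missing or its .onLoad hook failed, as in the reported error message.
rjava_ok <- requireNamespace("rJava", quietly = TRUE)
cat("rJava loads:", rjava_ok, "\n")
```

If R turns out to be 64-bit, the error message's own advice applies: reinstall Java so that its architecture matches R's, then retry tq_index("SP500").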


> library(tidyquant)
Loading required package: lubridate

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

date

Loading required package: PerformanceAnalytics
Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

as.Date, as.Date.numeric


Package PerformanceAnalytics (1.4.3541) loaded.
Copyright (c) 2004-2014 Peter Carl and Brian G. Peterson, GPL-2 | GPL-3
http://r-forge.r-project.org/projects/returnanalytics/


Attaching package: ‘PerformanceAnalytics’

The following object is masked from ‘package:graphics’:

legend

Loading required package: quantmod
Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.
Learn from a quantmod author:
https://www.datacamp.com/courses/importing-and-managing-financial-data-in-r
Loading required package: tidyverse
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages
-
as.difftime(): lubridate, base
date():lubridate, base
filter():  dplyr, stats
first():   dplyr, xts
intersect():   lubridate, base
lag(): dplyr, stats
last():dplyr, xts
setdiff(): lubridate, base
union():   lubridate, base

Attaching package: ‘tidyquant’

The following object is masked from ‘package:dplyr’:

as_tibble

The following object is masked from ‘package:tibble’:

as_tibble

There were 14 warnings (use warnings() to see them)
> tq_index_options()
[1] "RUSSELL1000" "RUSSELL2000" "RUSSELL3000" "DOW" "DOWGLOBAL"
[6] "SP400"   "SP500"   "SP600"   "SP1000"
> tq_index("SP500")
Getting holdings for SP500
# A tibble: 0 x 0
Warning message:
In tq_index("SP500") : Error at SP500 during download.
Error: .onLoad failed in loadNamespace() for 'rJava', details:
  call: fun(libname, pkgname)
  error: No CurrentVersion entry in Software/JavaSoft registry! Try
re-installing Java and make sure R and Java have matching architectures.

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
 [1] tidyquant_0.5.3   dplyr_0.7.2
 [3] purrr_0.2.3   readr_1.1.1
 [5] tidyr_0.6.3   tibble_1.3.3
 [7] ggplot2_2.2.1 tidyverse_1.1.1
 [9] quantmod_0.4-10   TTR_0.23-2
[11] PerformanceAnalytics_1.4.3541 xts_0.10-0
[13] zoo_1.8-0 lubridate_1.6.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12     cellranger_1.1.0 plyr_1.8.4       bindr_0.1
 [5] forcats_0.2.0    tools_3.3.2      jsonlite_1.5     nlme_3.1-131
 [9] gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.1  rlang_0.1.1
[13] psych_1.7.5      curl_2.8.1       parallel_3.3.2   haven_1.1.0
[17] bindrcpp_0.2     xml2_1.1.1       httr_1.2.1       stringr_1.2.0
[21] hms_0.3          grid_3.3.2       glue_1.1.1       R6_2.2.2
[25] Quandl_2.8.0     readxl_1.0.0     foreign_0.8-69   modelr_0.1.1
[29] reshape2_1.4.2   magrittr_1.5     scales_0.4.1     rvest_0.3.2
[33] assertthat_0.2.0 mnormt_1.5-5     colorspace_1.3-2 stringi_1.1.5
[37] lazyeval_0.2.0   munsell_0.4.3    broom_0.4.2

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Looping Through QuantMod Objects

2017-08-02 Thread Sparks, John James
Dear R Helpers,

I have run into a problem trying to perform a number of actions on a set
of quantmod data objects through a loop and I am hoping that this is an
easy problem for someone else as opposed to very difficult for me.

The example task is to get the first three objects of the quarterly
balance sheet for a number of companies from the getFinancials object and
put them together into a single file.  I can do this one by one, but if I
try to build a loop and use the get function then the results are not
what I anticipated and leave me baffled.

If I do it one at a time all is good.


require(quantmod)

getFinancials("AAPL")
getFinancials("IBM")
getFinancials("MSFT")


items=c("Cash & Equivalents","Short Term Investments",
        "Cash and Short Term Investments")

HoldQuart<-AAPL.f$BS$Q
CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
CashT<-t(CashHold)
Cashdf<-data.frame(CashT)
Cashdf$tic<-"AAPL"
AAPL.c<-Cashdf

HoldQuart<-IBM.f$BS$Q
CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
CashT<-t(CashHold)
Cashdf<-data.frame(CashT)
Cashdf$tic<-"IBM"
IBM.c<-Cashdf


HoldQuart<-MSFT.f$BS$Q
CashHold<-subset(HoldQuart,rownames(HoldQuart) %in% items)
CashT<-t(CashHold)
Cashdf<-data.frame(CashT)
Cashdf$tic<-"MSFT"
MSFT.c<-Cashdf


BigCash<-rbind(AAPL.c, IBM.c, MSFT.c)
#setwd<-("C:/Users/HP USER/Documents")
#write.csv(BigCash,file="CashList.csv")


When I try to process through this using a loop, however, things go south
pretty quickly.

tickerlist<-ls(pattern="^[A-Z]+\\.f")

for( i in 1:1)
{
test<-get(paste0(tickerlist[i],"$BS$Q"))
}

Error in get(paste0(tickerlist[i], "$BS$Q")) :
  object 'AAPL.f$BS$Q' not found

So I tried to break it up into smaller steps, but the resulting matrix
seems to have lost its structure (see below).

If someone could help me out, I sure would appreciate it.

Thanks.
--John Sparks


tickerlist<-ls(pattern="^[A-Z]+\\.f")
for( i in 1:1)
{
HoldFin<-get(tickerlist[i])
BSQ<-as.matrix(paste0(HoldFin,"$BS$Q"))
}
BSQ

[1,] "list(Q = c(52896, NA, 52896, 32305, 20591, 3718, 2776, NA, NA, NA,
NA, 38799, 14097, NA, NA, -165, 14684, 11029, NA, NA, 11029, NA, NA, NA,
11029, NA, 11029, 11029, NA, NA, NA, NA, 5261.69, 2.1, NA, 0.57, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.1, 78351, NA, 78351, 48175,
30176, 3946, 2871, NA, NA, NA, NA, 54992, 23359, NA, NA, 122, 24180,
17891, NA, NA, 17891, NA, NA, NA, 17891, NA, 17891, 17891, NA, NA, NA, NA,
5327.99, 3.36, NA, 0.57, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
3.36, \n46852, NA, 46852, 29039, 17813, 3482, 2570, NA, NA, NA, NA, 35091,
11761, NA, NA, -159, 12188, 9014, NA, NA, 9014, NA, NA, NA, 9014, NA,
9014, 9014, NA, NA, NA, NA, 5393.33, 1.67, NA, 0.57, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1.67, 42358, NA, 42358, 26252, 16106, 3441,
2560, NA, NA, NA, NA, 32253, 10105, NA, NA, -263, 10469, 7796, NA, NA,
7796, NA, NA, NA, 7796, NA, 7796, 7796, NA, NA, NA, NA, 5472.78, 1.42, NA,
0.57, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.42, 50557, NA,
50557, \n30636, 19921, 3423, 2511, NA, NA, NA, NA, 36570, 13987, NA, NA,
-510, 14142, 10516, NA, NA, 10516, NA, NA, NA, 10516, NA, 10516, 10516,
NA, NA, NA, NA, 5540.89, 1.9, NA, 0.52, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1.9), A = c(215639, NA, 215639, 131376, 84263, 14194,
10045, NA, NA, NA, NA, 155615, 60024, NA, NA, -1195, 61372, 45687, NA, NA,
45687, NA, NA, NA, 45687, NA, 45687, 45687, NA, NA, NA, NA, 5500.28, 8.31,
NA, 2.18, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 8.31, 233715,
NA, \n233715, 140089, 93626, 14329, 8067, NA, NA, NA, NA, 162485, 71230,
NA, NA, -903, 72515, 53394, NA, NA, 53394, NA, NA, NA, 53394, NA, 53394,
53394, NA, NA, NA, NA, 5793.07, 9.22, NA, 1.98, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 9.22, 182795, NA, 182795, 112258, 70537, 11993,
6041, NA, NA, NA, NA, 130292, 52503, NA, NA, -311, 53483, 39510, NA, NA,
39510, NA, NA, NA, 39510, NA, 39510, 39510, NA, NA, NA, 0, 6122.66, 6.45,
NA, 1.81, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 6.45, 170910,
\nNA, 170910, 106606, 64304, 10830, 4475, NA, NA, NA, NA, 121911, 48999,
NA, NA, -24, 50155, 37037, NA, NA, 37037, NA, NA, NA, 37037, NA, 37037,
37037, NA, NA, NA, 0, 6521.5, 5.68, NA, 1.63, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 5.68))$BS$Q"
[2,] "list(Q = c(NA, 59501, 67101, 11579, NA, 20612, 2910, NA, 11367,
101990, 65124, -37961, 5473, 2617, 189740, 7549, 334532, 28573, 21665,
9992, 3999, 9113, 73342, 84531, NA, 84531, 98522, 28226, NA, 14351,
200450, NA, NA, 33579, NA, 100925, NA, -902, 134082, 334532, NA, 5205.81,
NA, 51093, 60452, 14057, NA, 27977, 2712, NA, 12191, 103332, 62759,
-36249, 5423, 2848, 185638, 7390, 331141, 38510, 21895, 10493, 3499, 9733,
84130, 73557, NA, 73557, 87549, 26948, NA, 14116, 198751, NA, NA, 32144,
NA, 11, \nNA, -1567, 132390, 331141, NA, 5255.42, NA, 58554, 67155,
15754, NA, 29299, 2132, NA, 8283, 106869, 61245, -34235, 5414, 3206,
170430, 8757, 321686, 37294, 20951, 8105, 3500, 9156, 79006, 75427, NA,
75427, 87032, 26019, NA, 12985, 
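
get() looks up an object by its name, and "AAPL.f$BS$Q" is an expression rather than a name, which is why the first loop fails. Pasting "$BS$Q" onto the fetched object likewise only produces text, as the dump above shows. A minimal sketch of the usual pattern: fetch the object with get() and then index it with $. So that it runs without a download, the objects here are hypothetical stand-ins built by a made-up helper, mock_fin(); their numbers are placeholders, and only the BS$Q shape (a matrix with line items as row names) is taken from the post.

```r
items <- c("Cash & Equivalents", "Short Term Investments",
           "Cash and Short Term Investments")

# Hypothetical stand-in for a getFinancials() result (assumption):
# a list whose BS$Q element is a matrix with line items as row names.
mock_fin <- function(offset) {
  m <- matrix(offset + 1:8, nrow = 4,
              dimnames = list(c(items, "Total Assets"),
                              c("2017-06-30", "2017-03-31")))
  list(BS = list(Q = m, A = m))
}
AAPL.f <- mock_fin(0)
IBM.f  <- mock_fin(100)
MSFT.f <- mock_fin(200)

tickerlist <- c("AAPL.f", "IBM.f", "MSFT.f")

pieces <- lapply(tickerlist, function(nm) {
  bsq  <- get(nm)$BS$Q                      # fetch by name, then index
  cash <- bsq[rownames(bsq) %in% items, , drop = FALSE]
  df   <- data.frame(t(cash))               # quarters become rows
  df$tic <- sub("\\.f$", "", nm)            # "AAPL.f" -> "AAPL"
  df
})
BigCash <- do.call(rbind, pieces)
BigCash
```

With real data, tickerlist could come from ls(pattern="^[A-Z]+\\.f") as in the original post; the loop body stays the same.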

Re: [R] Setting .Rprofile for RStudio on a Windows 7 x64bit

2017-04-18 Thread Sparks, John James
Bruce,

Do you think that you could post the final solution to the problem?  That
way it would be stored with this thread and the next person who has the
same problem would be able to locate the FINAL solution.

--JJS


On Mon, April 17, 2017 12:47 pm, BR_email wrote:
> TO _ALL_:
> THANK YOU. THANK YOU. THANK YOU.
> After hours, and hours, and hours, and ... , and hours: Success.
> To all who helped, thanks.
> My quest was minor, but major for me, as I learn from the path of one,
> whether big or small begets another.
>
> I never look down at anyone, except to help him/her up.
>
> With gratitude,
> Bruce
>
> Bruce Ratner, Ph.D.
> The Significant Statistician™
> (516) 791-3544
> Statistical Predictive Analtyics -- www.DMSTAT1.com
> Machine-Learning Data Mining and Modeling -- www.GenIQ.net
>
>
> Peter Dalgaard wrote:
>>> On 17 Apr 2017, at 19:01 , BR_email  wrote:
>>>
>>> Berend: Something looks good, but in RStudio the Rprofile still does
>>> not affect the launch.
>>>
>>> > source(echo=TRUE, "C:/Users/BruceRatner/Documents/.Rprofile.site")
>>> > options(prompt="R> ")
>>> > set.seed(12345)
>>> > rm(list=ls())
>>> R>
>>>
>>>
>>> Bruce Ratner, Ph.D.
>>> The Significant Statistician™
>>> (516) 791-3544
>>> Statistical Predictive Analtyics -- www.DMSTAT1.com
>>> Machine-Learning Data Mining and Modeling -- www.GenIQ.net
>>>
>>> Berend Hasselman wrote:
>>>> source(echo=TRUE, "C:/Users/BruceRatner/Documents/.Rprofile.site")
>> According to the gospel of St.Henrik, that filename is wrong, and
>> possibly the directory too.
>>
>> So try his suggestions. What is the output (show us!) of
>>
>> normalizePath("./.Rprofile")
>> normalizePath("~/.Rprofile")
>>
>> Assuming that the former is
>>
>> "C:/Users/BruceRatner/Documents/.Rprofile"
>>
>> you could try renaming the .Rprofile.site file to that. If need be, use
>> file.rename, as in
>>
>> file.rename(from="C:/Users/BruceRatner/Documents/.Rprofile.site",
>> to="C:/Users/BruceRatner/Documents/.Rprofile")
>>
>> (and restart, obviously).
>>
>> [I wouldn't set the seed in a .Rprofile file, nor would I use rm()
>> there, but that is a different kettle of fish.]
>>
>
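
At startup R looks for a user profile named .Rprofile (in the current working directory, then in the home directory), while Rprofile.site is the name used for the site-wide file under R's own etc directory, so a file called .Rprofile.site in Documents is not picked up. A minimal sketch of the rename suggested above, run against a scratch directory so that nothing in a real home directory is touched; the directory and the file contents are assumptions for illustration.

```r
# Scratch directory standing in for the Documents folder (assumption).
docs <- file.path(tempdir(), "Documents")
dir.create(docs, showWarnings = FALSE)

old <- file.path(docs, ".Rprofile.site")   # the misnamed file
new <- file.path(docs, ".Rprofile")        # the name R looks for
writeLines('options(prompt = "R> ")', old)

# file.rename() returns TRUE on success.
renamed <- file.rename(from = old, to = new)

# Confirm where the profile now lives and what it contains.
normalizePath(new, winslash = "/")
readLines(new)
```

On the real machine, from and to would be the two paths quoted in the message above, followed by a restart of R or RStudio.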



[R] cforest Single Tree Output for Categorical Variable

2017-01-13 Thread Sparks, John James
Hello R Helpers,

I am building a random forest using the cforest method in the party
package.  I then want to have a look at the characteristics of a few of
the trees.  I get the output for one of the trees by executing

pt <- party:::prettytree(cforest@ensemble[[3]],
names(cforest@data@get("input")))
pt

The first splitting variable is a categorical variable (here named cat,
which contains values 0 through 9 and is a factor), but the output does not
specify which values went into which part of the tree:

1) cat == {}; criterion = 1, statistic = 32.792

Can anyone help me to get the detail on this splitting variable to appear
in the output?

I regret that I cannot send a reproducible example because the data is
proprietary.  I will try to work up an example with a public data set that
has the same problem.

Any help would be much appreciated.

Best wishes,
--John J. Sparks, Ph.D.



[R] Pull Stock Symbol Out of String

2014-04-08 Thread Sparks, John James
Dear R Helpers,

My regex skills are beginner to intermediate and banging around the web
has not resulted in a solution to the problem below so I hope that one of
you who has mad skills can help me out.

I want to extract the stock ticker--AMT-- out of the string

American Tower Corporation (REIT) (AMT)

The presence of the other parenthetical text (REIT) makes this difficult.
Please note that the string may or may not have an interfering set of
characters such as the (REIT) so the solution needs to be generalizable to
the last set of characters that are contained in parentheses in the larger
string.  So an example of a string without the interfering (REIT) would be

Aetna Inc. (AET)


Your assistance would be very much appreciated.

--John Sparks
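
One possible approach, sketched with base R's sub(): a greedy ".*" consumes everything up to the last opening parenthesis that still lets the rest of the pattern match, so the capture group ends up holding the final parenthesised token. The two input strings are the ones from the post; the variable names are illustrative.

```r
x <- c("American Tower Corporation (REIT) (AMT)",
       "Aetna Inc. (AET)")

# ".*"         greedy, so it runs up to the LAST "(" that still matches
# "\\("        that literal opening parenthesis
# "([^()]+)"   the captured ticker: one or more non-parenthesis chars
# "\\)"        the closing parenthesis
# "[[:space:]]*$"  optional trailing whitespace, then end of string
tickers <- sub(".*\\(([^()]+)\\)[[:space:]]*$", "\\1", x)
tickers
# [1] "AMT" "AET"
```

A string with no parenthesised group at the end is returned unchanged by sub(), so such cases may need a separate check (for example with grepl() on the same pattern).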



Re: [R] Grap Element from Web Page

2013-08-16 Thread Sparks, John James
Thanks, the second approach worked fine on Windows.

--JJS

On Thu, August 15, 2013 8:38 am, Jeffrey Dick wrote:
 Sorry, I can't generate an error when running those commands in R on Linux
 64-bit. But if I move to Windows (R version 3.0.1, XML_3.98-1.1), I get a
 different error ...

 > require(XML)
 Loading required package: XML
 > doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
 > node <- getNodeSet(doc[[1]], "//link[@rel='alternate']")
 Input is not proper UTF-8, indicate encoding !
 Bytes: 0xC2 0x0A 0x20 0x20
 Error: 1: Input is not proper UTF-8, indicate encoding !
 Bytes: 0xC2 0x0A 0x20 0x20
 > node <- getNodeSet(doc, "//link[@rel='alternate']")
 Error in UseMethod("xpathApply") :
   no applicable method for 'xpathApply' applied to an object of class
 "XMLDocumentContent"

 ... note that I've tried both doc[[1]] and doc in the function call. Also,
 only the XML library is required. I'm not sure what's going on with the
 character encoding error, might be my system settings. Reading the help
 page (?htmlTreeParse) provides a clue to use the htmlParse function
 instead, equivalent to setting the useInternalNodes parameter to TRUE ...
 These can then be searched using XPath expressions via 'xpathApply' and
 'getNodeSet'. That seems to be relevant to this case.

 > doc <- htmlParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
 > node <- xpathSApply(doc, "//link[@rel='alternate']", xmlAttrs)
 > node
       [,1]
 rel   "alternate"
 type  "application/atom+xml"
 title "ATOM"
 href  "/cgi-bin/browse-edgar?action=getcompany&CIK=789019&type=&dateb=&owner=exclude&count=40&output=atom"
 > strsplit(strsplit(node[[4]], "CIK=")[[1]][2], "&type")[[1]][1]
 [1] "789019"

 Perhaps that approach is less prone to error.
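
As an alternative to the nested strsplit() calls, the query string can be split into named parameters so that the CIK is looked up by name rather than by position. A minimal base-R sketch; the href value is the one shown above with "&" separators assumed (the archive appears to have dropped them), and the variable names are illustrative.

```r
# href as reported above; the "&" separators are an assumption.
href <- paste0("/cgi-bin/browse-edgar?action=getcompany&CIK=789019",
               "&type=&dateb=&owner=exclude&count=40&output=atom")

# Drop everything up to and including the "?".
query <- sub("^[^?]*\\?", "", href)

# Split into "key=value" pieces, then into keys and values.
parts  <- strsplit(query, "&", fixed = TRUE)[[1]]
keys   <- sub("=.*$", "", parts)
vals   <- sub("^[^=]*=", "", parts)
params <- setNames(vals, keys)

params[["CIK"]]
# [1] "789019"
```

Looking the value up by name keeps working if the site reorders the parameters, which the positional split would not.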


 On Thu, Aug 15, 2013 at 12:48 PM, Sparks, John James
 jspa...@uic.edu wrote:

 Thanks so much for looking into this for me.

 Unfortunately, I get an error when I execute your code.  Is there a
 library that you loaded that I haven't?

 require(scrapeR)
 require(XML)
 require(RCurl)
 doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
 node <- getNodeSet(doc[[1]], "//link[@rel='alternate']")
 Error in UseMethod("xpathApply") :
   no applicable method for 'xpathApply' applied to an object of class
 "character"


 Guidance would be much appreciated.

 --JJS



 On Wed, August 14, 2013 4:19 am, Jeffrey Dick wrote:
  Hi,
 
  There are many occurrences of the CIK number in the page source. This
  pulls out the first node containing it:
 
  node <- getNodeSet(doc[[1]], "//link[@rel='alternate']")
 
  From there you can extract the number. Here's one way to do it.
 
  strsplit(strsplit(unlist(node)[[5]], "CIK=")[[1]][2], "&type")[[1]][1]
 
  Jeff
 
 
  On Wed, Aug 14, 2013 at 1:34 PM, Sparks, John James jspa...@uic.edu
  wrote:
 
  Dear R Helpers,
 
  I would like to pull the CIK number from the web page
 
  http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany
 
  If you put this web page into your browser you will see the CIK number
  in red on the left side of the page near the top.
 
  When I try the basic
  require(scrapeR)
  require(XML)
  require(RCurl)
  doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
  str(doc)
 
  I get a large number of items in the data frame that I don't know how
  to interpret.  Both
  tables <- readHTMLTable(doc)
 
  and
 
  list <- xmlToList(doc)
 
  result in errors.
 
  Any (positive) guidance would be much appreciated.
 
  --John J. Sparks, Ph.D.
 


Re: [R] Grap Element from Web Page

2013-08-14 Thread Sparks, John James
Thanks so much for looking into this for me.

Unfortunately, I get an error when I execute your code.  Is there a
library that you loaded that I haven't?

require(scrapeR)
require(XML)
require(RCurl)
doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
node <- getNodeSet(doc[[1]], "//link[@rel='alternate']")
Error in UseMethod("xpathApply") :
  no applicable method for 'xpathApply' applied to an object of class
"character"


Guidance would be much appreciated.

--JJS



On Wed, August 14, 2013 4:19 am, Jeffrey Dick wrote:
 Hi,

 There are many occurrences of the CIK number in the page source. This
 pulls out the first node containing it:

 node <- getNodeSet(doc[[1]], "//link[@rel='alternate']")

 From there you can extract the number. Here's one way to do it.

 strsplit(strsplit(unlist(node)[[5]], "CIK=")[[1]][2], "&type")[[1]][1]

 Jeff


 On Wed, Aug 14, 2013 at 1:34 PM, Sparks, John James jspa...@uic.edu
 wrote:

 Dear R Helpers,

 I would like to pull the CIK number from the web page

 http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany

 If you put this web page into your browser you will see the CIK number
 in red on the left side of the page near the top.

 When I try the basic
 require(scrapeR)
 require(XML)
 require(RCurl)
 doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
 str(doc)

 I get a large number of items in the data frame that I don't know how to
 interpret.  Both
 tables <- readHTMLTable(doc)

 and

 list <- xmlToList(doc)

 result in errors.

 Any (positive) guidance would be much appreciated.

 --John J. Sparks, Ph.D.



[R] Grap Element from Web Page

2013-08-13 Thread Sparks, John James
Dear R Helpers,

I would like to pull the CIK number from the web page

http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany

If you put this web page into your browser you will see the CIK number in
red on the left side of the page near the top.

When I try the basic
require(scrapeR)
require(XML)
require(RCurl)
doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
str(doc)

I get a large number of items in the data frame that I don't know how to
interpret.  Both
tables <- readHTMLTable(doc)

and

list <- xmlToList(doc)

result in errors.

Any (positive) guidance would be much appreciated.

--John J. Sparks, Ph.D.



[R] T test for Single Mean

2013-06-19 Thread Sparks, John James
Dear R Helpers,

I am stuck on some syntax and I thought that I was following one of the
examples that I found out there quite faithfully.

I just want to know how to do a t test on a single mean for whether or not
it is greater than a specific value.  So I am using the data set sleep and
I want to know if the mean of extra is greater than zero.  I was under the
impression that the syntax is

t.test(sleep$extra,mu=0,"greater")

but I get the error message

Error in t.test.default(sleep$extra, mu = 0, "greater") :
  not enough 'y' observations

I have tried this on a few other data sets that have more than 20
observations and I get the same error.  I looked in the documentation but
the examples are for the comparison of two groups, not a single group
mean.

Any help would be most appreciated.

--John J. Sparks, Ph.D.
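
In t.test() the second positional argument is y, so an unnamed "greater" is taken as a second sample, which would account for the complaint about 'y' observations. A minimal sketch of the one-sample, one-sided call with the direction passed by name, using the built-in sleep data from the post; the object name res is illustrative.

```r
# One-sample t test of H0: mean(extra) = 0 against H1: mean(extra) > 0.
# The direction must be named; positionally it would be matched to y.
res <- t.test(sleep$extra, mu = 0, alternative = "greater")

res$statistic    # t statistic
res$parameter    # degrees of freedom (n - 1)
res$p.value      # one-sided p-value
```

Printing res gives the usual summary, including the one-sided confidence interval.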



[R] Refer to Data Frame Name Inside a List

2013-06-04 Thread Sparks, John James
Dear R Helpers,

I have a fairly complicated list of data frames.  To give you an idea of
the structure, the top of the str output is shown below.

How do I refer to the data.frame name for each data.frame in the list? 
That is, how can I pull the terms Advertising2007, AirFreightDelivery2007,
Apparel2007 etc. out of the list?  I need them to keep track of
correlations that I am doing inside each data frame of the list.

Apologies for not sending a reproducible example.  I am hoping that
someone knows this off the top of their head.

--John Sparks
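
The element names of a list are returned by names(), and iterating over those names (rather than over the elements) keeps each result labelled. A minimal sketch with a two-element toy list; the column names follow the str output in this message, but the values are placeholders and the correlation chosen is only an example.

```r
# Toy stand-in for ResList (placeholder values, two industries only).
ResList <- list(
  Advertising2007 = data.frame(RFPred      = c(-0.017, -0.008, -0.012, -0.015),
                               GainRepAft3 = c(-0.067, -0.018, -0.235,  0.029)),
  Apparel2007     = data.frame(RFPred      = c( 0.011,  0.021,  0.015,  0.018),
                               GainRepAft3 = c( 0.034, -0.066,  0.054,  0.045))
)

# The data frame names are simply the names of the list.
names(ResList)

# Loop over the names so each correlation carries its industry label.
cors <- sapply(names(ResList), function(nm) {
  df <- ResList[[nm]]
  cor(df$RFPred, df$GainRepAft3)
})
cors
```

sapply() over a character vector uses the strings as names of the result, so cors can be indexed as cors["Apparel2007"].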

> str(ResList)
List of 60
 $ Advertising2007 :'data.frame':   21 obs. of  10 variables:
  ..$ RFPred       : num [1:21] -0.01749 -0.00801 -0.01155 -0.01494 -0.03715 ...
  ..$ marsPred     : num [1:21] 0.0901 0.0127 0.0616 0.0618 -0.0559 ...
  ..$ GainRepAft3  : num [1:21] -0.0673 -0.0183 -0.2353 0.0294 -0.059 ...
  ..$ Industry     : chr [1:21] "Advertising2007" "Advertising2007" "Advertising2007" "Advertising2007" ...
  ..$ dateavail    : Factor w/ 346 levels "2008-02-01","2008-02-13",..: 18 4 14 12 13 19 1 15 17 8 ...
  ..$ FinYearEnd   : Factor w/ 12 levels "2007-12-01","2007-03-01",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ GainAft1Aft30: num [1:21] -0.2376 -0.1384 -0.1176 0.0145 0.0527 ...
  ..$ GainAft1Aft60: num [1:21] -0.36212 -0.17801 -0.23529 -0.00501 -0.27414 ...
  ..$ GainAft1Aft90: num [1:21] -0.516 -0.203 -0.176 0.024 -0.241 ...
  ..$ groups       : Factor w/ 40 levels "-0.04013239",..: 4 11 8 6 1 1 10 13 2 5 ...
 $ AirFreightDelivery2007  :'data.frame':   20 obs. of  10 variables:
  ..$ RFPred       : num [1:20] 0.00322 -0.00351 0.034 0.01095 0.02237 ...
  ..$ marsPred     : num [1:20] -0.013 -0.109 0.0662 0.0353 0.0662 ...
  ..$ GainRepAft3  : num [1:20] 0.0344 -0.0659 0.054 0.045 0.0266 ...
  ..$ Industry     : chr [1:20] "AirFreightDelivery2007" "AirFreightDelivery2007" "AirFreightDelivery2007" "AirFreightDelivery2007" ...
  ..$ dateavail    : Factor w/ 346 levels "2008-02-01","2008-02-13",..: 22 10 26 33 35 32 25 23 31 10 ...
  ..$ FinYearEnd   : Factor w/ 12 levels "2007-12-01","2007-03-01",..: 2 1 1 1 1 1 1 3 1 1 ...
  ..$ GainAft1Aft30: num [1:20] -0.0656 -0.1539 -0.1002 -0.0694 -0.4101 ...
  ..$ GainAft1Aft60: num [1:20] -0.133 -0.141 -0.242 -0.691 -0.212 ...
  ..$ GainAft1Aft90: num [1:20] -0.0523 -0.0673 -0.1793 -0.6875 -0.187 ...
  ..$ groups       : Factor w/ 40 levels "-0.04013239",..: 24 16 39 32 37 21 17 30 35 37 ...
 $ Apparel2007 :'data.frame':   28 obs. of  10 variables:
  ..$ RFPred       : num [1:28] 0.011439 0.021311 0.014564 0.018168 -0.000892 ...
  ..$ marsPred     : num [1:28] -0.001463 0.0345 0.027227 -0.000129 -0.006483 ...



[R] Selecting A List of Columns

2013-05-17 Thread Sparks, John James
Dear R Helpers,

I need help with a slightly unusual situation in which I am trying to
select some columns from a data frame.  I know how to use the subset
statement with column names as in:


x=as.data.frame(matrix(c(1,2,3,
1,2,3,
1,2,2,
1,2,2,
1,1,1),ncol=3,byrow=T))

all.cols<-colnames(x)
to.keep<-all.cols[1:2]

Kept<-subset(x,select=to.keep)
Kept

However, if I want to select some columns based on a selection of the most
important variables from a random forest then I find myself stuck.  The
example below demonstrates the problem.


library(randomForest)

data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars,importance=TRUE)
Importance<-data.frame(mtcars.rf$importance)
Importance



MSEImportance<-head(Importance[order(Importance$X.IncMSE,
decreasing=TRUE),],3)
MSEVars<-row.names(MSEImportance)
MSEVars<-data.frame(MSEVars,stringsAsFactors = FALSE)
colnames(MSEVars)<-"Vars"

NodeImportance<-head(Importance[order(Importance$IncNodePurity,decreasing=TRUE),],
3)
NodeVars<-row.names(NodeImportance)
NodeVars<-data.frame(NodeVars,stringsAsFactors = FALSE)
colnames(NodeVars)<-"Vars"


ImportantVars<-rbind(MSEVars,NodeVars)
ImportantVars<-unique(ImportantVars)
nrow(ImportantVars)
ImportantVars<-as.character(ImportantVars)
ImportantVars
CarsVarsKept<-subset(mtcars,select=ImportantVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected

Any help on how to select these columns from the data frame would be most
appreciated.

--John J. Sparks, Ph.D.
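
as.character() applied to a data frame deparses each column into a single string, so ImportantVars ends up as one element that looks like c("wt", ...) rather than a vector of column names, and the subset then fails. A minimal sketch of the difference, pulling the names out of the Vars column instead. To stay independent of randomForest, the two small tables below are hypothetical stand-ins for MSEVars and NodeVars; the variable names in them are placeholders, not actual importance results.

```r
# Hypothetical stand-ins for the two "top 3" tables (placeholder names).
MSEVars  <- data.frame(Vars = c("wt", "disp", "hp"),  stringsAsFactors = FALSE)
NodeVars <- data.frame(Vars = c("wt", "disp", "cyl"), stringsAsFactors = FALSE)

ImportantVars <- unique(rbind(MSEVars, NodeVars))

# The problem: one deparsed string, not four names.
as.character(ImportantVars)

# The fix: take the column itself, which is already a character vector.
keep <- ImportantVars$Vars
CarsVarsKept <- mtcars[, keep, drop = FALSE]
head(CarsVarsKept)
```

subset(mtcars, select = keep) should behave the same way here; plain bracket indexing is used only because it makes the character-vector requirement explicit.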



[R] To List or Not To List

2013-05-16 Thread Sparks, John James
Dear R Helpers,

A few weeks ago I asked for some help on how to accomplish modifications
to data in a set of data frames.  As part of that request I mentioned that
I realized that one way to accomplish my goal was to put the data frames
together in a list but that I was looking for a way to do it with data
frames and a loop because I believe the better thing is to work df by df
for my particular situation.

A couple of posters asked me to provide more detail as to what is it about
my situation that made data frame alterations in a loop more appropriate
vs. a list.

Life and the scoring of many exams intervened in the last several days,
but with grades filed I am now able to return to this issue.

First, let me provide some particulars regarding my situation.  I am
working with 5,863 data frames, each with 7 columns and between 5,686 and
21 rows of data.  Each data frame contains the daily stock price history
for an equity traded on one of the U.S. markets.  I wanted to get an
historical price change for each of the days on the file.  If one were
working with a single data frame for IBM then the command is

if(nrow(IBM)>129){IBM$Mo129<-ROC(IBM[,"Close"],n=129)}

to get the Rate Of Change of the stock price relative to 129 trading days
ago.  This function is in the TTR library which is called by quantmod.

So it strikes me that in one sense this is a simple fixed costs vs.
variable costs question:  Is it worth it to assemble the data frames into
a list and then process them, putatively more quickly than going data
frame by data frame, which does not require the up-front assembly.

A look at the empirical results shows executing this set of functions df
by df consumes 44.15 seconds of elapsed time.

> ptm <- proc.time()


> ROCFunc<-function(DF){
+ if(nrow(DF)>129){DF$Mo129<-ROC(DF[,"Close"],n=129)}
+ if(nrow(DF)> 65){DF$Mo65 <-ROC(DF[,"Close"],n= 65)}
+ if(nrow(DF)> 21){DF$Mo21 <-ROC(DF[,"Close"],n= 21)}
+ if(nrow(DF)> 10){DF$Mo10 <-ROC(DF[,"Close"],n= 10)}
+ if(nrow(DF)>  5){DF$Mo5  <-ROC(DF[,"Close"],n=  5)}
+ return(DF)
+ }
> for(i in symbols) assign( i, ROCFunc(get(i)))


> time<-proc.time() - ptm
> time
   user  system elapsed
  43.52    0.58   44.15


Using a list approach, the assembly of the list requires 8.44 seconds and
then the processing requires 39.20 seconds, totaling 47.64.  So a slight
win for the data frame approach.

> ptm <- proc.time()

> list.object <- quote(list())
> list.object[ symbols ] <- lapply( symbols, as.name )
> biglist<-eval(list.object)


> for (i in seq_along(biglist))
+   {
+biglist[[i]]<-subset(biglist[[i]],select=-c(Open,High,Low))
+#biglist[[i]]<-biglist[[i]][as.character(biglist[[i]]$Index) > "2007-01-01", ]
+#biglist[[i]]$Index<- as.Date(biglist[[i]]$Index,format="%Y-%m-%d")
+#biglist[[i]]<-xts(biglist[[i]][,-1],biglist[[i]][,1])
+#biglist[[i]]<-biglist[[i]]['2005-01-01/']
+}

> proc.time() - ptm
   user  system elapsed
   8.03    0.40    8.44
> ptm <- proc.time()

> rm(list=ls(pattern="^[A-Z]"))

> for (i in seq_along(biglist))
+ {
+if(nrow(biglist[[i]])>180)
+   {
+   biglist[[i]][["Mo180"]]<-ROC(biglist[[i]][["Close"]],n=129)
+   }
+   if(nrow(biglist[[i]])>90)
+   {
+   biglist[[i]][["Mo90"]] <-ROC(biglist[[i]][["Close"]],n=65)
+   }
+   if(nrow(biglist[[i]])>30)
+   {
+   biglist[[i]][["Mo30"]] <-ROC(biglist[[i]][["Close"]],n=21)
+   }
+   if(nrow(biglist[[i]])>10)
+   {
+   biglist[[i]][["Mo10"]] <-ROC(biglist[[i]][["Close"]],n=10)
+   }
+   if(nrow(biglist[[i]])>5)
+   {
+   biglist[[i]][["Mo5"]] <-ROC(biglist[[i]][["Close"]],n=5)
+   }
+ }
> proc.time() - ptm
   user  system elapsed
  39.19    0.00   39.20


The larger issue for me, however, is recovering to the set of data frames
with the new calculations completed inside each one.  For this I used the
following syntax that I gleaned from the web:

data.frame(lapply(data.frame(t(sapply(biglist, `[`))), unlist))

But this results in
Error in FUN(X[[2003L]], ...) :
  promise already under evaluation: recursive default argument reference
or earlier problems?
Calls: data.frame -> lapply -> FUN
Execution halted

In previous executions I have seen the all too familiar error message
'unable to allocate a vector of size...' indicating to me that I have run
out of usable RAM at this last step.  I have 8G on my machine, so RAM
constraints are rarely a problem.  This is the main reason that I said
that I believed that a list approach was not the best for my situation: 
going that route will not result in a finished job.

I hope that this demonstration answers the questions of the posters who
posed the question and can potentially serve to provide an example to
those who, like me recently, are beginning to explore how to execute on
multiple data frames.  I hope that this outweighs the fact that I have not
asked a specific question nor provided re-producible 

[R] Adding Column to Data Frames Using a Loop

2013-05-01 Thread Sparks, John James
Dear R Helpers,

I am trying to do calculations on multiple data frames and do not want to
create a list of them to go through each one.  I know that lists have many
wonderful advantages, but I believe the better thing is to work df by df
for my particular situation.  For background, I have already received some
wonderful help on how to handle some situations, such as removing columns:


x=as.data.frame(matrix(c(1,2,3,
1,2,3,
1,2,2,
1,2,2,
   1,1,1),ncol=3,byrow=T))

y=as.data.frame(matrix(c(1,2,3,
1,2,3,
1,2,2,
1,2,2,
   1,1,1),ncol=3,byrow=T))

z=as.data.frame(matrix(c(1,2,3,
1,2,3,
1,2,2,
1,2,2,
   1,1,1),ncol=3,byrow=T))

for(i in letters[24:26] ) assign( i, subset(get(i), select=-c(V1))  )
x
y
z

And I figured how to do further processing using functions:

myfunc <- function(DF){
 DF$V4 <- DF$V2+DF$V3
 return(DF)
}
for(i in letters[24:26] ) assign( i, myfunc(get(i)))

But if I want to do a rather simple calculation and store it as a new
column in each data frame such as

x$V4 <- x$V2+x$V3
y$V4 <- y$V2+y$V3
z$V4 <- z$V2+z$V3

is there a simpler way to do this than building a function as shown above?
 I tried a few variations of

i <- 24
assign(paste(i,"$V4",sep=""),paste(get(i),"$V2+",get(i),"$V3",sep=""))

but keep getting syntax errors.

If anyone could help with the syntax as to how to accomplish the
calculation above without building a function, I would really appreciate
it.

--John Sparks
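
A minimal sketch of one way to do this without a named helper function, assuming the data frames all have columns V2 and V3 as in the example: transform() builds the new column and assign()/get() handle the lookup by name. The data below is a simplified stand-in for x, y and z.

```r
# Simplified stand-ins for the three data frames in the question.
x <- data.frame(V1 = 1, V2 = c(2, 2, 2, 2, 1), V3 = c(3, 3, 2, 2, 1))
y <- x
z <- x

# For each name: fetch the data frame, add V4 = V2 + V3, store it back.
for (nm in c("x", "y", "z")) {
  assign(nm, transform(get(nm), V4 = V2 + V3))
}
x
```

The paste() attempts fail because paste() only builds a character string; it never turns "x$V4" into a reference to that column.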

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Function for Data Frame

2013-04-29 Thread Sparks, John James
Dear R Helpers,

I have about 20 data frames that I need to do a series of data scrubbing
steps to.  I have the list of data frames in a list so that I can use
lapply.  I am trying to build a function that will do the data scrubbing
that I need.  However, I am new to functions and there is something
fundamental that I am not understanding.  I use the return function at the
end of the function and this completes the data processing specified in
the function, but leaves the data frame that I want changed unaffected. 
How do I get my function to apply its results to the data frame in
question instead of simply displaying the results to the screen?

Any helpful guidance would be most appreciated.

--John Sparks


x=as.data.frame(matrix(c(1,2,3,
1,2,3,
1,2,2,
1,2,2,
   1,1,1),ncol=3,byrow=T))


myfunc <- function(DF){
 DF <- subset(DF,select=-c(V1))
 return(DF)
}

myfunc(x)

#How to get this change to data frame x?
#And preferrably not send the results to the screen?
x
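
A short sketch of the usual pattern: a function works on a local copy, so the change only reaches x when the returned value is assigned back. The list dflist below is a hypothetical stand-in for the list of about 20 data frames.

```r
x <- as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1),
                          ncol = 3, byrow = TRUE))

myfunc <- function(DF) {
  DF <- subset(DF, select = -c(V1))   # changes only the local copy
  DF                                  # value handed back to the caller
}

x <- myfunc(x)   # assigning the result updates x and prints nothing

# With a list of data frames, lapply() returns a new list of modified copies,
# which again has to be assigned to take effect.
dflist <- list(a = x, b = x)   # hypothetical list
dflist <- lapply(dflist, function(d) transform(d, V4 = V2 + V3))
```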



[R] Remove Rows Based on Factor

2013-04-15 Thread Sparks, John James
Dear R Helpers,

I did a search for deleting rows based on conditions but wasn't able to
find an example that addressed the error that I am getting.  I am hoping
that this is a simple syntax phenomenon that somebody else knows off the
top of their head.  My apologies for not providing a reproducible example
but I think that the information given will allow someone to give me a
hint.

I want to delete the rows of the data frame ZZ where Index is earlier than
Jan 1 of 2007.  That Index column is a factor.  When I tried a couple of
different methods, I got the error shown below.  Can anybody tell me what
I am doing wrong?  I would really appreciate it.

--John Sparks

> str(ZZ)
'data.frame':   1584 obs. of  7 variables:
 $ Index   : Factor w/ 1583 levels "2006-04-07","2006-04-10",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Open    : num  17.5 17.6 16.8 17.2 17 ...
 $ High    : num  18.2 17.6 17.2 17.2 17.1 ...
 $ Low     : num  17.3 16.8 16.8 16.8 16.6 ...
 $ Close   : num  17.5 16.8 17.1 16.8 16.7 ...
 $ Volume  : num  23834500 2916000 1453700 991400 967400 ...
 $ Adjusted: num  16.8 16.2 16.4 16.2 16 ...
> test<-ZZ[ZZ$Index>"2007-01-01",]
Warning message:
In Ops.factor(ZZ$Index, "2007-01-01") : > not meaningful for factors

> test<-subset(ZZ,Index>2007-01-01)
Warning message:
In Ops.factor(Index, 2007 - 1 - 1) : > not meaningful for factors
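
A minimal sketch of one way around the warning, using a small made-up ZZ: ordering comparisons are not defined for unordered factors, so the column is converted to Date first and then compared against a Date.

```r
# Made-up stand-in for ZZ with a factor date column.
ZZ <- data.frame(
  Index = factor(c("2006-04-07", "2006-12-29", "2007-01-03", "2007-06-15")),
  Close = c(17.5, 16.8, 17.1, 16.8)
)

# factor -> character -> Date, then keep rows on or after 1 Jan 2007
ZZ$Index <- as.Date(as.character(ZZ$Index), format = "%Y-%m-%d")
test <- ZZ[ZZ$Index >= as.Date("2007-01-01"), ]
test
```

Note that in the subset() attempt the unquoted 2007-01-01 is evaluated as arithmetic (2007 minus 1 minus 1), which is why the warning shows 2007 - 1 - 1.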



[R] Create New Column Inside Data Frame for Many Data Frames

2013-04-14 Thread Sparks, John James
Dear R Helpers,

I have a large number of data frames and I need to create a new column
inside each data frame.  Because there is a large number, I need to loop
through this, but I don't know the syntax of assigning a new column name
dynamically.

Below is a simple example of what I need to do.  Assume that I have to do
this for all 26 letters and you should see the form of the problem.

Any help would be much appreciated.  If more information is needed, please
let me know.

Many thanks.
--John Sparks



library(quantmod)
A <- data.frame(population=c(100, 300, 5000, 2000, 900, 2500))
A$Rate <- ROC(A["population"])

B <- data.frame(population=c(200, 300, 4000, 3000, 2000, 500))
B$Rate <- ROC(B["population"])

letters <- c("A","B")
length(letters)

#for (i in letters){
# HELP!
#}
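
One way the loop body might look, as a sketch: get() fetches each data frame by name, the double-bracket form accepts the new column name as a string, and assign() stores the result. The helper log_roc is a hypothetical base-R stand-in for ROC() with its default continuous type, so the example runs without quantmod.

```r
A <- data.frame(population = c(100, 300, 5000, 2000, 900, 2500))
B <- data.frame(population = c(200, 300, 4000, 3000, 2000, 500))

# Hypothetical stand-in for ROC(): log(x[t] / x[t-1]), NA for the first row.
log_roc <- function(v) c(NA, diff(log(v)))

df_names <- c("A", "B")
new_col  <- "Rate"            # column name chosen at run time

for (nm in df_names) {
  d <- get(nm)                                  # name -> data frame
  d[[new_col]] <- log_roc(d[["population"]])    # dynamic column name
  assign(nm, d)                                 # write it back
}
B
```

Using a separate vector name such as df_names also avoids overwriting the built-in letters constant.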



[R] Try Giving Invalid Argument Type Error

2012-05-19 Thread Sparks, John James
Dear R Helpers,

I am getting an error message from the try function that I don't
understand so I am hoping that someone can help.

I am scraping from web pages, but sometimes they disappear.  When that
happens I need to control for it with some sort of function.

This web page is parsed without a problem.


exh<-"NASDAQ"
tic<-"EGHT"
URL<-paste("http://www.advfn.com/p.php?pid=financials&btn=istart_date&mode=quarterly_reports&symbol=",
exh,"%3A",tic,"&istart_date=0", sep = "")
doc <- htmlParse(URL)

However, when I change the value of tic it will not.

tic<-"AACOU"
URL<-paste("http://www.advfn.com/p.php?pid=financials&btn=istart_date&mode=quarterly_reports&symbol=",
exh,"%3A",tic,"&istart_date=0", sep = "")
doc <- htmlParse(URL)
Error in htmlParse(URL) :
  error in creating parser for
http://www.advfn.com/p.php?pid=financials&btn=istart_date&mode=quarterly_reports&symbol=NASDAQ%3AAACOU&istart_date=0

I tried to account for this using the try function but I get the error
below that I don't understand.


options(error = expression(NULL))
URL<-paste("http://www.advfn.com/p.php?pid=financials&btn=istart_date&mode=quarterly_reports&symbol=",
exh,"%3A",tic,"&istart_date=0", sep = "")
if(
!is(
try(
doc <- htmlParse(URL)
,"try-error")
)
)

{
qtrstop <- xpathApply(doc, "count(//select/option)")-5
}
Error in !silent : invalid argument type


Any help would be most appreciated.

--John Sparks
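
A likely reading of the error, offered as a sketch rather than a definitive diagnosis: in the call above, the closing parenthesis of try() comes after "try-error", so the string is passed as try's second argument (silent) instead of as the class to test. Closing try() first and testing with inherits() avoids that. The function parse_page is a hypothetical stand-in for htmlParse() so that the example runs without the XML package or a network connection.

```r
# Hypothetical stand-in for htmlParse(): fails for one ticker only.
parse_page <- function(tic) {
  if (tic == "AACOU") stop("error in creating parser for ", tic)
  paste("parsed page for", tic)
}

for (tic in c("EGHT", "AACOU")) {
  doc <- try(parse_page(tic), silent = TRUE)   # try() is closed here
  if (inherits(doc, "try-error")) {            # class test happens outside
    message("skipping ", tic)
    next
  }
  print(doc)
}
```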



Re: [R] quantmod getOptionChain Not Work

2012-04-08 Thread Sparks, John James
Michael,

I have not had time to look at this for a while but still wanted to say
thanks for looking into it and sending this solution.

By the way, Jeff mentioned that the version of quantmod on the SVN
(0.3.18) works for this.  I tried to figure out how to download that
version, but found the documentation on SVN's quite confusing.  Is there
anyway that you could make that version available?

Much appreciated.
--John Sparks



On Fri, March 23, 2012 5:55 pm, R. Michael Weylandt wrote:
 Sorry about that: two small mistakes and I imagine there are a few
 more I've missed.  This should actually work:

 ###


 library(XML)

 readYahooOptions <- function(Symbols, Exp, ...){
   parse.expiry <- function(x) {
     if(is.null(x))
       return(NULL)

     if(inherits(x, "Date") || inherits(x, "POSIXt"))
       return(format(x, "%Y-%m"))

     if (nchar(x) == 5L) {
       x <- sprintf(substring(x, 4, 5), match(substring(x, 1, 3),
                    month.abb), fmt = "20%s-%02i")
     }
     else if (nchar(x) == 6L) {
       x <- paste(substring(x, 1, 4), substring(x, 5, 6),
                  sep = "-")
     }
     return(x)
   }

   clean.opt.table <- function(tableIn){
     tableOut <- sapply(tableIn[,-2], function(x)
                        as.numeric(gsub(",","",x)))
     rownames(tableOut) <- tableIn[,2]
     tableOut
   }

   if(missing(Exp))
     optURL <- paste(paste("http://finance.yahoo.com/q/op?s",Symbols,sep="="),"Options",sep="+")
   else
     optURL <- paste(paste("http://finance.yahoo.com/q/op?s=",Symbols,"&m=",parse.expiry(Exp),sep=""),"Options",sep="+")

   if(!missing(Exp) && is.null(Exp)) {
     optPage <- readLines(optURL)
     optPage <- optPage[grep("View By Expiration", optPage)]
     allExp <- gregexpr("m=", optPage)[[1]][-1] + 2
     allExp <- substring(optPage, allExp, allExp + 6)
     allExp <- allExp[seq_len(length(allExp)-1)] # Last one seems useless ?
     return(structure(lapply(allExp, readYahooOptions,
            Symbols=Symbols), .Names=format(as.yearmon(allExp))))
   }

   stopifnot(require(XML))

   optURL <- readHTMLTable(optURL)

   # Not smart to hard code these but it's a 'good-enough' hack for now
   # Also, what is table 9 on this page?

   list(calls = clean.opt.table(optURL[[10]]),
        puts = clean.opt.table(optURL[[14]]),
        symbol = Symbols)
 }



 On Fri, Mar 23, 2012 at 6:44 PM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:
 I just got around to taking a look at this, but below is a fix. It
 seems like yahoo finance redesigned the page and rather than reparsing
 all their HTML, I'll use Duncan TL's XML package to make life happier.
 (I loathe HTML parsing)

 This isn't thoroughly tested and it'll break if yahoo redesigns things
 again (I hardcode the table numbers for now) but it seems to work well
 enough. Let me know if you have any errors with it. If Jeff likes it,
 it should be a drop-in replacement for the getOptionChain.yahoo for
 quantmod with a name change.

 Feedback welcome,

 Michael

 #

 library(XML)

 readYahooOptions <- function(Symbols, Exp, ...){
   parse.expiry <- function(x) {
     if(is.null(x))
       return(NULL)

     if(inherits(x, "Date") || inherits(x, "POSIXt"))
       return(format(x, "%Y-%m"))

     if (nchar(x) == 5L) {
       x <- sprintf(substring(x, 4, 5), match(substring(x, 1, 3),
                    month.abb), fmt = "20%s-%02i")
     }
     else if (nchar(x) == 6L) {
       x <- paste(substring(x, 1, 4), substring(x, 5, 6),
                  sep = "-")
     }
     return(x)
   }

   clean.opt.table <- function(tableIn){
     tableOut <- lapply(tableIn[,-2], function(x)
                        as.numeric(gsub(",","",x)))
     rownames(tableOut) <- tableIn[,2]
   }

   if(missing(Exp))
     optURL <- paste(paste("http://finance.yahoo.com/q/op?s",Symbols,sep="="),"Options",sep="+")
   else
     optURL <- paste(paste("http://finance.yahoo.com/q/op?s=",Symbols,"&m=",parse.expiry(Exp),sep=""),"Options",sep="+")

   if(!missing(Exp) && is.null(Exp)) {
     optPage <- readLines(optURL)
     optPage <- optPage[grep("View By Expiration", optPage)]
     allExp <- gregexpr("m=", optPage)[[1]][-1] + 2
     allExp <- substring(optPage, allExp, allExp + 6)
     allExp <- allExp[seq_len(length(allExp)-1)] # Last one seems useless ? Always true?
     return(structure(lapply(allExp, readYahooOptions,
            Symbols=Symbols), .Names=format(as.yearmon(allExp))))
   }

   stopifnot(require(XML))

   optURL <- readHTMLTable(optURL)

   # Not smart to hard code these but it's a 'good-enough' hack for now
   # Also, what is table 9 on this page?
   CALLS <- optURL[[10]]
   PUTS <- optURL[[14]]

   list(calls = CALLS, puts = PUTS, symbol = Symbols)
 }


 ###

 On Sun, Mar 4, 2012 at 2:18 PM, Sparks, John James jspa...@uic.edu
 wrote:
 Dear R Helpers,

 I am still having trouble with the getOptionChain command in quantmod.
  I
 have the latest version of quantmod, etc. so I was under the impression
 that the problem

[R] quantmod getOptionChain Not Work

2012-03-04 Thread Sparks, John James
Dear R Helpers,

I am still having trouble with the getOptionChain command in quantmod.  I
have the latest version of quantmod, etc. so I was under the impression
that the problem was solved with updates to the package.

If someone could let me know what I need to install in order to make this
work, I would really appreciate it.

My error message and session info are shown below.  Thanks a bunch.
--John Sparks

R version 2.14.2 (2012-02-29)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] pomp_0.40-2  deSolve_1.10-3   subplex_1.1-3mvtnorm_0.9-9992
quantmod_0.3-17  TTR_0.21-0   xts_0.8-2zoo_1.7-7   
Defaults_1.1-1

loaded via a namespace (and not attached):
[1] grid_2.14.2lattice_0.20-0 tools_2.14.2
> AAPL.OPT<-getOptionChain("AAPL")
Error in puts[, 2] : incorrect number of dimensions
> AAPL.OPT<-getOptionChain("AAPL",NULL)
Error in puts[, 2] : incorrect number of dimensions




[R] Trouble with Paste and Quotes and List Objects

2011-06-18 Thread Sparks, John James
Dear R Helpers,

I have a list that contains a number of objects, each of them financial
statement data from quantmod (although I don't think that knowledge of
quantmod is necessary to help with this problem).

> str(listfinobj)
 chr [1:4815] "A.f" "AA.f" "AACC.f" "AAME.f" "AAN.f" "AAON.f" "AAP.f" "AAPL.f" "AAT.f" "AATI.f" "AAU.f" ...

I can easily pick out the 3rd object in this list.
> listfinobj[[3]]
[1] "AACC.f"

Each of the .f objects has a mildly complicated structure (partial results
shown below).
> str(AACC.f)
List of 3
 $ IS:List of 2
  ..$ Q: num [1:49, 1:5] 50.4 NA 50.4 NA 50.4 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:49] "Revenue" "Other Revenue, Total" "Total Revenue" "Cost of Revenue, Total" ...
  .. .. ..$ : chr [1:5] "2011-03-31" "2010-12-31" "2010-09-30" "2010-06-30" ...
  .. ..- attr(*, "col_desc")= chr [1:5] "3 months ending 2011-03-31" "3 months ending 2010-12-31" "3 months ending 2010-09-30" "3 months ending 2010-06-30" ...
  ..$ A: num [1:49, 1:4] 198 NA 198 NA 198 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:49] "Revenue" "Other Revenue, Total" "Total Revenue" "Cost of Revenue, Total" ...
  .. .. ..$ : chr [1:4] "2010-12-31" "2009-12-31" "2008-12-31" "2007-12-31"
  .. ..- attr(*, "col_desc")= chr [1:4] "12 months ending 2010-12-31" "12 months ending 2009-12-31" "12 months ending 2008-12-31" "12 months ending 2007-12-31"
 $ BS:List of 2
  ..$ Q: num [1:42, 1:5] NA NA 6.53 326.25 NA ...


I can get the column names for one of the sub-objects of this object.
> colnames(AACC.f$IS$A)
[1] "2010-12-31" "2009-12-31" "2008-12-31" "2007-12-31"

Thanks for your patience so far; here's the question.

I want to get the column names from all the sub objects in each of the .f
objects, so I want to build a loop, but I need to be able to refer to the
column names of the sub object dynamically.  My many attempts with paste
and get have not worked, I believe because of the quotes and the $'s.  For
example

> temp<-colnames(paste(listfinobj[[3]],$BS$A)[1],sep=,)
Error: unexpected '$' in "temp<-colnames(paste(listfinobj[[3]],$"

> as.name(paste(as.name(listfinobj[[3]]),as.name("$BS$A"),sep=""))
`AACC.f$BS$A`
> colnames(as.name(paste(as.name(listfinobj[[3]]),as.name("$BS$A"),sep="")))
NULL
> as.factor(paste(as.name(listfinobj[[3]]),as.name("$BS$A"),sep=""))
[1] AACC.f$BS$A
Levels: AACC.f$BS$A
> colnames(as.factor(paste(as.name(listfinobj[[3]]),as.name("$BS$A"),sep="")))
NULL

Please help me to understand how to refer to the column names in the
sub-objects of the objects in the list dynamically so that I can build a
loop to get at each of them.

Your help would be much appreciated.
--John J. Sparks, Ph.D.
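
A sketch of one approach, assuming every .f object is a nested list shaped like the str() output above: get() turns the stored name into the object, and double-bracket indexing with strings replaces the $ operator, so nothing needs to be pasted together. The AACC.f object below is a made-up miniature, not real financial data.

```r
# Made-up miniature with the same nesting as the financial statement objects.
AACC.f <- list(
  IS = list(A = matrix(1:4, 2, dimnames = list(c("Revenue", "Total Revenue"),
                                               c("2010-12-31", "2009-12-31")))),
  BS = list(A = matrix(1:4, 2, dimnames = list(c("Cash", "Total Assets"),
                                               c("2010-12-31", "2009-12-31"))))
)
listfinobj <- c("AACC.f")

for (nm in listfinobj) {
  obj <- get(nm)                              # name (string) -> object
  for (stmt in names(obj)) {                  # "IS", "BS", ...
    for (period in names(obj[[stmt]])) {      # "Q", "A", ...
      cat(nm, stmt, period, ":", colnames(obj[[stmt]][[period]]), "\n")
    }
  }
}
```

The pasted string "AACC.f$BS$A" is only text; as.name() makes it a single symbol with that literal name, which is why colnames() returned NULL.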



[R] Trouble Combining With Paste

2011-05-25 Thread Sparks, John James
Dear R Helpers,

I am having trouble combining some pieces of programming that work fine
individually, but fall down when I try to get them to work together.

The end goal is to take a data frame, and if any of the variables has more
than 10 values, then use cut2 to reduce the number of (effective) values
to 10.  I want to do this in automated fashion, which is where the
combining comes in.

For example all of these pieces work as I would expect:


tables<-lapply(infert,table)
lengths<-lapply(tables,length)
toolong<-which(lengths>10)

require(Hmisc)

foo<-as.numeric(cut2(infert$age,g=10,levels.mean=TRUE))
str(foo)
#num [1:248] 2 10 9 7 7 8 1 6 1 3 ...

bar<-paste("inftert$",attr(toolong[1],"names"),sep="")
bar
#[1] "inftert$age"

But the following gives an error:

foobar<-as.numeric(cut2(paste("inftert$",attr(toolong[1],"names"),sep=""),g=10,levels.mean=TRUE))
Error in min(diff(x.unique))/2 : non-numeric argument to binary operator
In addition: Warning message:
In min(diff(x.unique)) : no non-missing arguments, returning NA


Your guidance would be much appreciated.

--John J. Sparks, Ph.D.
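
The error arises because paste() hands cut2 a character string rather than the column. A sketch of the automated version follows, selecting each column with infert[[name]]. To keep it runnable without Hmisc, base cut() with decile breaks is used as a stand-in for cut2(g = 10); that substitution is an assumption, and the two may place boundaries differently.

```r
data(infert, package = "datasets")

# Columns with more than 10 distinct values.
n_values <- sapply(infert, function(col) length(unique(col)))
too_long <- names(n_values)[n_values > 10]

binned <- infert
for (nm in too_long) {
  x <- infert[[nm]]                      # the column itself, not its name
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = 11)))
  # Stand-in for cut2(x, g = 10): at most 10 quantile-based groups.
  binned[[nm]] <- as.numeric(cut(x, breaks = breaks, include.lowest = TRUE))
}
sapply(binned[too_long], function(v) length(unique(v)))
```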



[R] Thiel's Uncertainty Coefficient

2011-05-25 Thread Sparks, John James
Dear R Helpers,

I was looking at the email help threads in trying to find a calculation in
R of Thiel's uncertainty coefficient.  One of the writers offered to send
the function in custom code to the inquirer.  Can I get a copy of that
code, or does anyone know if the calculation is now available in an R
package?

Please advise.  Many thanks.
--John J. Sparks, Ph.D.
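
For reference, a minimal base-R sketch of the uncertainty coefficient (usually spelled Theil's U) computed from entropies. This is not the custom code mentioned in the earlier thread; the function name uncertainty_coef is hypothetical, and the formula used is U(x|y) = (H(x) + H(y) - H(x,y)) / H(x).

```r
# Share of the entropy of x that is explained by knowing y.
uncertainty_coef <- function(x, y) {
  entropy <- function(p) {
    p <- p[p > 0]            # 0 * log(0) is treated as 0
    -sum(p * log(p))
  }
  tab   <- table(x, y)
  joint <- tab / sum(tab)    # joint probabilities
  hx    <- entropy(rowSums(joint))
  hy    <- entropy(colSums(joint))
  hxy   <- entropy(joint)
  (hx + hy - hxy) / hx
}

data(infert, package = "datasets")
uncertainty_coef(infert$education, infert$induced)
```

The measure is asymmetric: swapping the arguments divides by H(y) instead, so U(x|y) and U(y|x) generally differ.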



[R] Apply or Tapply to Build Set of Tables

2011-05-23 Thread Sparks, John James
Dear R Helpers,

First, I apologize for asking for help on the first of my topics.  I have
been looking at the posts and pages for apply, tapply etc, and I know that
the solution to this must be ridiculously easy, but I just can't seem to
get my brain around it.  If I want to produce a set of tables for all the
variables in my data, how can I do that without having to type them into
the table command one by one.  So, I would like to use (t? s? r?)apply to
use one command instead of the following set of table commands:

data(infert, package = "datasets")
attach(infert)

table.education<-table(education)
table.age<-table(age)
table.parity<-table(parity)
etc.


To make matters worse, what I subsequently need is the chi-square for each
and all of the pairs of variables.  Such as:

chi.education.age<-chisq.test(table(education,age))
chi.education.parity<-chisq.test(table(education,parity))
chi.age.parity<-chisq.test(table(age,parity))
etc.

Your guidance would be much appreciated.

--John J. Sparks, Ph.D.
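
A sketch of how both steps might be done: lapply() over the data frame gives one table per column, and combn() enumerates the variable pairs for the chi-square tests. The three variables chosen in vars are only an illustration.

```r
data(infert, package = "datasets")

# One table per column; the result is a named list.
tables <- lapply(infert, table)
tables$education

# Chi-square test for every pair of the chosen variables.
vars  <- c("education", "age", "parity")
pairs <- combn(vars, 2, simplify = FALSE)
chi   <- lapply(pairs, function(p) {
  # suppressWarnings hides the small-expected-count warning on sparse tables
  suppressWarnings(chisq.test(table(infert[[p[1]]], infert[[p[2]]])))
})
names(chi) <- sapply(pairs, paste, collapse = ".")
sapply(chi, function(res) res$p.value)
```

Indexing with infert[[...]] also makes attach() unnecessary.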



Re: [R] Find String Between Characters

2011-05-15 Thread Sparks, John James
Hi Jim,

Thanks for your note.

Unfortunately, when I attempt your solution in my exact setting, I get a
weird and slightly different answer.

First, let me be more clear.  What I am attempting to do is pull the CIK
number out of the information from the web page itself after it has loaded
to R (this may not be optimal, but I am new at this), not from the web
page reference (as you have done).

So, when I execute the following as per your suggestion:

require(scrapeR)
mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")

num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)

I get
[1] "<pointer: 0x001265c0>"

Is this just a hex representation of the same number, or is something else
going on here?

Comments from any and all would be much appreciated.

--John J. Sparks, Ph.D.
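
The pointer output suggests that sub() was applied to a parsed document object rather than to text, so what comes back is the printed form of that object and not the CIK. A sketch of the extraction on plain text follows; page_text is a hypothetical fragment standing in for the page source, which in practice could be read as character lines.

```r
# Hypothetical fragment of page source, held as character lines.
page_text <- c(
  "<html><body>",
  "<a href=\"/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40\">next</a>",
  "</body></html>"
)

# regexpr() finds the first "CIK=digits" on each line; regmatches() keeps
# only the lines that matched.
hits <- regmatches(page_text, regexpr("CIK=[0-9]+", page_text))
cik  <- unique(sub("CIK=", "", hits, fixed = TRUE))
cik
```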

On Sat, May 14, 2011 7:57 pm, jim holtman wrote:
 Is this what you want:

 mmm<-"http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40"
 num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)
 num
 [1] "320193"



 On Sat, May 14, 2011 at 8:20 PM, Sparks, John James jspa...@uic.edu
 wrote:
 Dear R Helpers,

 I am trying to isolate a set of characters between two other characters
 in
 a long string file.  I tried some of the examples on the R help pages
 and
 elsewhere, but I am not able to get it.  Your help would be much
 appreciated.

 require(scrapeR)
 mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
 str(mmm)

 I want to get the number 320193 that is between the CIK= and the &.  I
 have tried

 g <- grep( CIK=|, mmm )
 and
 temp<-grep(mmm,\CIK=\)

 and variations on these themes, but all won't run or come back as an
 empty
 object.  How can I grab this number?

 Best wishes,
 --John J. Sparks, Ph.D.





 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?





[R] Changing Attribute With Paste

2011-05-14 Thread Sparks, John James
Dear R Helpers,

I am trying to adjust the attribute of an R object pulled from quantmod. 
Since I want to do this for many such objects, I was trying to make the
adjustment programmatic.  Unfortunately, I am having a huge amount of
trouble using attr in combination with paste (and perhaps get, and perhaps
assign, none of which seem to help).  When I hard-code the change it works
fine.  Your help would be much appreciated.


require(quantmod)
getFin("NYSE:A")

attr(NYSE.A.f,"symbol")<-"A"  #works fine

ticker<-"A"
attr(paste("NYSE.",ticker,".f",sep=""),"symbol")<-"A" #doesn't work
attr(get(paste("NYSE.",ticker,".f",sep="")),"symbol")<-"A" #nor does this, nor the hundred other combinations I have tried

Best wishes,
--John J. Sparks, Ph.D.
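
A sketch of one pattern that tends to work here: replacement functions such as attr<- need a real variable on the left-hand side, so the object is fetched into a temporary variable, modified, and written back with assign(). The NYSE.A.f object below is a made-up stand-in for what getFin() creates.

```r
# Made-up stand-in for the object created by getFin("NYSE:A").
NYSE.A.f <- structure(list(IS = NULL), symbol = "NYSE:A")

ticker  <- "A"
objname <- paste("NYSE.", ticker, ".f", sep = "")

tmp <- get(objname)             # fetch the object by name
attr(tmp, "symbol") <- ticker   # modify the temporary copy
assign(objname, tmp)            # store the modified copy under the same name

attr(NYSE.A.f, "symbol")
```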



[R] Identify Objects that end with .f (and all caps)

2011-05-14 Thread Sparks, John James
Dear R Helpers,

I am trying to find a way to identify all the objects in my environment
that are all caps and then end with .f.  I can do the all caps part pretty
easily, but I have tried a number of variations on the \ and can't get a
recognition of that operator.  As a simple example

A.f<-"foo1"
AA.f<-"foo2"
aa.f<-"foo3"
A.a<-"foo4"
ls()
[1] "A.a"  "A.f"  "aa.f" "AA.f"
temp1<-ls(pattern="[A-Z]")
temp1
[1] "A.a"  "A.f"  "AA.f"
> temp2<-ls(pattern=\f)
Error: unexpected input in "temp2<-ls(pattern=\"

The end goal is to isolate A.f and AA.f and not the others.
In terms of just getting the 'ending with .f' portion, I have tried a
number of variations in the pattern=\f, but can't get R to recognize what
I want.

Your guidance would be much appreciated.
--John J. Sparks, Ph.D.
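
A sketch of a pattern that should isolate the two names: the regular expression is given as a quoted string, the backslash is doubled inside it, and anchors tie the match to the whole name. A separate environment is used so the example leaves the workspace alone.

```r
env <- new.env()   # scratch environment for the example objects
for (nm in c("A.f", "AA.f", "aa.f", "A.a")) assign(nm, "foo", envir = env)

# ^[A-Z]+  one or more capital letters from the start of the name
# \\.      a literal period (the backslash is doubled inside an R string)
# f$       the name ends in f
matches <- ls(env, pattern = "^[A-Z]+\\.f$")
matches
```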



[R] Find String Between Characters

2011-05-14 Thread Sparks, John James
Dear R Helpers,

I am trying to isolate a set of characters between two other characters in
a long string file.  I tried some of the examples on the R help pages and
elsewhere, but I am not able to get it.  Your help would be much
appreciated.

require(scrapeR)
mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
str(mmm)

I want to get the number 320193 that is between the CIK= and the &.  I
have tried

g <- grep( CIK=|, mmm )
and
temp<-grep(mmm,\CIK=\)

and variations on these themes, but all won't run or come back as an empty
object.  How can I grab this number?

Best wishes,
--John J. Sparks, Ph.D.



[R] If Then Trouble

2011-04-24 Thread Sparks, John James
Dear R Helpers,

I have another one of those problems involving a very simple step, but due
to my inexperience I can't find a way to solve it.  I had a look at a
number of on-line references, but they don't speak to this problem.

I have a variable with 20 values

> table(testY2$redgroups)

    1     2     3     4     5     6     7     8     9    10    11    12
   69   734  6079 18578 13693  6412  3548  1646   659   323   129    88
   13    14    15    16    17    18    19    20
   90    40    57    33    36    17     6    13
Values 18,19 and 20 have small counts.  So, I want to set the value of
redgroups for these rows to 17 in order to combine groups.  I would think
that it would be as easy as

if(testY2$redgroups>17) testY2$redgroups<-17

following the syntax that I have seen in the manuals.  However, I get the
error message

Warning message:
In if (testY2$redgroups > 17) testY2$redgroups <- 17 :
  the condition has length > 1 and only the first element will be used

Can someone please tell me the correct syntax for this?  I would really
appreciate it.

Appreciatively yours,
--John J. Sparks, Ph.D.
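
A minimal sketch of the vectorised form: if() tests a single condition, whereas a logical index picks out exactly the elements to change. The redgroups vector below is made up for illustration.

```r
redgroups <- c(1, 5, 17, 18, 19, 20, 12)   # made-up values

capped <- redgroups
capped[capped > 17] <- 17     # only the elements above 17 are replaced

# Two equivalent alternatives
capped2 <- ifelse(redgroups > 17, 17, redgroups)
capped3 <- pmin(redgroups, 17)
```

Applied to the data frame in the question, the first form would read testY2$redgroups[testY2$redgroups > 17] <- 17.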



[R] Assign Character Value to Data Frame

2011-04-12 Thread Sparks, John James
Dear R Helpers,

I am trying to write a character value to the row of a data frame and am
running into a problem that I don't have when I do this for numeric
arguments.  For example, the following works just fine:

> test<-data.frame(number=numeric(1))
> test[1,]<-.5
> test
  number
1    0.5

But the following bombs out:

> hold<-data.frame(symbol=character(1))
> hold[1,]<-"NYSE:MMM"
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "NYSE:MMM") :
  invalid factor level, NAs generated

Could someone please guide me as to what adjustment I need to make to
assign this character value to this row of the data frame?  Your help
would be very much appreciated.

--John Sparks
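
The warning indicates that the column was created as a factor, which only accepts its existing levels. A sketch of the usual adjustment is to ask data.frame() to keep character columns as character:

```r
# stringsAsFactors = FALSE keeps the column as character instead of factor.
hold <- data.frame(symbol = character(1), stringsAsFactors = FALSE)
hold[1, ] <- "NYSE:MMM"
hold
str(hold)
```

Recent versions of R default to stringsAsFactors = FALSE, so stating it explicitly mainly matters on older installations.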



[R] Assign with Paste Problem

2011-04-12 Thread Sparks, John James
Dear R Helpers,

I am trying to change the name of an object using the assign function. 
When I use paste on the new object but not the old, everything is fine:
The new object is a direct copy of the old object.  When I use a paste for
both the new and the old object, however, the new object is simply the
character representation of the old object name, not the old object
itself.

The example below uses quantmod, but you don't need that to see the nature
of the problem.

How can I get the new object to be a complete copy of the old object when
I use paste in both sides of the assign?

Your help would be most appreciated.
--John Sparks

> #Careful!  Removes everything in working directory!
> rm(list = ls())

> ticker<-"F"

> require(quantmod)

> getFin(paste("NYSE:",ticker,sep=""))
[1] "NYSE.F.f"
> NYSE.F.f
Financial Statement for NYSE:F
Retrieved from google at 2011-04-12 20:03:20
Use viewFinancials or viewFin to view

> F.f<-NYSE.F.f
> F.f
Financial Statement for NYSE:F
Retrieved from google at 2011-04-12 20:03:20
Use viewFinancials or viewFin to view

> rm(F.f)

> assign(paste(ticker,".f",sep=""),NYSE.F.f)
> F.f
Financial Statement for NYSE:F
Retrieved from google at 2011-04-12 20:03:20
Use viewFinancials or viewFin to view

> rm(F.f)

> assign(paste(ticker,".f",sep=""),paste("NYSE.",ticker,".f",sep=""))
> F.f
[1] "NYSE.F.f"
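
A sketch of the missing step: the second paste() only produces the name as a string, and get() is what turns that name into the object it refers to. NYSE.F.f below is a made-up stand-in for the getFin() result.

```r
NYSE.F.f <- list(symbol = "NYSE:F")   # made-up stand-in for the getFin() result
ticker   <- "F"

old_name <- paste("NYSE.", ticker, ".f", sep = "")
new_name <- paste(ticker, ".f", sep = "")

assign(new_name, get(old_name))   # get() looks the object up by its name
identical(F.f, NYSE.F.f)
```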




[R] Replacing Period in String

2011-03-20 Thread Sparks, John James
Dear R Users,

I am working with gsub for the first time.  I am trying to remove some
characters from a string.  I have hit the problem where the period is the
shorthand for 'everything' in the R language when what I want to remove is
the actual periods.  In the example below, I simply want to remove the
periods as I have removed the comma, but instead the complete string is
wiped out.  I would appreciate it if someone could let me know how I
communicate that I want to remove the period verbatim to R.

Many thanks.
--John Sparks

> txt="This is a test. However, it is only a test."
> txt2<-gsub(",","",txt)
> txt2
[1] "This is a test. However it is only a test."
> txt3<-gsub(".","",txt)
> txt3
[1] ""
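
In a regular expression the period matches any single character, which is why everything is removed. A short sketch of three ways to match the period literally:

```r
txt <- "This is a test. However, it is only a test."

no_periods_fixed   <- gsub(".", "", txt, fixed = TRUE)  # pattern taken literally
no_periods_escaped <- gsub("\\.", "", txt)              # period escaped in the regex
no_punctuation     <- gsub("[.,]", "", txt)             # periods and commas in one pass

no_periods_fixed
no_punctuation
```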




[R] quantmod Some Single Letter Tickers Not getFin

2011-03-17 Thread Sparks, John James
Hi,

I have been learning the quantmod package over the last several days.  I
went to check some of my data pulls against other sources and was
surprised to find that a few single-character tickers do not
successfully scrape from Google Finance using getFin().  In particular,

require(quantmod)
getFin("A")
getFin("E")
getFin("F")
getFin("G")
getFin("M")

all result in a file not found error.  I show the last one below.

> getFin("M")
Error in download.file(paste(google.fin, Symbol, sep = ""), quiet = TRUE,  :
  cannot open URL 'http://finance.google.com/finance?fstype=ii&q=M'
In addition: Warning message:
In download.file(paste(google.fin, Symbol, sep = ""), quiet = TRUE,  :
  cannot open: HTTP status was '400 Bad Request'


I checked out the financial statement pages for all of these and they
exist and are as expected:  5 quarters' worth of quarterly figures (except
for cash flow, which has 4 quarters) and 4 years of annual figures.  All
the rows are also present, judging by a comparison of a scrape into Excel
with the figures for Y, for which getFin("Y") works without a problem.

I was hoping that someone who knows a lot more about scraping than I do
could look into this.

Best wishes to all,
--John Sparks
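
The error message shows that the request URL is simply a base string with the symbol pasted on the end. A minimal sketch of how an exchange-qualified symbol would change that URL; build_fin_url is a hypothetical helper written for this illustration (not a quantmod function), the `&` separator in the base URL is assumed, and nothing is downloaded:

```r
# Base URL following the pattern in the error message (the `&` is assumed).
google.fin <- "http://finance.google.com/finance?fstype=ii&q="

# Hypothetical helper: pastes an optional exchange prefix and the symbol
# onto the base URL, mirroring the paste() call in the error message.
build_fin_url <- function(symbol, exchange = NULL) {
  if (!is.null(exchange)) {
    symbol <- paste(exchange, symbol, sep = ":")
  }
  paste(google.fin, symbol, sep = "")
}

bare_url      <- build_fin_url("F")
qualified_url <- build_fin_url("F", exchange = "NYSE")

bare_url
qualified_url
```

Whether the qualified form avoids the 400 error is not confirmed here; the "Financial Statement for NYSE:F" output elsewhere in this archive suggests that an exchange-qualified symbol was accepted at the time.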



[R] Set a Numeric Field To Blank

2011-02-08 Thread Sparks, John James
Hi,

I have one of those questions that I suspect is very simple, but hard to
classify, so I have been searching for quite some time and am not able to
find an answer.

If I have a data frame and I want to change all the values of one of the
columns to blanks, what is the syntax?  I tried a few different spellings
of Null, etc., but can't get it.  Can someone please send me what I
suspect is a one-line solution to this?

Many thanks,
--John J. Sparks, Ph.D.
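
A minimal sketch of two readings of "blank", on a made-up data frame (the column names are assumptions for illustration): assigning NA keeps the column numeric with every value missing, while assigning "" turns it into a character column of empty strings.

```r
df <- data.frame(id = 1:3, score = c(10.5, 20.1, 30.7))

# Option 1: missing values; NA_real_ keeps the column numeric.
df_na <- df
df_na$score <- NA_real_

# Option 2: empty strings; the column becomes character.
df_blank <- df
df_blank$score <- ""

# Note: assigning NULL would remove the column altogether.
str(df_na)
str(df_blank)
```

NA is usually the safer choice when the column will be used in later numeric work, since empty strings change the column type.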



[R] Picking Part of Large R Object

2010-08-01 Thread Sparks, John James
Dear All,

I have imported an HTML document to R (called tables) and wish to select
certain pieces of it for processing.  The first few lines of the object
appear as follows:

> tables
[[1]]
<table id="fs-table" class="gf-table rgt">
  <thead>
<tr><th class="lm lft nwp">
In Millions of USD (except for per share items)
</th>
<th class="rgt">
3 months ending 2010-06-30
</th>
<th class="rgt">
3 months ending 2010-03-31
</th>
<th class="rgt">
3 months ending 2009-12-31
</th>
<th class="rgt">
3 months ending 2009-09-30
</th>
<th class="rgt rm">
3 months ending 2009-06-30
</th>
</tr>
  </thead>
  <tbody>
<!-- 1 row for one coaitem -->
<tr><td class="lft lm">Revenue
</td>
<td class="r">16,039.00</td>
<td class="r">14,503.00</td>
<td class="r">19,022.00</td>
<td class="r">12,920.00</td>
<td class="r rm">13,099.00</td>
</tr>


The next major partition of the object is:

[[2]]
<table id="fs-table" class="gf-table rgt">
  <thead>
<tr><th class="lm lft nwp">
In Millions of USD (except for per share items)
</th>
<th class="rgt">
12 months ending 2010-06-30
</th>
<th class="rgt">
12 months ending 2009-06-30
</th>
<th class="rgt">
12 months ending 2008-06-30
</th>
<th class="rgt rm">
12 months ending 2007-06-30
</th>
</tr>
  </thead>
  <tbody>
<!-- 1 row for one coaitem -->
<tr><td class="lft lm">Revenue
</td>
<td class="r">62,484.00</td>
<td class="r">58,437.00</td>
<td class="r">60,420.00</td>
<td class="r rm">51,122.00</td>
</tr>
<tr><td class="lft lm">Other Revenue, Total
</td>
<td class="r">-</td>
<td class="r">-</td>
<td class="r">-</td>
<td class="r rm">-</td>
</tr>
<tr class="hilite"><td class="lft lm bld">Total Revenue
</td>
<td class="r bld">62,484.00</td>
<td class="r bld">58,437.00</td>
<td class="r bld">60,420.00</td>
<td class="r bld rm">51,122.00</td>
</tr>
<tr><td class="lft lm">Cost of Revenue, Total
</td>
<td class="r">12,395.00</td>
<td class="r">12,155.00</td>
<td class="r">11,598.00</td>
<td class="r rm">10,693.00</td>


How can I specify the part of the R object denoted by [[1]] and put it
into a new object for processing, as in table1 <- ...?

I have tried many variations of [[1]], c[1], etc. but haven't had any
luck.  Guidance would be much appreciated.

--John Sparks, Ph.D.
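
The [[1]] and [[2]] labels in the printout indicate that the object is a list, so its parts are reached by indexing the object's own name. A minimal sketch on a stand-in list of strings (assumed here in place of the parsed HTML nodes):

```r
# Stand-in for the imported object: a list with two elements.
tables <- list("<table>quarterly</table>", "<table>annual</table>")

# Double brackets return the element itself.
table1 <- tables[[1]]

# Single brackets return a list of length one that contains the element.
sub_list <- tables[1]

class(table1)
class(sub_list)
```

The same tables[[1]] form applies when the elements are XML nodes rather than strings.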



[R] ScrapeR Unanticipated XML objects

2010-08-01 Thread Sparks, John James
Dear All,

I have come across a very surprising result as I have started to learn how
to use R to pull data from the web for analysis.

I am trying to isolate the table headers for the quarterly income
statement (qtrinc) that I pulled from Google Finance.  I executed the
following commands after installing the scrapeR package.

require(scrapeR)
htmlfile<-scrape(url="http://www.google.com/finance?q=NASDAQ:MSFT&fstype=ii",headers=TRUE,parse=TRUE)

tables<-xpathSApply(htmlfile[[1]],"//table")
qtrinc<-tables[[1]]
xpathSApply(qtrinc,"//thead",xmlValue)


I receive the result:

[1] "\nIn Millions of USD (except for per share items)\n\n\n3 months
ending 2010-06-30\n\n\n3 months ending 2010-03-31\n\n\n3 months ending
2009-12-31\n\n\n3 months ending 2009-09-30\n\n\n3 months ending
2009-06-30\n\n"
[2] "\nIn Millions of USD (except for per share items)\n\n\n12 months
ending 2010-06-30\n\n\n12 months ending 2009-06-30\n\n\n12 months ending
2008-06-30\n\n\n12 months ending 2007-06-30\n\n"
[3] "\nIn Millions of USD (except for per share items)\n\n\nAs of
2010-06-30\n\n\nAs of 2010-03-31\n\n\nAs of 2009-12-31\n\n\nAs of
2009-09-30\n\n\nAs of 2009-06-30\n\n"
[4] "\nIn Millions of USD (except for per share items)\n\n\nAs of
2010-06-30\n\n\nAs of 2009-06-30\n\n\nAs of 2008-06-30\n\n\nAs of
2007-06-30\n\n"
[5] "\nIn Millions of USD (except for per share items)\n\n\n12 months
ending 2010-06-30\n\n\n9 months ending 2010-03-31\n\n\n6 months ending
2009-12-31\n\n\n3 months ending 2009-09-30\n\n"
[6] "\nIn Millions of USD (except for per share items)\n\n\n12 months
ending 2010-06-30\n\n\n12 months ending 2009-06-30\n\n\n12 months ending
2008-06-30\n\n\n12 months ending 2007-06-30\n\n"


Interestingly, only the first of these table headers exists in the list
qtrinc (if you list(qtrinc) you will see what I mean).  These are actually
the table headers for all the tables in the object htmlfile.

Can someone please help me isolate the table headers for only the object
qtrinc?

As long as I am at it, I also don't know how to remove the \n characters
when calling the data.

Help would be much appreciated.

--John Sparks, Ph.D.
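
An XPath expression that starts with // is evaluated from the document root even when it is applied to a single node, which would explain why the headers of every table come back. A minimal sketch, assuming the XML package and a small made-up two-table page in place of the Google Finance download: the leading dot in .//thead restricts the search to the node itself, and gsub with trimws tidies the newlines.

```r
library(XML)

# Made-up page standing in for the scraped document (assumption).
html <- '<html><body>
<table id="q"><thead><tr><th>
3 months ending 2010-06-30
</th></tr></thead><tbody><tr><td>1</td></tr></tbody></table>
<table id="a"><thead><tr><th>
12 months ending 2010-06-30
</th></tr></thead><tbody><tr><td>2</td></tr></tbody></table>
</body></html>'

doc    <- htmlParse(html, asText = TRUE)
tables <- getNodeSet(doc, "//table")
qtrinc <- tables[[1]]

# "//thead" searches the whole document, even when called on one node.
all_heads <- xpathSApply(qtrinc, "//thead", xmlValue)

# ".//thead" searches only below the node it is called on.
own_heads <- xpathSApply(qtrinc, ".//thead", xmlValue)

# Collapse runs of whitespace (including \n) and trim the ends.
clean_heads <- trimws(gsub("\\s+", " ", own_heads))
clean_heads
```

With several header cells per table, querying ".//thead//th" instead would return one string per cell rather than one concatenated string.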
