Re: [R-SIG-Finance] 4-digit SIC codes

2013-02-05 Thread David Reiner
Very nice, Garrett!
More curious than anything, but does anyone know why I get the extraneous 
characters when I do it?
They are present in x as well. I believe they are non-breaking spaces.

> head(SIC)
  SICCode  A/D  Office  Industry Title
4     100       5       ÂAGRICULTURAL PRODUCTION-CROPS
5     200       5       Â AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
6     700       5       ÂAGRICULTURAL SERVICES
7     800       5       Â FORESTRY
8     900       5       ÂFISHING, HUNTING AND TRAPPING
9    1000       9       Â METAL MINING
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] XML_3.95-0.1

loaded via a namespace (and not attached):
[1] tools_2.15.2

Thanks,
-- David Reiner


-----Original Message-----
From: r-sig-finance-boun...@r-project.org [mailto:r-sig-finance-boun...@r-project.org] On Behalf Of G See
Sent: Monday, February 04, 2013 9:30 PM
To: Bastian Offermann
Cc: r-sig-finance@r-project.org
Subject: Re: [R-SIG-Finance] 4-digit SIC codes

I'm not sure, but here's a really quick and dirty way to get it:

> library(XML)
> x <- readHTMLTable("http://www.sec.gov/info/edgar/siccodes.htm",
    stringsAsFactors=FALSE)[[4]]
> colnames(x) <- x[2, ]
> SIC <- x[-c(1:3), ]
> head(SIC)
  SICCode  A/D  Office  Industry Title
4     100       5       AGRICULTURAL PRODUCTION-CROPS
5     200       5       AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
6     700       5       AGRICULTURAL SERVICES
7     800       5       FORESTRY
8     900       5       FISHING, HUNTING AND TRAPPING
9    1000       9       METAL MINING

> SIC[SIC$SICCode == 2834, ]
   SICCode  A/D  Office  Industry Title
91    2834       1       PHARMACEUTICAL PREPARATIONS

HTH,
Garrett

On Mon, Feb 4, 2013 at 9:19 PM, Bastian Offermann
<bastian250...@yahoo.co.uk> wrote:
> Hi,
> does anybody know whether 4-digit SIC codes are available in R? Something
> along the lines
>
> 2834 Pharmaceutical Preparations
>
> Thank you.

 ___
 R-SIG-Finance@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-sig-finance
 -- Subscriber-posting only. If you want to post, subscribe first.
 -- Also note that this is not the r-help list where general R questions
 should go.



This e-mail and any materials attached hereto, including, without limitation, 
all content hereof and thereof (collectively, XR Content) are confidential 
and proprietary to XR Trading, LLC (XR) and/or its affiliates, and are 
protected by intellectual property laws.  Without the prior written consent of 
XR, the XR Content may not (i) be disclosed to any third party or (ii) be 
reproduced or otherwise used by anyone other than current employees of XR or 
its affiliates, on behalf of XR or its affiliates.

THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES OF ANY 
KIND.  TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW, XR HEREBY 
DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, RELATING TO THE XR 
CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES SHALL IN ANY EVENT BE LIABLE 
FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, INCLUDING, BUT NOT LIMITED TO, 
DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL AND PUNITIVE DAMAGES, LOSS OF PROFITS 
AND TRADING LOSSES, RESULTING FROM ANY PERSON'S USE OR RELIANCE UPON, OR 
INABILITY TO USE, ANY XR CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY OF 
SUCH DAMAGES OR IF SUCH DAMAGES WERE FORESEEABLE.



Re: [R-SIG-Finance] 4-digit SIC codes

2013-02-05 Thread Chirag Anand
On 5 February 2013 19:50, David Reiner <david.rei...@xrtrading.com> wrote:
> Very nice, Garrett!
> More curious than anything, but does anyone know why I get the extraneous
> characters when I do it?
> They are present in x as well. I believe they are non-breaking spaces.

Hi David,

I generally get rid of these characters by starting R in the C locale:
$ LC_ALL=C /usr/bin/R

You may want to export this variable in your shell's startup file, so that you
don't have to do this every time.
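
For a session that is already running (or on Windows, where the shell invocation above does not apply as written), something similar can be tried from inside R with Sys.setlocale(). This is only a minimal sketch: it assumes that switching the LC_CTYPE category is an acceptable stand-in for LC_ALL=C, and whether it changes how the stray characters print is platform-dependent.

```r
# Assumption: changing the character-type locale in a running session is an
# acceptable stand-in for launching R with LC_ALL=C.
old_ctype <- Sys.getlocale("LC_CTYPE")   # remember the current setting
Sys.setlocale("LC_CTYPE", "C")           # switch character handling to the C locale
Sys.getlocale("LC_CTYPE")                # confirm the change

# ... print or inspect the scraped table here ...

Sys.setlocale("LC_CTYPE", old_ctype)     # restore the previous locale
```

Restoring the saved value at the end keeps the change from leaking into the rest of the session.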
HTH





--
Chirag Anand
http://atvariance.in



Re: [R-SIG-Finance] 4-digit SIC codes

2013-02-05 Thread G See
There are actually non-breaking spaces in the source code of the page.
If you look at it, you will see things like this:

<B>A/D &nbsp;<BR>

Whether or not XML::trim gets rid of them for you may be OS-specific.
See an answer to an old question of mine on R-help, for example:
https://stat.ethz.ch/pipermail/r-help/2012-February/302417.html
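
If the trimming does not happen on a given platform, the non-breaking spaces can also be removed after the fact. The following is a minimal sketch, not a tested fix for the output shown earlier in the thread: strip_nbsp is a hypothetical helper name, and it assumes the cells are character strings that R treats as correctly encoded (if the bytes were read as Windows-1252, marking them first with Encoding(s) <- "UTF-8" may be needed).

```r
# Hypothetical helper: replace non-breaking spaces (U+00A0) with ordinary
# spaces, then trim leading and trailing whitespace.
strip_nbsp <- function(s) {
  s <- enc2utf8(s)                            # assumes character input
  s <- gsub("\u00a0", " ", s, fixed = TRUE)   # U+00A0 becomes a plain space
  gsub("^\\s+|\\s+$", "", s)                  # drop whitespace at both ends
}

# Stand-in for one scraped cell (not taken from the real page)
cell <- "\u00a0AGRICULTURAL SERVICES"
strip_nbsp(cell)
```

To clean a whole table, something like SIC[] <- lapply(SIC, strip_nbsp) should work, assuming every column is character, as it would be with stringsAsFactors=FALSE.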

Best,
Garrett

On Tue, Feb 5, 2013 at 8:20 AM, David Reiner <david.rei...@xrtrading.com> wrote:
> Very nice, Garrett!
> More curious than anything, but does anyone know why I get the extraneous
> characters when I do it?
> They are present in x as well. I believe they are non-breaking spaces.


On Tue, Feb 5, 2013 at 8:20 AM, David Reiner david.rei...@xrtrading.com wrote: