Re: [R-SIG-Finance] 4-digit SIC codes
Very nice, Garrett!

More curious than anything, but does anyone know why I get the extraneous characters when I do it? They are present in x as well. I believe they are non-breaking spaces.

head(SIC)
   SICCode  A/D  Office  Industry Title
4      100            5  ÂAGRICULTURAL PRODUCTION-CROPS
5      200            5   AGRICULTURAL PROD-LIVESTOCK ANIMAL SPECIALTIES
6      700            5  ÂAGRICULTURAL SERVICES
7      800            5   FORESTRY
8      900            5  ÂFISHING, HUNTING AND TRAPPING
9     1000            9   METAL MINING

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] XML_3.95-0.1

loaded via a namespace (and not attached):
[1] tools_2.15.2

Thanks,
-- David Reiner

-----Original Message-----
From: r-sig-finance-boun...@r-project.org [mailto:r-sig-finance-boun...@r-project.org] On Behalf Of G See
Sent: Monday, February 04, 2013 9:30 PM
To: Bastian Offermann
Cc: r-sig-finance@r-project.org
Subject: Re: [R-SIG-Finance] 4-digit SIC codes

I'm not sure, but here's a really quick and dirty way to get it:

library(XML)
# the SIC table is the fourth table on the page
x <- readHTMLTable("http://www.sec.gov/info/edgar/siccodes.htm", stringsAsFactors=FALSE)[[4]]
# row 2 holds the column headings; rows 1 to 3 are not data
colnames(x) <- x[2, ]
SIC <- x[-c(1:3), ]

head(SIC)
   SICCode  A/D Office  Industry Title
4      100           5  AGRICULTURAL PRODUCTION-CROPS
5      200           5  AGRICULTURAL PROD-LIVESTOCK ANIMAL SPECIALTIES
6      700           5  AGRICULTURAL SERVICES
7      800           5  FORESTRY
8      900           5  FISHING, HUNTING AND TRAPPING
9     1000           9  METAL MINING

SIC[SIC$SICCode == 2834, ]
   SICCode  A/D Office  Industry Title
91    2834           1  PHARMACEUTICAL PREPARATIONS

HTH,
Garrett

On Mon, Feb 4, 2013 at 9:19 PM, Bastian Offermann <bastian250...@yahoo.co.uk> wrote:
> Hi,
>
> does anybody know whether 4-digit SIC codes are available in R? Something along the lines
>
> 2834 Pharmaceutical Preparations
>
> Thank you.
___
R-SIG-Finance@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

This e-mail and any materials attached hereto, including, without limitation, all content hereof and thereof (collectively, XR Content) are confidential and proprietary to XR Trading, LLC (XR) and/or its affiliates, and are protected by intellectual property laws. Without the prior written consent of XR, the XR Content may not (i) be disclosed to any third party or (ii) be reproduced or otherwise used by anyone other than current employees of XR or its affiliates, on behalf of XR or its affiliates. THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES OF ANY KIND. TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW, XR HEREBY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, RELATING TO THE XR CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES SHALL IN ANY EVENT BE LIABLE FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL AND PUNITIVE DAMAGES, LOSS OF PROFITS AND TRADING LOSSES, RESULTING FROM ANY PERSON'S USE OR RELIANCE UPON, OR INABILITY TO USE, ANY XR CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY OF SUCH DAMAGES OR IF SUCH DAMAGES WERE FORESEEABLE.
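A note on the lookup in the quoted code: because the table is read with stringsAsFactors=FALSE, SICCode is a character column, and SIC$SICCode == 2834 matches only because R converts the number to the string "2834" before comparing. The following is a minimal sketch of a numeric lookup, assuming a small hand-built data frame that holds only the rows printed in this thread. The names sic and lookup_sic and the simplified column names are hypothetical; they are not part of the XML package or of the SEC table.

```r
# Rows copied from the output shown in this thread; the real table is much longer.
sic <- data.frame(
  SICCode = c("100", "200", "700", "800", "900", "1000", "2834"),
  Office  = c("5", "5", "5", "5", "5", "9", "1"),
  Title   = c("AGRICULTURAL PRODUCTION-CROPS",
              "AGRICULTURAL PROD-LIVESTOCK ANIMAL SPECIALTIES",
              "AGRICULTURAL SERVICES",
              "FORESTRY",
              "FISHING, HUNTING AND TRAPPING",
              "METAL MINING",
              "PHARMACEUTICAL PREPARATIONS"),
  stringsAsFactors = FALSE
)

# Convert the code column once so that later comparisons are numeric.
sic$SICCode <- as.integer(sic$SICCode)

# Hypothetical helper: return the industry title(s) for one SIC code.
lookup_sic <- function(code, table = sic) {
  table$Title[table$SICCode == code]
}

lookup_sic(2834)
# [1] "PHARMACEUTICAL PREPARATIONS"
```

A code that is not in the table returns an empty character vector rather than an error.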
Re: [R-SIG-Finance] 4-digit SIC codes
On 5 February 2013 19:50, David Reiner <david.rei...@xrtrading.com> wrote:
> Very nice, Garrett!
>
> More curious than anything, but does anyone know why I get the extraneous characters when I do it? They are present in x as well. I believe they are non-breaking spaces.

Hi David,

I generally get rid of these characters by starting R in the C locale:

$ LC_ALL=C /usr/bin/R

You may want to export this variable in your shell startup file, so that you don't have to do this every time.

HTH
--
Chirag Anand
http://atvariance.in
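For sessions that are not started from a shell (the sessionInfo() earlier in the thread shows Windows), a similar switch can be sketched from inside R with base R's Sys.getlocale() and Sys.setlocale(). This is an assumption-laden sketch, not a confirmed fix for the output above: it changes only LC_CTYPE, which is narrower than LC_ALL=C, and whether it alters how the stray characters print depends on the platform and on how the strings were encoded when they were read.

```r
# Remember the current character-type locale so it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# Switch character handling to the portable "C" locale for this session.
Sys.setlocale("LC_CTYPE", "C")
ctype_in_c <- Sys.getlocale("LC_CTYPE")
ctype_in_c

# ... read and print the table here ...

# Restore the previous setting when done.
Sys.setlocale("LC_CTYPE", old_ctype)
```

Saving and restoring the old value keeps the change from leaking into the rest of the session.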
Re: [R-SIG-Finance] 4-digit SIC codes
There are actually non-breaking spaces in the source code of the page. If you look at it, you will see things like this:

<B>A/D &nbsp;<BR>

Whether or not XML::trim gets rid of them for you may be OS-specific. See an answer to an old question of mine on R-help, for example:
https://stat.ethz.ch/pipermail/r-help/2012-February/302417.html

Best,
Garrett

On Tue, Feb 5, 2013 at 8:20 AM, David Reiner <david.rei...@xrtrading.com> wrote:
> Very nice, Garrett!
>
> More curious than anything, but does anyone know why I get the extraneous characters when I do it? They are present in x as well. I believe they are non-breaking spaces.
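Where the stray Â is likely to come from can be sketched in base R. A non-breaking space (U+00A0) is encoded in UTF-8 as the two bytes C2 A0; if those bytes are interpreted in a single-byte encoding such as Latin-1 or Windows-1252, C2 is displayed as Â and A0 is still a non-breaking space. The snippet below illustrates that assumption and one way to clean the strings afterwards. It is not a description of what the XML package does internally, and strip_nbsp is a hypothetical helper name.

```r
# A title preceded by a non-breaking space, as produced by the page's &nbsp; markup.
title <- "\u00a0FORESTRY"

# In UTF-8 the non-breaking space occupies two bytes.
bytes <- charToRaw(enc2utf8(title))
bytes[1:2]
# [1] c2 a0

# Reinterpret the same bytes as Latin-1: byte c2 becomes the character U+00C2 (Â).
garbled <- iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")

# Hypothetical helper: replace a non-breaking space (optionally preceded by a
# stray U+00C2) with a plain space, then trim leading and trailing whitespace.
strip_nbsp <- function(x) {
  x <- gsub("\u00c2?\u00a0", " ", x)
  gsub("^\\s+|\\s+$", "", x)
}

strip_nbsp(title)
strip_nbsp(garbled)
```

Applied to a column, for example SIC[["Industry Title"]] <- strip_nbsp(SIC[["Industry Title"]]), this should give the same clean titles whether or not the extra character was shown.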