Thanks for everyoneâs feedback. It may be that the person who is using
SQL Server 2005 is doing something that is not helpful.
In the data there are supposed to be 10 columns with headers. The first
column is a date time that I was reading in as a string. The second is a
string. The rest are numeric.
While I am focusing on one file I will have to deal with this issue on
an ongoing basis.
There are no accented characters in the dataset. There should not be any
problematic characters in the dataset. When I go into Excel and save as
comma separated and then do text to columns with comma as the separator
and then format the first column as date (m/d/Y) I can import into R and
I am on my way.
When I look at the data in Notepad, it looks like an ordinary comma
separated file.
When I open the file in Excel 2007, I notice that each rowâs data is
only in column A.
When I do nothing with the file and just try to read the file into R,
the following happens.
> test06 = read.table("test2006.csv", sep = ",", header = TRUE, as.is = TRUE)
> test06[1:6, 1:2]
Error in `[.data.frame`(test06, 1:6, 1:2) : undefined columns selected
> test06[1:6]
Error in `[.data.frame`(test06, 1:6) : undefined columns selected
> head(test06)
ÿþD
1NA
2NA
3NA
4NA
5NA
6NA
This alternative produced the same results.
test06 = read.table("test2006.csv", sep = ",", header = TRUE,
stringsAsFactors = FALSE)
Using a dataset based on a SQL Server 2008 book but using SQL Server
2012, I could NOT duplicate the problem. I used two strategies. I copied
the output and then did file save as comma separated and got everything
lined up nicely, but no headers. If I pasted into Excel, then I got the
headers. But otherwise I was able to read everything into R.
Interestingly different things happen depending on whether I try to open
from Windows 7 Explorer or from within Excel. There are no Excel issues
if I open with Explorer. If I try to open with Excel 2007 I get
different behavior depending on how I saved in SQL Server 2012.
If I did select all; copy, save as csv in SQL, then when trying to open
in Excel I get the data import wizard for a DELIMITED file. This issue
does not arise when opening from Explorer.
If I did select all; copy in SQL and then pasted into Excel and saved as
csv, there are no problems whether I open in Explorer or Excel. No
import wizard.
When I try to open the data file that I need in Explorer, the file opens
but everything is in the first column.
When I try to open the data file that I need using Excel 2007, the
import wizard opens and starts off as FIXED WIDTH.
As I began to have a better understanding of the issues, I thought back
to what my colleague wrote when I stated the problem to the colleague. I
think the key may be instruction 2. I did a Find from within Notepad and
also from within Excel and could not find any quote marks.
The colleague's advice. Instruction 3 relates to the fact that the data
values in column 2 are padded on the right with spaces.
These files are created by SQL on a server.Hereâs an easy way to read
the files.
1) Open up each file in Notepad, replace all â with a space, then save
2) Change the file type to â.txtâ
3) Open each file in excel using a comma as a delimitator. [You might
change column B to text format or use âfind and replaceâ to delete the
spaces in the StockID.]
On 10/9/2013 2:37 AM, Milan Bouchet-Valat wrote:
> Le mardi 08 octobre 2013 à 16:02 -0700, Ira Sharenow a écrit :
>> A colleague is sending me quite a few files that have been saved with MS
>> SQL Server 2005. I am using R 2.15.1 on Windows 7.
>>
>> I am trying to read in the files using standard techniques. Although the
>> file has a csv extension when I go to Excel or WordPad and do SAVE AS I
>> see that it is Unicode Text. Notepad indicates that the encoding is
>> Unicode. Right now I have to do a few things from within Excel (such as
>> Text to Columns) and eventually save as a true csv file before I can
>> read it into R and then use it.
>>
>> Is there an easy way to solve this from within R? I am also open to easy
>> SQL Server 2005 solutions.
>>
>> I tried the following from within R.
>>
>> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
>>
>>> testDF2 = iconv(x = testDF, from = "Unicode", to = "")
>> Error in iconv(x = testDF, from = "Unicode", to = "") :
>>
>> unsupported conversion from 'Unicode' to '' in codepage 1252
>>
>> # The next line did not produce an error message
>>
>>> testDF3 = iconv(x = testDF, from = "UTF-8" , to = "")
>>> testDF3[1:6, 1:3]
>> Error in testDF3[1:6, 1:3] : incorrect number of dimensions
>>
>> # The next line did not produce an error message
>>
>>> testDF4 = iconv(x = testDF, from = "macroman" , to = "")
>>> testDF4[1:6, 1:3]
>> Error in testDF4[1:6, 1:3] : incorrect number of dimensions
>>
>>> Encoding(testDF3)
>> [1] "unknown"
>>
>>> Encoding(testDF4)
>> [1] "unknown"
>>
>> This is the first few lines from WordPad
>>
>> Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
>>
>> 2006-01-03
>> 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
>>
>> 2006-01-03
>> 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
> What's the actual problem? You did not state any. Do you get accentuated
> characters that are not printed correctly after importing the file? In
> the two lines above it does not look like there would be any non-ASCII
> characters in this file, so encoding would not matter.
>
>
> Regards
>
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.