Re: [R] iconv question: SQL Server 2005 to R

Ira Sharenow Wed, 09 Oct 2013 10:45:11 -0700

Thanks for everyoneâs feedback. It may be that the person who is using 
SQL Server 2005 is doing something that is not helpful.

In the data there are supposed to be 10 columns with headers. The first 
column is a date time that I was reading in as a string. The second is a 
string. The rest are numeric.

While I am focusing on one file I will have to deal with this issue on 
an ongoing basis.

There are no accented characters in the dataset. There should not be any 
problematic characters in the dataset. When I go into Excel and save as 
comma separated and then do text to columns with comma as the separator 
and then format the first column as date (m/d/Y) I can import into R and 
I am on my way.

When I look at the data in Notepad, it looks like an ordinary comma 
separated file.

When I open the file in Excel 2007, I notice that each rowâs data is 
only in column A.

When I do nothing with the file and just try to read the file into R, 
the following happens.

> test06 =  read.table("test2006.csv", sep = ",", header = TRUE, as.is = TRUE)

> test06[1:6,  1:2]

Error in `[.data.frame`(test06, 1:6, 1:2) : undefined columns selected

> test06[1:6]

Error in `[.data.frame`(test06, 1:6) : undefined columns selected

> head(test06)

Ã¿Ã¾D

1NA

2NA

3NA

4NA

5NA

6NA

This alternative produced the same results.

test06 = read.table("test2006.csv", sep = ",", header = TRUE, 
stringsAsFactors = FALSE)

Using a dataset based on a SQL Server 2008 book but using SQL Server 
2012, I could NOT duplicate the problem. I used two strategies. I copied 
the output and then did file save as comma separated and got everything 
lined up nicely, but no headers. If I pasted into Excel, then I got the 
headers. But otherwise I was able to read everything into R.

Interestingly different things happen depending on whether I try to open 
from Windows 7 Explorer or from within Excel. There are no Excel issues 
if I open with Explorer. If I try to open with Excel 2007 I get 
different behavior depending on how I saved in SQL Server 2012.

If I did select all; copy, save as csv in SQL, then when trying to open 
in Excel I get the data import wizard for a DELIMITED file. This issue 
does not arise when opening from Explorer.

If I did select all; copy in SQL and then pasted into Excel and saved as 
csv, there are no problems whether I open in Explorer or Excel. No 
import wizard.

When I try to open the data file that I need in Explorer, the file opens 
but everything is in the first column.

When I try to open the data file that I need using Excel 2007, the 
import wizard opens and starts off as FIXED WIDTH.

As I began to have a better understanding of the issues, I thought back 
to what my colleague wrote when I stated the problem to the colleague. I 
think the key may be instruction 2. I did a Find from within Notepad and 
also from within Excel and could not find any quote marks.

The colleague's advice. Instruction 3 relates to the fact that the data 
values in column 2 are padded on the right with spaces.

These files are created by SQL on a server.Hereâs an easy way to read 
the files.

1) Open up each file in Notepad, replace all â with a space, then save

2) Change the file type to â.txtâ

3) Open each file in excel using a comma as a delimitator. [You might 
change column B to text format or use âfind and replaceâ to delete the 
spaces in the StockID.]

On 10/9/2013 2:37 AM, Milan Bouchet-Valat wrote:
> Le mardi 08 octobre 2013 Ã  16:02 -0700, Ira Sharenow a Ã©crit :
>> A colleague is sending me quite a few files that have been saved with MS
>> SQL Server 2005. I am using R 2.15.1 on Windows 7.
>>
>> I am trying to read in the files using standard techniques. Although the
>> file has a csv extension when I go to Excel or WordPad and do SAVE AS I
>> see that it is Unicode Text. Notepad indicates that the encoding is
>> Unicode. Right now I have to do a few things from within Excel (such as
>> Text to Columns) and eventually save as a true csv file before I can
>> read it into R and then use it.
>>
>> Is there an easy way to solve this from within R? I am also open to easy
>> SQL Server 2005 solutions.
>>
>> I tried the following from within R.
>>
>> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
>>
>>> testDF2 =  iconv(x = testDF, from = "Unicode", to = "")
>> Error in iconv(x = testDF, from = "Unicode", to = "") :
>>
>> unsupported conversion from 'Unicode' to '' in codepage 1252
>>
>> # The next line did not produce an error message
>>
>>> testDF3 =  iconv(x = testDF, from = "UTF-8" , to = "")
>>> testDF3[1:6,  1:3]
>> Error in testDF3[1:6, 1:3] : incorrect number of dimensions
>>
>> # The next line did not produce an error message
>>
>>> testDF4 =  iconv(x = testDF, from = "macroman" , to = "")
>>> testDF4[1:6,  1:3]
>> Error in testDF4[1:6, 1:3] : incorrect number of dimensions
>>
>>>   Encoding(testDF3)
>> [1] "unknown"
>>
>>>   Encoding(testDF4)
>> [1] "unknown"
>>
>> This is the first few lines from WordPad
>>
>> Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
>>
>> 2006-01-03
>> 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
>>
>> 2006-01-03
>> 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
> What's the actual problem? You did not state any. Do you get accentuated
> characters that are not printed correctly after importing the file? In
> the two lines above it does not look like there would be any non-ASCII
> characters in this file, so encoding would not matter.
>
>
> Regards
>

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] iconv question: SQL Server 2005 to R

Reply via email to