Re: [R] Read data with different column lengths

jim holtman Wed, 07 Mar 2007 16:05:02 -0800

I would have thought that a csv file written out by Excel would have looked
like this for your data:


ADS GY Equity,                 ,ALV GY Equity,
Date        ,   Px Last        ,Date          , Px Last
07/02/04    ,"   41,395          ",07/01/31      ," 130,234"
07/02/05    ,"   42,134          ",07/02/01      ," 133,353"
07/02/06    ,"   41,875          ",07/02/04      ," 133,824"
             ,                 , 07/02/05," 134,734"

Here we have commas separating the fields and this will read in the data:

> x <- read.csv("/tempxx.csv", skip=2, header=FALSE, as.is=TRUE)
> str(x)
'data.frame':   4 obs. of  4 variables:
 $ V1: chr  "07/02/04    " "07/02/05    " "07/02/06    " "             "
 $ V2: chr  "   41,395          " "   42,134          " "   41,875
 " "                 "
 $ V3: chr  "07/01/31      " "07/02/01      " "07/02/04      " " 07/02/05"
 $ V4: chr  " 130,234" " 133,353" " 133,824" " 134,734"
> x
             V1                  V2             V3       V4
1  07/02/04        41,395           07/01/31        130,234
2  07/02/05        42,134           07/02/01        133,353
3  07/02/06        41,875           07/02/04        133,824
4                                         07/02/05  134,734
>

You will have to delete the ',' from the numeric fields, The 'dec=","' on
your command would have read '133,353' in as '133.353' which I don't think
is what you want.  You can delete the commas and convert to numeric with the
following:

> x$V4 <- as.numeric(gsub(',', '', x$V4))
> x
             V1                  V2             V3     V4
1  07/02/04        41,395           07/01/31       130234
2  07/02/05        42,134           07/02/01       133353
3  07/02/06        41,875           07/02/04       133824
4                                         07/02/05 134734
>

So take a look at the options that you have with writing out the Excel data;
you should be able to get either tab or comma delimited.  If not, you can
try reading in each line as a character string and then using 'substr' to
split it apart assuming that you have constant width columns.


On 3/7/07, Hofert Marius <[EMAIL PROTECTED]> wrote:
>
> Dear r-help users,
>
> I have the following simple problem: Reading data from a file. The
> file is a .txt file exported ("save as...") from Excel (see below for
> an example). The Excel file consists of two header rows (first row
> consists of ticker symbols of stocks, the second row consists of
> column explanations ("Date","Px Last"), followed by several rows of
> data. Now forget about the first two rows, I can deal with that (read
> separately, then extract the actual ticker symbols "ADS",
> "ALV", ...). For reading the rest, I tried several things, for example:
> data=read.table(infile,quote="",fill=T,dec=",",skip=2,colClasses=rep(c
> ("character","numeric"),ntickers))
> or
> data=matrix(scan(file=infile,what=rep(c
> ("character","numeric"),ntickers),dec=",",skip=2),ncol=2*ntickers,byrow=
> T)
> where "infile" specifies the path to the input file and "ntickers" is
> the number of ticker-columns in the data set, so in the example
> below, ntickers=2.
>
> Both ways of reading the data work perfectly fine if all columns have
> the same length (i.e. the same number of filled rows), so if the data
> is given in a (filled) "rectangular" form. Now, as you can imagine,
> there are days when one stock is traded but not the other... so,
> there might be columns that do not have the same number of filled
> rows (see below, for the stock with ticker symbol "ADS", only 3
> trading days are shown, so this column is shorter than the data
> column for the stock "ALV"). Now, if I export such a structure to
> a .txt file, then all (by default) blank fields will be replaced by
> "\t", i.e. tabs. Both reading procedures as give above have problems
> as they either display that the number of rows/columns do not fit
> together or as they read the table, but some cells are shifted to the
> left (for the example below, the entry "07/02/05        134,7" appears in
> the empty field of the stock "ADS" which is of course not what we want).
> So the simple question is: How do I read such a structure?
> Can there be a simple solution? The problem is simply that empty
> cells are replace by "\t" which are then ignored for reading. So how
> do we distinguish between the empty cells that are given between the
> columns and the empty cells that actually "fill" a column to have the
> same length as other columns. Of course I could manually put in a
> certain character (e.g. a "*") to fill in the gaps, but the data set
> is simply too large. If it helps, these blank fields only appear in
> the end of each column, not in the middle.
>
> As I work on a Mac (OS X 10.4), it was not possible (at least to me)
> to read the data directly from the Excel file vial the library RODBC
> or read.xls.
>
> Note, that the same problem arises, when I export the Excel file as
> a .csv, then all blank fields are separated by ";" instead of "\t"
> and the reading procedure can also not decide if the field
> corresponds to an empty separating column or actually to a column
> with given entries, but which is simply not as long as another column
> in the file.
>
> Hope, you can help. I would really appreciate it.
>
> Best regards.
>
> Marius
>
> Excel example (I hope it's displayed correctly, the entry in the last
> row should be aligned with the last column):
>
> ADS GY Equity                   ALV GY Equity
> Date            Px Last         Date            Px Last
> 07/02/04        41,395          07/01/31        130,234
> 07/02/05        42,134          07/02/01        133,353
> 07/02/06        41,875          07/02/04        133,824
>                                                07/02/05        134,734
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Read data with different column lengths

Reply via email to