Re: [R] Reading fixed column format

2006-09-13 Thread Gabor Grothendieck
I know you would prefer a 100% R solution but using the unix cut
command (a Windows version is available in tools.zip at:
http://www.murdoch-sutherland.com/Rtools/
) is really easy.  Maybe if you preprocessed it with that you
could then use read.fwf.

For example, look how easy it was to cut this file down to half
extracting columns 2-3 and 6-8:

C:\bin> type a.dat
123456789
123456789
123456789

C:\bin> cut -c2-3,6-8 a.dat
23678
23678
23678
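
For readers who want to stay inside R, the same column selection can be sketched with substr(). This is only an illustration: the three records are typed in by hand to mirror a.dat above, and the object names are made up for the example.

```r
# Hand-typed stand-in for a.dat: three 9-character records
lines <- rep("123456789", 3)

# Keep characters 2-3 and 6-8 of every record, like cut -c2-3,6-8
kept <- paste0(substr(lines, 2, 3), substr(lines, 6, 8))
print(kept)
```

For a real file, `lines` would come from readLines() on the data file rather than from rep().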


On 9/13/06, Anupam Tyagi [EMAIL PROTECTED] wrote:
> Barry Rowlingson B.Rowlingson at lancaster.ac.uk writes:
>
> > > None of these seem to read non-contiguous variables from columns; or
> > > maybe I am missing something. read.fwf is not meant for large
> > > files according to a post in the archives. Thanks for the pointers. I
> > > have read the R data input and output. Anupam.
> >
> > First up, how 'large' is your 'large ASCII file'? How many rows and
> > columns?
>
> There are 356,112 records, 326 variables, fixed record length of 1283
> positions. Zipped file is 42MB. There are no field (variable) separators
> (delimiters).
>
> > Secondly, what are 'non-contiguous' variables?
>
> Variables that are not in adjoining positions in the file: reading them from
> the file would require skipping columns while reading. For example, below are
> the start positions of the first three variables I would like to read.
>
> StartingColumn  VariableName  FieldLength
> 1               STATE         2
> 24              INTVID        3
> 30              PSU           10
>
> > Perhaps if you posted the first few lines and columns of the file then
> > we might get an idea of how to read it in.
>
> Because a record (row) of the file is 1283 columns, I would not like to post
> it here.
>
> Thank you for your response.
>
> Anupam.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Reading fixed column format

2006-09-13 Thread Anupam Tyagi
Gabor Grothendieck ggrothendieck at gmail.com writes:

> C:\bin> cut -c2-3,6-8 a.dat
> 23678
> 23678
> 23678

Thanks. I think this will work. How do I redirect the output to a file on
Windows? Is there a simple way to convert the cut command to a script on Windows,
because the entire command may not fit on one line? Anupam.



Re: [R] Reading fixed column format

2006-09-13 Thread Anupam Tyagi
Barry Rowlingson B.Rowlingson at lancaster.ac.uk writes:

> > None of these seem to read non-contiguous variables from columns; or
> > maybe I am missing something. read.fwf is not meant for large
> > files according to a post in the archives. Thanks for the pointers. I
> > have read the R data input and output. Anupam.
>
> First up, how 'large' is your 'large ASCII file'? How many rows and
> columns?

There are 356,112 records, and 326 variables. It has a fixed record length of
1283 positions, therefore cut -b can not be used.

> Secondly, what are 'non-contiguous' variables?

When I do not want to read all columns. For example, I would like to read the
following:

StartingColumn  VariableName  FieldLength
1               STATE         2
24              INTVID        3
27              DISPCODE      3
30              PSU           10

Sometimes I would also like to format the data after it has been read. For
example, the ASCII file has price in columns 100 to 105 written as 005999. I
want to read this and format it as 59.99 (omitting leading zeros in the price).
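
A minimal sketch of that post-read formatting in R, assuming the price field always carries two implied decimal places (as in 005999 meaning 59.99). The sample values other than 005999 are invented for illustration.

```r
# Raw price fields as they would come out of columns 100-105 (sample values)
raw_price <- c("005999", "000450", "120000")

# Convert to numeric (this drops the leading zeros) and apply two implied decimals
price <- as.numeric(raw_price) / 100

# Optional: a character version with exactly two decimals for display
formatted <- sprintf("%.2f", price)
print(formatted)
```

Reading the field as character first and converting afterwards keeps control over how the implied decimal point is applied.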

> Perhaps if you posted the first few lines and columns of the file then
> we might get an idea of how to read it in.

I have not even downloaded the data onto my computer yet, because I am not sure
I can read it in. The zipped file is 67MB. Using similar data a few years ago, I
recall the unzipped file to be about 350--400 MB. I had used MySQL then, but it
took some doing to get it in, and there were things that did not seem to work as
I wanted them to---I could not figure out how to label the variables. I usually
do not have to work with a dataframe of more than 10-30 MB at a time.

It would be good to have a facility in R which defines the meta-data: labelling
and structure of the dataset: positions of variables, their names, their labels,
their levels (e.g. for ordered choice or group variables: yes, sometimes, no
type responses). This can be saved as a separate object and passed to a function
that gets the named variables from the ASCII file (names of variables to get can
be given as arguments), attaches the meta-data and creates a dataframe with
all the meta-data attached. The meta-data of the dataframe could include notes
at dataframe and variable level, and other information. This information is
passed on to the plotting functions and used when formatting the output of
statistical procedures.
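
One way such a facility might be sketched in plain R is a small "dictionary" data frame plus a helper that pulls the named variables out of raw records. Everything here is hypothetical: the function name read_vars, the dictionary layout, and the placeholder labels are assumptions for illustration; only the variable names and positions come from the table above.

```r
vars <- c("STATE", "INTVID", "DISPCODE", "PSU")

# Hypothetical data dictionary: name, start column, width, and a label
dict <- data.frame(
  name  = vars,
  start = c(1, 24, 27, 30),
  width = c(2, 3, 3, 10),
  label = paste("Label for", vars),  # placeholder labels, not from a real codebook
  stringsAsFactors = FALSE
)

# Extract the requested variables from fixed-width records and attach labels
read_vars <- function(records, dict, vars = dict$name) {
  d <- dict[match(vars, dict$name), ]
  cols <- lapply(seq_len(nrow(d)), function(i) {
    substr(records, d$start[i], d$start[i] + d$width[i] - 1)
  })
  names(cols) <- d$name
  out <- as.data.frame(cols, stringsAsFactors = FALSE)
  for (i in seq_len(nrow(d))) attr(out[[i]], "label") <- d$label[i]
  out
}

# One made-up record: STATE in 1-2, filler in 3-23, then INTVID, DISPCODE, PSU
rec <- paste0("36", strrep("x", 21), "007", "110", "0000012345")
df <- read_vars(rec, dict, c("STATE", "PSU"))
print(df)
print(attr(df$PSU, "label"))
```

The labels are stored as a plain "label" attribute on each column; base R plotting and modelling functions do not use such attributes on their own, so that part of the wish list would need extra code.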

I agree with Michael Kubovy that this is a very helpful list, and people do
not owe less than what one paid for the software :)

Anupam.



Re: [R] Reading fixed column format

2006-09-13 Thread Duncan Murdoch
Anupam Tyagi wrote:
> Barry Rowlingson B.Rowlingson at lancaster.ac.uk writes:
>
> > > None of these seem to read non-contiguous variables from columns; or
> > > maybe I am missing something. read.fwf is not meant for large
> > > files according to a post in the archives. Thanks for the pointers. I
> > > have read the R data input and output. Anupam.
> >
> > First up, how 'large' is your 'large ASCII file'? How many rows and
> > columns?
>
> There are 356,112 records, and 326 variables. It has a fixed record length of
> 1283 positions, therefore cut -b can not be used.
>
> > Secondly, what are 'non-contiguous' variables?
>
> When I do not want to read all columns. For example, I would like to read the
> following:
>
> StartingColumn  VariableName  FieldLength
> 1               STATE         2
> 24              INTVID        3
> 27              DISPCODE      3
> 30              PSU           10
   

read.fwf() can handle the skipped columns (you use negative column 
values; see the man page).  It will break the read up into blocks, so 
the large size of the original file shouldn't be a problem.

Duncan Murdoch
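
As an illustration of the negative-width approach, here is a small sketch using the four variables from the table above. The two sample records and the temporary file are invented for the example; colClasses = "character" is an assumption made so that leading zeros in the codes survive.

```r
# Two made-up records: STATE in 1-2, filler in 3-23, INTVID 24-26,
# DISPCODE 27-29, PSU 30-39, followed by columns we do not want
tmp <- tempfile(fileext = ".dat")
writeLines(c(
  paste0("36", strrep("x", 21), "007", "110", "0000012345", "trailing"),
  paste0("06", strrep("x", 21), "012", "120", "0000067890", "trailing")
), tmp)

# A negative width skips that many columns; columns past the last width are ignored
dat <- read.fwf(tmp,
                widths = c(2, -21, 3, 3, 10),
                col.names = c("STATE", "INTVID", "DISPCODE", "PSU"),
                colClasses = "character",
                buffersize = 1000)   # lines read per block
print(dat)
unlink(tmp)
```

The buffersize argument controls how many lines are processed per block, which is what keeps memory use manageable on a large file.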



Re: [R] Reading fixed column format

2006-09-13 Thread Barry Rowlingson
Anupam Tyagi wrote:

> There are 356,112 records, and 326 variables. It has a fixed record length of
> 1283 positions, therefore cut -b can not be used.

Okay, that's 'large' enough to be awkward...

> It would be good to have a facility in R which defines the meta-data:
> labelling and structure of the dataset: positions of variables, their names,
> their labels, their levels (e.g. for ordered choice or group variables: yes,
> sometimes, no type responses). This can be saved as a separate object and
> passed to a function that gets the named variables from the ASCII file (names
> of variables to get can be given as arguments), attaches the meta-data and
> creates a dataframe with all the meta-data attached. The meta-data of the
> dataframe could include notes at dataframe and variable level, and other
> information. This information is passed on to the plotting functions and used
> when formatting the output of statistical procedures.

  I think you need the following functions to build that kind of thing in R:

  * z = unz("/tmp/file.zip", "data.dat") - to create a connection to a
file in a zip archive - this saves you having to explicitly unzip it...

  * open(z) - to open the connection to the file in the zip...

  * readLines(z,n) - to read 'n' lines from the current position in the
file...

  * seek(z,m*lineLength-1) - to jump to line 'm' ready to read it.

  Then it's just 'substr' and similar string-chopping functions to build
up the data from each line you want.
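
A rough sketch of the readLines-plus-substr idea follows. It deliberately reads an ordinary text file sequentially in chunks instead of using unz() and seek(), because whether seek() works on a zip connection is not something this sketch relies on. The sample records, chunk size, and column positions are assumptions for illustration.

```r
# Five made-up records: STATE in columns 1-2, filler, INTVID in columns 24-26
tmp <- tempfile(fileext = ".dat")
writeLines(sprintf("%02d%s%03d", 1:5, strrep("-", 21), 101:105), tmp)

con <- file(tmp, open = "r")
chunk_size <- 2            # tiny here; thousands of lines would be more typical
pieces <- list()
repeat {
  lines <- readLines(con, n = chunk_size)   # continues from the current position
  if (length(lines) == 0) break             # end of file reached
  pieces[[length(pieces) + 1]] <- data.frame(
    STATE  = substr(lines, 1, 2),
    INTVID = substr(lines, 24, 26),
    stringsAsFactors = FALSE
  )
}
close(con)

result <- do.call(rbind, pieces)
print(result)
unlink(tmp)
```

Because the connection stays open between calls, each readLines() picks up where the previous one stopped, so only one chunk of raw text is held in memory at a time.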

  If I had a spare day...

Barry



Re: [R] Reading fixed column format

2006-09-13 Thread Barry Rowlingson
Barry Rowlingson wrote:


> If I had a spare day...

  Or if I'd just read Duncan's message about negative widths in read.fwf.

  Anyway, I've learnt about readLines() and seek() and reading zip files 
now, so I can read _anything_

Barry



Re: [R] Reading fixed column format

2006-09-13 Thread Anupam Tyagi
Barry Rowlingson B.Rowlingson at lancaster.ac.uk writes:

> Or if I'd just read Duncan's message about negative widths in read.fwf.
>
> Anyway, I've learnt about readLines() and seek() and reading zip files
> now, so I can read _anything_

Thanks to everyone who answered my query. I have a lot to think about too.



Re: [R] Reading fixed column format

2006-09-13 Thread Gabor Grothendieck
On 9/13/06, Anupam Tyagi [EMAIL PROTECTED] wrote:
> Gabor Grothendieck ggrothendieck at gmail.com writes:
>
> > C:\bin> cut -c2-3,6-8 a.dat
> > 23678
> > 23678
> > 23678
>
> Thanks. I think this will work. How do I redirect the output to a file on
> windows?

Same as on UNIX

cut -c2-3,6-8 a.dat > a2.dat

> Is there simple way to convert the cut command to a script on windows,

Using notepad or other text editor put it in file a.bat and then
issue this command from the console

a.bat

Note that you could process it multiple times if you like:

cut -c6-8 a.dat > a2.dat
cut -c2-3 a2.dat > a3.dat

produces the same thing but uses 2 passes and so keeps each line shorter.
Be sure you do it from the tail end forward as shown above to avoid having
to recalculate the positions.

> because the entire command may not fit on one line? Anupam.




Re: [R] Reading fixed column format

2006-09-13 Thread Jason Barnhart
Another possibility:

1) Split the original file into smaller chunks of xx,xxx rows.
2) Process each file using read.fwf saving the requisite variables.
   (If necessary, save each intermediate matrix/data.frame to disk
   to conserve space)
3) 'rbind' the results.

Not exactly elegant but it works.
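
A sketch of the same idea without physically splitting the file: read.fwf's skip and n arguments select one block of rows per call, and the blocks are combined with rbind. The sample file, the chunk size, and the assumption that the total row count is known in advance (the thread gives 356,112 for the real file) are all illustrative.

```r
# Five made-up records: STATE in columns 1-2, filler, INTVID in columns 24-26
tmp <- tempfile(fileext = ".dat")
writeLines(sprintf("%02d%s%03d", 1:5, strrep("-", 21), 101:105), tmp)

total_rows <- 5    # assumed known beforehand
chunk_rows <- 2    # tiny here; tens of thousands would be more realistic
starts <- seq(0, total_rows - 1, by = chunk_rows)

# Read one block per call: skip the rows already handled, then read chunk_rows
chunks <- lapply(starts, function(s) {
  read.fwf(tmp, widths = c(2, -21, 3),
           col.names = c("STATE", "INTVID"),
           colClasses = "character",
           skip = s, n = chunk_rows)
})

combined <- do.call(rbind, chunks)
print(combined)
unlink(tmp)
```

Each call reopens the file and skips from the top, so this trades some extra reading time for a small memory footprint per block.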



Re: [R] Reading fixed column format

2006-09-13 Thread Gabor Grothendieck
On 9/13/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> On 9/13/06, Anupam Tyagi [EMAIL PROTECTED] wrote:
> > Gabor Grothendieck ggrothendieck at gmail.com writes:
> >
> > > C:\bin> cut -c2-3,6-8 a.dat
> > > 23678
> > > 23678
> > > 23678
> >
> > Thanks. I think this will work. How do I redirect the output to a file on
> > windows?
>
> Same as on UNIX
>
> cut -c2-3,6-8 a.dat > a2.dat
>
> > Is there simple way to convert the cut command to a script on windows,
>
> Using notepad or other text editor put it in file a.bat and then
> issue this command from the console
>
> a.bat
>
> Note that you could process it multiple times if you like:
>
> cut -c6-8 a.dat > a2.dat
> cut -c2-3 a2.dat > a3.dat

Sorry that's wrong.  It should be:

cut -c2-3 a.dat > a1.dat
cut -c6-8 a.dat > a2.dat

Now read in each of the files, a1.dat, a2.dat into R.
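
Reading the two pieces back into R and putting them side by side might look like the sketch below. The files are recreated in a temporary directory with the values the cut commands above would produce from the sample a.dat; the column names V23 and V678 are made up for the example.

```r
# Stand-ins for the outputs of the two cut commands on the sample a.dat
f1 <- file.path(tempdir(), "a1.dat")
f2 <- file.path(tempdir(), "a2.dat")
writeLines(rep("23", 3), f1)    # columns 2-3
writeLines(rep("678", 3), f2)   # columns 6-8

# Read each piece as a single fixed-width field, kept as character
part1 <- read.fwf(f1, widths = 2, col.names = "V23", colClasses = "character")
part2 <- read.fwf(f2, widths = 3, col.names = "V678", colClasses = "character")

# Both pieces come from the same records, so they must line up row by row
stopifnot(nrow(part1) == nrow(part2))
both <- cbind(part1, part2)
print(both)
unlink(c(f1, f2))
```

The row-count check matters because cbind() pairs rows purely by position.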



> produces the same thing but uses 2 passes and so keeps each line shorter.
> Be sure you do it from the tail end forward as shown above to avoid having
> to recalculate the positions.
>
> > because the entire command may not fit on one line? Anupam.
 




Re: [R] Reading fixed column format

2006-09-13 Thread Steve Miller
How about using python/perl/ruby, designed precisely for this type of
routine data munging, to pipe the processed output into an R dataframe?

msci <- read.table(pipe("python steve/python/msci.py"), header=TRUE, as.is=TRUE)

Iteratively, you could deliver the python output in chunks, something like:

msci <- read.table(pipe("python steve/python/msci.py 1 50"),
                   header=TRUE, as.is=TRUE)

msci <- rbind(msci, read.table(pipe("python steve/python/msci.py 51 100"),
                               header=TRUE, as.is=TRUE))

etc.

Steve Miller



Re: [R] Reading fixed column format

2006-09-12 Thread Anupam Tyagi
Jason Barnhart jasoncbarnhart at msn.com writes:

 
> These posts may be helpful.
> http://tolstoy.newcastle.edu.au/R/help/05/06/5776.html
> https://stat.ethz.ch/pipermail/r-help/2002-May/021145.html
>
> Using scan directly may also work for you rather than read.fwf.
>
> Also, there are posts regarding using other tools such as 'perl' or 'cut' to
> preprocess the data before reading with R.  Searching the archives with
> those keywords should help.

A new user should not have to learn perl, cut, awk, etc. simply to be able
to use R. Does not make sense to me.



Re: [R] Reading fixed column format

2006-09-12 Thread Michael Kubovy
On Sep 12, 2006, at 2:47 AM, Anupam Tyagi wrote:

> Jason Barnhart jasoncbarnhart at msn.com writes:
>
> > These posts may be helpful.
> > http://tolstoy.newcastle.edu.au/R/help/05/06/5776.html
> > https://stat.ethz.ch/pipermail/r-help/2002-May/021145.html
> >
> > Using scan directly may also work for you rather than read.fwf.
> >
> > Also, there are posts regarding using other tools such as 'perl' or
> > 'cut' to preprocess the data before reading with R.  Searching the
> > archives with those keywords should help.
>
> A new user should not have to learn perl, cut, awk, etc. simply
> to be able to use R. Does not make sense to me.

Hi Anupam,

You'll get much better help here if you're not ill-tempered. This is  
a group of extraordinarily helpful volunteers who owe you less than  
you paid for the product.

Please consider saving your data in a way that will make it easier to  
read into R. No program can read every dataset.
_
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS:    P.O. Box 400400, Charlottesville, VA 22904-4400
Parcels: Room 102, Gilmer Hall
         McCormick Road, Charlottesville, VA 22903
Office:  B011, +1-434-982-4729
Lab:     B019, +1-434-982-4751
Fax:     +1-434-982-4766
WWW:http://www.people.virginia.edu/~mk9y/



Re: [R] Reading fixed column format

2006-09-12 Thread Barry Rowlingson
Michael Kubovy wrote:

> Please consider saving your data in a way that will make it easier to
> read into R. No program can read every dataset.

Going back to the original post, there seem to be a couple of hanging
questions:

> None of these seem to read non-contiguous variables from columns; or
> maybe I am missing something. read.fwf is not meant for large
> files according to a post in the archives. Thanks for the pointers. I
> have read the R data input and output. Anupam.

  First up, how 'large' is your 'large ASCII file'? How many rows and
columns?

  Secondly, what are 'non-contiguous' variables?

  Perhaps if you posted the first few lines and columns of the file then
we might get an idea of how to read it in.

Barry



Re: [R] Reading fixed column format

2006-09-12 Thread Petr Pikal
Hi

Well. I have used R quite extensively for quite a long time without
knowing perl, cut, awk etc. Do you think I should learn them?

I agree with Barry Rowlingson that the best way to get a correct
answer is to present all relevant information. It seems to me that
read.table and read.fwf are the obvious choices, but there are other read
options, as you can find out from the help index, e.g. readLines, readBin.

Maybe you could try to fine tune readLines.

HTH
Petr


Petr Pikal
[EMAIL PROTECTED]



Re: [R] Reading fixed column format

2006-09-12 Thread Anupam Tyagi
Barry Rowlingson B.Rowlingson at lancaster.ac.uk writes:


> > None of these seem to read non-contiguous variables from columns; or
> > maybe I am missing something. read.fwf is not meant for large
> > files according to a post in the archives. Thanks for the pointers. I
> > have read the R data input and output. Anupam.
>
> First up, how 'large' is your 'large ASCII file'? How many rows and
> columns?

There are 356,112 records, 326 variables, fixed record length of 1283 positions.
Zipped file is 42MB. There are no field (variable) separators (delimiters).

> Secondly, what are 'non-contiguous' variables?

Variables that are not in adjoining positions in the file: reading them from the
file would require skipping columns while reading. For example, below are the
start positions of the first three variables I would like to read.

StartingColumn  VariableName  FieldLength
1               STATE         2
24              INTVID        3
30              PSU           10

> Perhaps if you posted the first few lines and columns of the file then
> we might get an idea of how to read it in.

Because a record (row) of the file is 1283 columns, I would not like to post it
here.

Thank you for your response.

Anupam.



[R] Reading fixed column format

2006-09-11 Thread Anupam Tyagi
How can I read fixed column data (without a delimiter) from a large ASCII file
directly into R? I want to read non-contiguous variables. I am trying to avoid
reading it first into a DBMS and then choosing the variables. I would prefer to
format and label it while reading if possible. Something like what Stata
does with a dictionary. Anupam.



Re: [R] Reading fixed column format

2006-09-11 Thread Jason Barnhart
Not familiar w/ Stata, but these functions read data files and should 
provide the functionality you wish.
?read.fwf
?read.table
?scan



Re: [R] Reading fixed column format

2006-09-11 Thread Anupam Tyagi
Jason Barnhart jasoncbarnhart at msn.com writes:

 
> Not familiar w/ Stata, but these functions read data files and should
> provide the functionality you wish.
> ?read.fwf
> ?read.table
> ?scan

None of these seem to read non-contiguous variables from columns; or maybe I am
missing something. read.fwf is not meant for large files according to a post
in the archives. Thanks for the pointers. I have read the R data input and
output. Anupam.



Re: [R] Reading fixed column format

2006-09-11 Thread Jason Barnhart
These posts may be helpful.
http://tolstoy.newcastle.edu.au/R/help/05/06/5776.html
https://stat.ethz.ch/pipermail/r-help/2002-May/021145.html

Using scan directly may also work for you rather than read.fwf.

Also, there are posts regarding using other tools such as 'perl' or 'cut' to
preprocess the data before reading with R.  Searching the archives with those
keywords should help.
