I did something similar yesterday… Use readLines() to read it in and identify the "*1:*", … lines with a regex. Then you have your dividers. In a second step, use read.csv(skip = …, nrows = …) to read the enclosed blocks, and last, combine them accordingly.
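The steps above can be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the divider rows look like "1:" (optionally wrapped in asterisks, as in the post), the first non-empty line is a divider, and every other line holds comma-separated values. The function name parse_blocks and the sample values are made up for illustration. Instead of calling read.csv() once per block, it is a variant that uses cumsum() over the divider flags to attach the ID to every value row, then reads all value rows in one go.

```r
# Hypothetical helper (not from the original post): turn lines of the
# "ID:" / "v1,v2,v3" format into a data frame with the ID repeated per row.
parse_blocks <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop empty lines

  # Divider rows: digits followed by ":", optionally wrapped in "*".
  is_id <- grepl("^\\*?[0-9]+:\\*?$", lines)
  ids <- as.integer(gsub("[^0-9]", "", lines[is_id]))

  # cumsum() gives every row the index of the most recent divider above it.
  # Assumption: the first line is a divider, so no row gets index 0.
  block <- cumsum(is_id)

  # Read all value rows at once; columns are named V1, V2, V3 by default.
  values <- read.csv(text = lines[!is_id], header = FALSE,
                     strip.white = TRUE, stringsAsFactors = FALSE)

  data.frame(ID = ids[block[!is_id]], values)
}

# Invented sample data in the format described in the question.
example <- c("1:", "10,3,7", "20,5,8", "2:", "30,4,9")
result <- parse_blocks(example)
print(result)
```

For a real file, something like parse_blocks(readLines(path)) should do, where path points to one of the downloaded data files. Reading everything with readLines() keeps the whole file in memory, so for very large files it may be necessary to process it in chunks.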
This is written without an R installation, so the argument names are likely wrong.

Rainer

> On 31 Jan 2020, at 10:04, Emmanuel Levy <emmanuel.l...@gmail.com> wrote:
>
> Hi,
>
> I'd like to use the Netflix challenge data and just can't figure out how to
> efficiently "scan" the files.
> https://www.kaggle.com/netflix-inc/netflix-prize-data
>
> The files have two types of row, either an *ID* e.g., "1:" , "2:", etc. or
> 3 values associated to each ID:
>
> The format is as follows:
> *1:*
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> *2:*
> value1,value2, value3
> value1,value2, value3
> *3:*
> value1,value2, value3
> value1,value2, value3
> value1,value2, value3
> *4:*
> etc ...
>
> And I want to create a matrix where each line is of the form:
>
> ID value1, value2, value3
>
> So "ID" needs to be duplicated - I could write a Perl script to convert
> this format to CSV, but I'm sure there's a simple R trick.
>
> Thanks for suggestions!
>
> Emmanuel
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Orcid ID: 0000-0002-7490-0066

Department of Evolutionary Biology and Environmental Studies
University of Zürich
Office Y34-J-74
Winterthurerstrasse 190
8075 Zürich
Switzerland

Office: +41 (0)44 635 47 64
Cell: +41 (0)78 630 66 57
email: rainer.k...@uzh.ch
       rai...@krugs.de
Skype: RMkrug

PGP: 0x0F52F982