Hello everybody,

I just started using R, and I'm presenting a poster for R Day at Kennesaw State University. I really need some help with web scraping. I'm trying to extract used-car data from www.cars.com, including the mileage, year, model, make, price, CARFAX availability, and Technology package availability.

I've done some research, and everything points to the XML and RCurl packages. I also got my hands on a function that captures all the text in the web page and stores it as one huge character vector. I've never done data mining before, so when I read the help documents for the packages I mentioned earlier, it's like reading Chinese. I would appreciate it if you could guide me through this process of data extraction.

Here's an example of what the data would look like:

Cost     Year   Mileage   Tech   CARFAX   Make   Model
$32000   1999   57,987    1      FREE     Audi   A4

Here's the link to the search:
http://www.cars.com/for-sale/searchresults.action?stkTyp=U&tracktype=usedcc&mkId=20049&AmbMkId=20049&AmbMkNm=Audi&make=Audi&AmbMdNm=A4&model=A4&mdId=20596&AmbMdId=20596&rd=100&zc=30062&searchSource=QUICK_FORM&enableSeo=1

I'm not expecting you to write the whole code for me; I'm just looking for some guidance on where to start and which functions would be useful in my situation.

Thanks a lot anyway.

Regards,
M. Samir Anany

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.