I'm trying to get some web data. In the past I had something like:

    wget -O page.html url
    links -dump page.html > page.txt
    mail [email protected] < page.txt
That worked well until the site was redeveloped on a new server. In a browser I can NOT screen-scrape any more: ONLY the labels get copied, not the contents. Each piece of content sits in an individual form field that has to be copied one by one (never came across that before?). When I run the script, page.html DOES contain the desired data, BUT page.txt does not. Looking at page.html, it has markup like [1], with readonly input fields. Is this some sort of attempt to prevent copying of the data?

Any thoughts on how that sort of HTML/PHP can be processed to text? Or do I need to manually get rid of everything up to 'value="'? If that's the way, what do I need to strip the data from 'value="' to the next '"'?

Thanks for any pointers.

[1] /snip/
<label class="pfbc-label">Suburb</label><input type="text" name="SYS_Addresses_e_address_i_0_e_district_tx" value="SYDNEY" readonly="readonly" class="ro pfbc-textbox"/>
<label class="pfbc-label">State</label><input type="hidden" value="NSW" name="SYS_Addresses_e_address_i_0_e_state_cd"><input type="text" name="SYS_Addresses_e_address_i_0_e_state_cd_d" value="NSW" readonly="readonly" class="ro pfbc-textbox"/>
<label class="pfbc-label">Postcode</label><input type="text" name="SYS_Addresses_e_address_i_0_e_postcode_tx" value="2000" readonly="readonly" class="ro pfbc-textbox"/>

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
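For reference, here is one possible sketch of the 'value="' stripping approach with grep and sed. It assumes the values never contain embedded double quotes or newlines; the sample page.html written below is a cut-down copy of snippet [1], not the real page.

```shell
#!/bin/sh
# Hypothetical sample mimicking the snipped page.html from [1].
cat > page.html <<'EOF'
<label class="pfbc-label">Suburb</label><input type="text" name="SYS_Addresses_e_address_i_0_e_district_tx" value="SYDNEY" readonly="readonly" class="ro pfbc-textbox"/>
<label class="pfbc-label">Postcode</label><input type="text" name="SYS_Addresses_e_address_i_0_e_postcode_tx" value="2000" readonly="readonly" class="ro pfbc-textbox"/>
EOF

# Pull out every value="..." (possibly several per line), then strip
# the value=" prefix and the trailing quote, leaving one value per line.
grep -o 'value="[^"]*"' page.html | sed 's/^value="//; s/"$//' > page.txt
cat page.txt
```

With this sample input, page.txt ends up containing SYDNEY and 2000, one per line, which the old `mail ... < page.txt` step could then send as before. Hidden inputs (like the State field) would also match, so duplicates may need filtering.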
