On Wed, Oct 31, 2012 at 6:20 AM, Soichi Ishida <[email protected]> wrote:
> Rails 1.9.3
>
> For http://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A
> I would like to make a list of airport names using Nokogiri.
>
> The following code seems to work but it does not insert "\n" as I wish.
>
> Can you tell me why?
>
>
>
> require 'open-uri'
> require 'nokogiri'
>
> test_url =
> "http://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A";
>
> url_list_file = "list_page_url.txt"
> test_xpath = "//tr"
> output_file = "list_airport_names_wiki_url.txt"
>
> test = Nokogiri::HTML(open(test_url))
> File.open(output_file, "a") {|f|
>   test.xpath(test_xpath).each do |e|
>     f.write e.xpath("//td[3]/a").text  + "\n"  #### HERE!!! ####
>   end
> }

First of all, the XPath looks suspicious: you almost certainly want only
the "td" elements nested below the current "tr".  So you should use a
relative path, either of
td[3]/a
.//td[3]/a

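Otherwise the first selection is useless, because //td[3]/a is evaluated
from the document root and selects the "a" children of every third "td"
in the whole document, no matter which "tr" you call it on.  That is
also why the newlines do not show up where you expect them: each write
emits all the names run together, followed by a single "\n".  Also,
e.xpath returns a NodeSet, and NodeSet#text concatenates the text of all
its nodes without any separator, which leads to surprising results: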

irb(main):026:0> puts dom
<?xml version="1.0"?>
<table>
  <td>abc</td>
</table>
=> nil
irb(main):027:0> dom.xpath('//*')
=> [#<Nokogiri::XML::Element:0x..fc00768e6 name="table"
children=[#<Nokogiri::XML::Element:0x..fc00766d4 name="td"
children=[#<Nokogiri::XML::Text:0x..fc0076526 "abc">]>]>,
#<Nokogiri::XML::Element:0x..fc00766d4 name="td"
children=[#<Nokogiri::XML::Text:0x..fc0076526 "abc">]>]
irb(main):028:0> dom.xpath('//*').text
=> "abcabc"

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
