[wtr-general] Re: Pulling hair out on screen scraping

2009-01-09 Thread Bissquitt


Does anyone have any idea why this isn't working? (Or should I be making a
new topic for this?)

for x in 0..2 do
  for y in 0..9 do
    numstring = x.to_s + y.to_s
    puts numstring
    if (browser.span(:id, Regexp.new("rptCourses_ctl00_rptItems_ctl" + numstring + "_lblItemTxtTitle")).text) then
      var = browser.span(:id, Regexp.new("rptCourses_ctl00_rptItems_ctl" + numstring + "_lblItemTxtTitle")).text
    end
  end
end

In theory (assuming 2 books on the page), when it reaches the 3rd book
the if will evaluate as false and the var = statement never gets
executed.
I'm getting the first 2 books returning fine, then on the 3rd time
around puts numstring executes and then the program ends with exit code 1
(it should go to the next page after uneventfully finishing the 2 for
loops).

The only thing I can think of is that it's trying to call the above
with numstring equal to 03, not finding it on the page, and crashing.
However, that's what the if is there to prevent. Any ideas or tips?


[wtr-general] Re: Pulling hair out on screen scraping

2009-01-09 Thread Charley Baker

if (browser.span(:id, Regexp.new("rptCourses_ctl00_rptItems_ctl" + numstring + "_lblItemTxtTitle")).exists?)

Otherwise, calling text will throw an exception while trying to locate the element.


-c
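
The difference between the two checks can be sketched without a live browser. The snippet below is a minimal stand-in, not Watir itself: FakeSpan, FakeBrowser, MissingElementError, and collect_titles are hypothetical names invented for illustration, modelled only on the behaviour described above (exists? answers true or false, while text raises when the element cannot be located).

```ruby
# Hypothetical stand-ins for a browser and its span elements (NOT the Watir API).
class MissingElementError < StandardError; end

class FakeSpan
  def initialize(text)
    @text = text
  end

  # Safe to call whether or not the element is on the page.
  def exists?
    !@text.nil?
  end

  # Raises when the element is missing, as described in the reply above.
  def text
    raise MissingElementError, 'unable to locate element' unless exists?
    @text
  end
end

class FakeBrowser
  def initialize(spans_by_id)
    @spans_by_id = spans_by_id
  end

  # Mimics browser.span(:id, regexp): first span whose id matches, or an empty span.
  def span(_how, pattern)
    _id, text = @spans_by_id.find { |id, _| id =~ pattern }
    FakeSpan.new(text)
  end
end

# Collect titles for ids ctl00..ctl29, skipping ids that are not on the page.
def collect_titles(browser)
  titles = []
  (0..29).each do |i|
    numstring = format('%02d', i) # "00", "01", ... same strings as x.to_s + y.to_s
    span = browser.span(:id, Regexp.new('rptCourses_ctl00_rptItems_ctl' + numstring + '_lblItemTxtTitle'))
    titles << span.text if span.exists?
  end
  titles
end

sample_browser = FakeBrowser.new({
  'rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle' => 'First Book',
  'rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle' => 'Second Book'
})
puts collect_titles(sample_browser).inspect
```

Guarding with exists? lets the loop run through all thirty candidate ids and simply skip the ones that are absent, instead of stopping at the first missing span.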


[wtr-general] Re: Pulling hair out on screen scraping

2009-01-09 Thread Bissquitt


thank you very much, you are awesome


[wtr-general] Re: Pulling hair out on screen scraping

2009-01-09 Thread gem dandy


Bissquitt,

Thank you for poking the proverbial 'beehive'. I am a Watir/Ruby
newbie too. I've posted some really basic questions on this group
as well. My background is mainly in hardware test automation, BASIC,
and VBScript. Kudos to all the responses to this post. I too am trying
to run before walking. I'm just so excited about the potential of Watir
that I can't sit still. After one month of tinkering, with what little
I know, I've already automated 30% of our web configuration tests at my job.

The links included here will keep me busy for a while.


Thanks again to all,
Gem (newbie) Dandy





[wtr-general] Re: Pulling hair out on screen scraping

2009-01-04 Thread Bissquitt


OK, thank you all so much. I got the majority of the code working. This
is what I have so far.

while contLoop do
  colVal = worksheet.Cells(row, 'a').Value
  if (colVal) then
    browser.goto("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=" + colVal)

    var = browser.span(:id, /rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle/).text
    worksheet.Cells(row, 'b').value = var
  else
    contLoop = false
  end

  row += 1
  sleep 1
end

Do you know of an easy way to iterate through each span that matches
the above regex, and only the ones that match, or do I need to go through
all of them and parse each individually?

I was trying something like this but I couldn't get it to work. (Are
span and spans the same? I only saw documentation for spans.)
  browser.spans.each(:id, /rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle/).text

If that can't be done I guess I will just store each span in a
string, look for the regex, and go on to the next.

Thanks again guys
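
One way to visit only the matching spans is to walk the whole collection and filter on each span's id. The sketch below uses a plain array of Struct objects as a stand-in for browser.spans; Span, TITLE_ID, matching_texts, and the sample ids are assumptions made for illustration. With Watir the same select/map would presumably be applied to browser.spans, but that is untested here.

```ruby
# Stand-in for the elements returned by browser.spans (hypothetical, not Watir).
Span = Struct.new(:id, :text)

TITLE_ID = /rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle/

# Return the text of every span whose id matches the pattern, in page order.
def matching_texts(spans, pattern)
  spans.select { |s| s.id =~ pattern }.map { |s| s.text }
end

sample_spans = [
  Span.new('rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle', 'First Book'),
  Span.new('rptCourses_ctl00_rptItems_ctl00_lblItemTxtISBN', '9780324574289'),
  Span.new('rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle', 'Second Book')
]

matching_texts(sample_spans, TITLE_ID).each { |title| puts title }
```

Filtering the collection this way avoids guessing how many books are on the page, since only spans that are actually present get visited.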


On Jan 3, 3:41 pm, Charley Baker charley.ba...@gmail.com wrote:
 It can be a bit overwhelming to learn Ruby and various libraries at the same
 time. I'd recommend taking a look at the Pickaxe book:
 http://whytheluckystiff.net/ruby/pickaxe/ just to get some general
 familiarity. There are other Ruby tutorials online as well as some good
 books - The Ruby Way, Everyday Scripting, O'Reilly's Ruby book.
 succ! as you mention below is a Ruby core method. Gotapi also has a good
 searchable reference to the Ruby standard API: http://www.gotapi.com/html (click
 on the Ruby Standard Packages). The Pickaxe book from the link above also has
 an index of the core API, many with examples.
 Here's a link to the Watir rdocs in case you might find that useful:
 http://wtr.rubyforge.org/rdoc/ and a link to supported elements (though
 openqa is down right now):
 http://wiki.openqa.org/display/WTR/Methods+supported+by+Element

 Strange that the hpricot site is down now as well.

 Another useful way to learn how to use libraries in Ruby is by taking a look
 at their unit tests. Watir has a large number of unit tests, hpricot has
 some too. They're located under your ruby install directory in gems.

 Ruby comes with a few documentation systems: ri and rdoc. For the gems you
 have installed locally you can see all of the rdocs by going to the command
 line and typing:
 gem server
 Then browse to http://localhost:8808
 ri can also be used from the command line:
 ri String::succ!

 Additional responses inline:





 On Sat, Jan 3, 2009 at 10:31 AM, Bissquitt bissqu...@gmail.com wrote:

  Regarding documentation, I read the Tutorial all the way through but
  it only hit on a few specific examples, leaving out other commands
  altogether. I've visited MANY Ruby and Watir sites and never once saw
  the .span command (does it just search for span tags? Guess I'll
  google it after this post). I never even found a site listing all the
  Watir commands (http://us.php.net/manual/en/function.abs.php as an
  example). In addition, there are SO MANY tutorials and such online that
  are all very poorly done that finding a good one via Google is a
  needle-in-a-haystack scenario, i.e. "oh great, you showed me that
  specific command, but showed me nothing about how that command works,
  so unless I want to use it exactly the way you used it, it's useless".
  My example here is the "Ruby on Windows" site. If I google for
  anything regarding Ruby and Excel I either get that site, or another
  site that just provides me a link to that site, and am forced to make
  do with that site in order to teach myself how to interact with
  Excel. The site itself lists a BUNCH of examples but leaves it up to
  you to try and pick apart the syntax to understand what it is doing.
  For example:

  line = '1'
  while worksheet.Range("a#{line}")['Value']
    line.succ!
  end
  #line now holds row number of first empty row

  What on earth does .succ! do? It never tells me. The site, and most
  that I've seen, are written not to target new people and tutor them but
  to target advanced users with more of a "here's a cool way to approach
  the problem" approach. A simple "OK, here is the Excel class, here
  are the commands in it and what they do, here is a syntax example"
  would be far more helpful as it doesn't leave anything out. I'm still
  not sure if it's possible to return what row the active cell is on.
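
For reference, succ! is a core String method: it replaces the string in place with its successor, so '1' becomes '2' and '9' becomes '10'. A small sketch of the row-counting idiom follows, with a plain array standing in for the worksheet column; the array contents and the first_empty_row name are illustrative assumptions, not part of the Excel interface.

```ruby
# String#succ! increments the string in place; digits carry like numbers.
line = '1'.dup      # dup gives a mutable copy even if string literals are frozen
line.succ!          # line is now "2"

# Find the first empty row, with an Array standing in for column A of a worksheet.
def first_empty_row(column)
  line = '1'.dup
  line.succ! while column[line.to_i - 1]   # keep going while the cell has a value
  line
end

COLUMN_A = ['first', 'second', 'third', nil].freeze
puts first_empty_row(COLUMN_A)   # prints 4
puts '9'.succ                    # prints 10
puts 'az'.succ                   # prints ba
```

Because the counter is a string, it can be interpolated straight into a cell reference such as "a#{line}", which is why the quoted example counts with succ! rather than with an integer.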

 Excel is a strange one. :) Agreed that most sites assume a basic familiarity
 with Ruby, and with the links above you should be able to get into it fairly
 quickly. Accessing Excel is done through its COM interface, so one of the
 best sources of documentation is actually the Excel VBA Microsoft help file.
 There's a link to the standalone version of it somewhere on the internets if
 you don't have it installed. There are some excel libraries on our wiki as
 well as a project on Rubyforge called Rasta which use Excel. You

[wtr-general] Re: Pulling hair out on screen scraping

2009-01-03 Thread Bissquitt


Forgot to include the code I have thus far (currently not working due
to the Hpricot portion).

excel = WIN32OLE.new('excel.application')
excel.visible = true
workbook = excel.workbooks.open('E:\books\spring 09 classes.xls')
worksheet = workbook.worksheets(1)

contLoop = true
row = 1

while contLoop do
  colVal = worksheet.Cells(row, 'a').Value
  if (colVal) then
    doc = Hpricot(open("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=" + colVal))
    a = doc.search(sp...@id='rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle']).inner_text
    worksheet.Cells(row, 'f').value = a
  else
    contLoop = false
  end

  row += 1
  sleep 1
end


On Jan 3, 8:32 am, Bissquitt bissqu...@gmail.com wrote:
 Granted, I am new to Watir and Ruby in general, but I do have a
 background in programming. My brief experience has been that Watir and
 Ruby are awesome but VERY poorly documented, which is odd considering
 the massive number of web pages dedicated to them.

 Anyway, here is the issue I am having.

 I am trying to screen scrape book information from a college
 bookstore's website. My first attempt was PHP (and I had a full script
 done for it), then I realized that the site uses JavaScript to get info
 from their database and all I was scraping was the static HTML, so I
 missed the generated stuff I need.

 The script in theory:
 opens an Excel document,
 looks at (A1) and goes to www.website.com/(A1) where (A1) is a
 course number,
 stores Title, ISBN and other info into B1, C1, D1 etc. (I also have to
 take into account more than 1 book per class, though once I get the
 first I should be able to do this),
 goes to (A2) and repeats.

 From what I have seen there are 2 ways to do this, each with its own
 problem.

 1) Use hpricot or some other parser to find the proper tag. This has 2
 issues.

 <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtISBN">9780324574289</span>
 The second ctl00 iterates to ctl01 for the second book (I am hoping I
 can just use a regexp inline).

 The second issue is that I have not been able to figure out how to
 pick out a span tag. There are all sorts of commands for finding links
 and tables and such but I can't figure out how to pick out that
 particular tag (specifically with hpricot).

 2) Load the entire page into a variable, strip out all newlines and
 tabs, and scan the entire page for a specific regexp:
 <span id="rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle" style="font-weight:bold;">[^<]+<\/span>
 I know this works; I used rubulator to test it. It returns all titles
 of books on the page. I do foresee an issue of which title belongs to
 which other info if I do it that way, though.
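
 Approach 2 can be sketched with String#scan. Capturing the two-digit ctl index and the field name alongside the text lets the results be grouped per book, which is one way around the worry about matching titles to their other details. The sample HTML (including the placeholder second ISBN), the FIELD pattern, and the books_from helper are made up for illustration, and the pattern assumes the spans look like the ISBN span shown under approach 1.

```ruby
# Sample markup shaped like the spans described above (invented for this sketch).
SAMPLE_HTML = <<~HTML
  <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle" style="font-weight:bold;">First Book</span>
  <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtISBN">9780324574289</span>
  <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle" style="font-weight:bold;">Second Book</span>
  <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtISBN">0000000000000</span>
HTML

# Captures: (1) two-digit book index, (2) field name, (3) span text.
FIELD = /<span id="rptCourses_ctl00_rptItems_ctl(\d\d)_lblItemTxt(\w+)"[^>]*>([^<]+)<\/span>/

# Group every captured field under its book index,
# e.g. "00" => {"Title" => "...", "ISBN" => "..."}.
def books_from(html)
  books = Hash.new { |hash, key| hash[key] = {} }
  html.scan(FIELD) { |index, field, text| books[index][field] = text }
  books
end

books_from(SAMPLE_HTML).sort_by { |index, _| index }.each do |index, fields|
  puts "#{index}: #{fields['Title']} (ISBN #{fields['ISBN']})"
end
```

 Keying on the ctl index keeps each title with its own ISBN even when the spans are scanned in a single pass over the page.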

 If an exact example is required I can give out all the info you
 require, though I figured it would be more clutter than helpful. An
 actual syntax example would be most helpful rather than just referring
 me to a class definition, though I will take whatever is offered.

 Many thanks,
 Michael
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Watir General group.
To post to this group, send email to watir-general@googlegroups.com
Before posting, please read the following guidelines: 
http://wiki.openqa.org/display/WTR/Support
To unsubscribe from this group, send email to 
watir-general-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/watir-general
-~--~~~~--~~--~--~---

[wtr-general] Re: Pulling hair out on screen scraping

2009-01-03 Thread Charley Baker

Hi there,
  I'm not sure what you mean by Ruby and Watir being poorly documented. For
Ruby, the first edition of the Pickaxe book which is comprehensive is free
and available online. There are dozens of other tutorials, sites and blogs
about Ruby. Watir also has a lot of examples, a tutorial(
http://wiki.openqa.org/display/WTR/Tutorial) and other information on the
wiki, if there's something you feel is missing, don't hesitate to suggest it
or add it yourself.

  Oddly, your example doesn't use Watir at all. If you wanted to use Watir
to do the same thing here are some possibilities:

browser.spans.each {|s| puts s.text}   # do something else with the span in the block if you want, e.g. assign some variables
var = browser.span(:id, /ctl/).text    # find the span by a regex and assign it to a variable

An interesting example using hpricot and regexs to find book information -
ISBN, price, etc.

Scrubyt is another library for screen scraping which internally uses either
Firewatir or Mechanize, here's a link to some examples:
http://wiki.scrubyt.org/index.php?title=Tutorials

HTH,


Charley Baker
blog: http://charleybakersblog.blogspot.com/
Project Manager, Watir, http://wtr.rubyforge.org
QA Architect, Gap Inc Direct


On Sat, Jan 3, 2009 at 7:12 AM, Bissquitt bissqu...@gmail.com wrote:


 forgot to include the code I have thus far. (currently not working do
 to the Hpricot portion)

 excel = WIN32OLE.new(excel.application)
 excel.visible = true
 workbook = excel.workbooks.open('E:\books\spring 09 classes.xls')
 worksheet=workbook.worksheets(1)


 contLoop = true
 row = 1


 while contLoop do colVal = worksheet.Cells(row, 'a').Value
  if (colVal) then
  doc = Hpricot(open(http://bookstore.umbc.edu/
 SelectCourses.aspx?src=2type=2stoid=9trm=Spring%2009cid=http://bookstore.umbc.edu/SelectCourses.aspx?src=2type=2stoid=9trm=Spring%2009cid=
 (colVal)))
  a = doc.search(sp...@id='rptCourses_ctl00_rptItems_ctl\d
 \d_lblItemTxtTitle']).inner_text
  worksheet.Cells(row, 'f').value = a


  else
  contLoop = false
  end

  row +=  1
  sleep 1
 end


 On Jan 3, 8:32 am, Bissquitt bissqu...@gmail.com wrote:
  Granted I am new to Watir and ruby in general but I do have a
  background of programming. My brief experience has been that watir and
  ruby are awesome but VERY poorly documented, which is odd concidering
  the massive amount of web pages dedicated to it.
 
  anyway, here is the issue I am having.
 
  I am trying to screen scrape book information from a college
  bookstores website. My first attempt was php (and I had a full script
  done for it) then realized that the site uses javascript to get info
  from their database and all I was scraping was the static HTML and
  missed the generated stuff I need.
 
  The script in theory:
  opens an excel document,
  looks at (A1) and goes to www.website.com/(A1) where (A1) is a
  course number,
  stores Title, ISBN and other info into B1, C1, D1 etc (I also have to
  take into account more than 1 book per class) though once I get the
  first I should be able to do this.
  goes to (A2) and repeats.
 
  From what I have seen there are 2 ways to do this each with its own
  problem.
 
  1) use hpricot or some other parser to find the proper tag. This has 2
  issues.
 
  span
  id=rptCourses_ctl00_rptItems_ctl00_lblItemTxtISBN9780324574289/
  span
  The second ctl00 itterates to ctl01 for the second book (I am hoping I
  can just use regexp in line)
 
  The second issue is that I have not been able to figure out how to
  pick out a span tag. There are all sorts of commands for finding links
  and tables and such but I cant figure out how to pick out that
  particular tag (specificly with hpricot)
 
  2) Load the entire page into a variable, strip out all newlines and
  tabs, and scan the entire page for a specific regexp:
  <span id="rptCourses_ctl00_rptItems_ctl\d\d_lblItemTxtTitle"
  style="font-weight:bold;">[^<]+<\/span>
  I know this works; I used rubulator to test it. It returns all titles
  of books on the page. I do foresee an issue of which title belongs to
  which other info if I do it that way, though.
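
One possible way around the "which title belongs to which other info" problem, sketched in plain Ruby: capture the two-digit item index and the field name along with the text, then group by index. The page fragment, the book titles, and the second ISBN below are made up for illustration, and the sketch assumes every field follows the same lblItemTxt<Field> id pattern as the Title and ISBN spans quoted above.

```ruby
# Hypothetical page fragment; the ids follow the pattern quoted above,
# but the titles and the second ISBN are invented for this sketch.
html = <<~HTML
  <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle" style="font-weight:bold;">First Book</span>
  <span id="rptCourses_ctl00_rptItems_ctl00_lblItemTxtISBN">9780324574289</span>
  <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle" style="font-weight:bold;">Second Book</span>
  <span id="rptCourses_ctl00_rptItems_ctl01_lblItemTxtISBN">9780000000002</span>
HTML

# Capture the two-digit item index, the field name, and the span text.
pattern = /<span id="rptCourses_ctl00_rptItems_ctl(\d\d)_lblItemTxt(\w+)"[^>]*>([^<]+)<\/span>/

# Group by index so each book keeps its own title and ISBN together.
books = Hash.new { |hash, key| hash[key] = {} }
html.scan(pattern) do |index, field, text|
  books[index][field] = text.strip
end

books.sort_by { |index, _info| index }.each do |index, info|
  puts "#{index}: #{info['Title']} (ISBN #{info['ISBN']})"
end
```

Each entry in books could then be written to its own spreadsheet row or set of columns.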
 
  If an exact example is required, I can give out all the info you
  require, though I figured it would be more clutter than help. An
  actual syntax example would be most helpful, rather than just referring
  me to a class definition, though I will take whatever is offered.
 
  Many thanks,
  Michael
 


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Watir General group.
To post to this group, send email to watir-general@googlegroups.com
Before posting, please read the following guidelines: 
http://wiki.openqa.org/display/WTR/Support
To unsubscribe from this group, send email to 
watir-general-unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/watir-general

[wtr-general] Re: Pulling hair out on screen scraping

2009-01-03 Thread Bissquitt


Regarding documentation, I read the Tutorial all the way through, but
it only hit on a few specific examples, leaving out other commands
altogether. I've visited MANY Ruby and Watir sites and never once saw
the .span command (does it just search for span tags? guess I'll
google it after this post). I never even found a site listing all the
Watir commands (see http://us.php.net/manual/en/function.abs.php as an
example). In addition, there are SO MANY tutorials and such online that
are all very poorly done that it makes finding a good one via Google a
needle-in-a-haystack scenario, i.e. "oh great, you showed me that
specific command, but showed me nothing about how that command works,
so unless I want to use it exactly the way you used it, it's useless".
My example here is the "Ruby on Windows" site. If I google for
anything regarding Ruby and Excel, I either get that site or another
site that just provides me a link to that site, and am forced to make
do with that site in order to teach myself how to interact with
Excel. The site itself lists a BUNCH of examples but leaves it up to
you to try and pick apart the syntax to understand what it is doing.
For example:

line = '1'
while worksheet.Range("a#{line}")['Value']
   line.succ!
end
#line now holds row number of first empty row

What on earth does .succ! do? It never tells me. The site, and most
that I've seen, are written not to target new people and tutor them but
to target advanced users with more of a "so here's a cool way to approach
the problem" approach. A simple "ok, here is the Excel class, here
are the commands in it and what they do, here is a syntax example"
would be far more helpful, as it doesn't leave anything out. I'm still
not sure if it's possible to return what row the active cell is on.

...Which is when I decided to ask actual people and ended up here.
(thanks again btw)


...After that long-winded response: I was trying to use Watir to
scrape the page because I was having issues with the JavaScript
not being executed before the scrape (when I did it in PHP) and
figured that a driven web browser would be sure to get it...hence
Watir.

The reason my example was not using Watir is that I was unable to
find any documentation on how to do what I needed. I saw
browser.links and browser.table, but those were the only 2 I found;
there was no "here is a list of the commands", as I mentioned above.
Consequently, I found even less on hpricot, since all I get is a 404 on
its main site, and every other site links to it, so whether or not it
was documented is irrelevant. All I have to work with is trying to
piece together other people's code and work with it.

I don't quite follow your first example, since I am barely familiar
with Ruby syntax (though it appears to be similar to Java). What is the
|s| ?
Your second example seems to be much closer to what I need, since there
are MANY spans on the page but only a handful matching the regexp
pattern I gave above.

Would you be able to break down the second example for me?

var = browser.span(:id, /ctl/).text

I know:
var is the variable being stored into
browser is the watir browser object being driven
I'm guessing span just looks for span tags?
I'm also guessing that (:id, /ctl/) looks for any span tag with an id
matching /ctl/ ? (this is where I'm not following you as much)
What does the : in your example do? What exactly is the second
argument doing, and what are the slashes?
And what does the .text at the end do?

Sorry for being rather dense, but I have barely dealt with web
programming before. I've spent my life doing C++, Java, and BASIC, so
I'm pretty much trying to stumble into a final product as gracefully
as I can.

Michael



On Jan 3, 12:37 pm, Charley Baker charley.ba...@gmail.com wrote:
 Hi there,
   I'm not sure what you mean by Ruby and Watir being poorly documented. For
 Ruby, the first edition of the Pickaxe book, which is comprehensive, is free
 and available online. There are dozens of other tutorials, sites and blogs
 about Ruby. Watir also has a lot of examples, a tutorial
 (http://wiki.openqa.org/display/WTR/Tutorial) and other information on the
 wiki. If there's something you feel is missing, don't hesitate to suggest it
 or add it yourself.

   Oddly, your example doesn't use Watir at all. If you wanted to use Watir
 to do the same thing here are some possibilities:

 # do something else with the span in the block if you want,
 # e.g. assign some variables, etc.
 browser.spans.each {|s| puts s.text}
 # find the span by a regex and assign it to a variable
 var = browser.span(:id, /ctl/).text

 An interesting example using hpricot and regexs to find book information -
 ISBN, price, etc.

 Scrubyt is another library for screen scraping which internally uses either
 Firewatir or Mechanize. Here's a link to some examples:
 http://wiki.scrubyt.org/index.php?title=Tutorials

 HTH,

 Charley Baker
 blog: http://charleybakersblog.blogspot.com/
 Project Manager, Watir, http://wtr.rubyforge.org
 QA Architect, Gap

[wtr-general] Re: Pulling hair out on screen scraping

2009-01-03 Thread Alex Collins

Michael,

A fairly rapid reply, so my apologies if it sounds a little terse. A  
clearer, succinct email would be helpful, rather than unduly  
elaborating on your difficulties finding things.

My immediate thought is that you are trying to run before you have  
learnt to walk.

I would:
- Learn the basics of Ruby using a Ruby tutorial, e.g.
  http://poignantguide.net/ruby/ (quirky)
- Read through parts of the Pickaxe book, e.g.
  http://www.rubycentral.com/book/
- Learn to use IRB (interactive ruby) to understand how to see what  
methods do
- Look at the RDoc for the libraries (gems) you want to use eg Watir,  
excel
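
As a rough illustration of the IRB suggestion, these are the kinds of lines that can be typed at the irb prompt to poke at an unfamiliar method. This is only a sketch using core Ruby introspection calls; the variable names are arbitrary.

```ruby
# Things worth typing into irb when an unfamiliar method such as succ! turns up.
line = '1'.dup   # dup gives a mutable copy of the literal

# Which String methods mention "succ"?
succ_methods = String.instance_methods(false).grep(/succ/).sort
p succ_methods

# Does this particular object respond to the method?
p line.respond_to?(:succ!)

# Which class defines it? That is the class to look up in the RDoc.
p line.method(:succ!).owner
```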

Personally, I consider Ruby's documentation and tutorials very good.  
Ruby is a scripting language so most people will provide examples of  
problems they have solved. Plenty of documentation is available  
through RDoc (Ruby documentation).

Your question about Ruby's succ! method is easily answered by
searching Google for "Ruby succ!". You will find that it "Returns the
successor". If you run the command "irb", having installed Ruby, and try
typing:

a = '1'
a.succ!
=> "2" (is returned by irb)
a.succ!
=> "3" (is returned by irb)
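
A slightly fuller sketch of the same idea, contrasting succ with succ! and replaying the first-empty-row loop from the quoted snippet. The column_a array and its course numbers are hypothetical stand-ins for the Excel worksheet, so no Excel is needed to run it.

```ruby
line = '1'.dup        # mutable copy so succ! can modify it in place

nxt = line.succ       # succ returns a new string and leaves line alone
puts line             # still 1
puts nxt              # 2

line.succ!            # succ! changes line itself
puts line             # now 2

# The first-empty-row loop, with an array of hypothetical cell values
# standing in for column A of the worksheet (nil marks the empty cell).
column_a = ['CMSC201', 'CMSC202', 'MATH151', nil]
row = '1'.dup
row.succ! while column_a[row.to_i - 1]
puts row              # 4, the first empty row
```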

Charley's first example might be more simply written as:

browser.spans.each do | span |
  puts span.text
end

This could be understood as:

For each span within the browser, put the span object into the "span"
variable within the following block of code:
print the output of span.text to screen.
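
The same block mechanics can be tried without a browser. In this sketch FakeSpan is a hypothetical stand-in for the span objects Watir returns; it only has the text attribute the example needs.

```ruby
# FakeSpan is a stand-in for Watir's span objects, for illustration only.
FakeSpan = Struct.new(:text)
spans = [FakeSpan.new('First Book'), FakeSpan.new('Second Book')]

# do ... end form: each element is handed to the block as span.
spans.each do |span|
  puts span.text
end

# Brace form, as in Charley's one-liner; |s| is just a shorter name.
spans.each { |s| puts s.text }

# The block body can do anything, e.g. collect the texts instead of printing.
titles = spans.map { |s| s.text }
p titles
```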

Your understanding of the second example is pretty much correct. In
answer to your questions:
- Searching for "ruby colon character" will tell you that the : before
a name indicates that it is a symbol. A Ruby tutorial will help you
understand this.
- The // symbols are less easy to find, but they delimit a regular
expression or RegExp. These are common across many languages and also
available in Java.
- The .text method returns the text contained within the span

Hope this helps,

Alex


[wtr-general] Re: Pulling hair out on screen scraping

2009-01-03 Thread Anna Gabutero


Hi Michael,

On Sat, Jan 03, 2009 at 10:31:38AM -0800, Bissquitt wrote:
 
 Regarding documentation, I read the Tutorial all the way through, but
 it only hit on a few specific examples, leaving out other commands
 altogether. I've visited MANY Ruby and Watir sites and never once saw
 the .span command (does it just search for span tags? guess I'll
 google it after this post). I never even found a site listing all the
 Watir commands (see http://us.php.net/manual/en/function.abs.php as an
 example).

This is in the Watir wiki, which seems to be down at the moment, so
here's an alternative link: http://tinyurl.com/watirmethods

 In addition, there are SO MANY tutorials and such online that
 are all very poorly done that it makes finding a good one via Google a
 needle-in-a-haystack scenario, i.e. "oh great, you showed me that
 specific command, but showed me nothing about how that command works,
 so unless I want to use it exactly the way you used it, it's useless".

I wouldn't go so far as to call the tutorials poorly done but most
Watir tutorials do seem to be written with non-programmers in mind.
That said, Ruby is a full-fledged language, much like Java and C++ are,
so it would be out of scope for a Watir tutorial or Excel automation
tutorial to teach you language basics too.

However, if you installed Ruby using the one-click installer, you
already have most of the documentation you need in your computer.  To
see the APIs, you can use either fxri (a graphical interface to the
language documentation) or the rubygems rdoc server (a daemon that you
can access as http://localhost:8808/ through your browser).  I can't
give the exact location since I don't have access to a Windows computer
right now, but all of these are somewhere in the Ruby folder in the
Start Menu.  A copy of the Pickaxe book mentioned earlier should be in
there too.

 What on earth does .succ! do? It never tells me. The site, and most
 that I've seen, are written not to target new people and tutor them but
 to target advanced users with more of a "so here's a cool way to approach
 the problem" approach. A simple "ok, here is the Excel class, here
 are the commands in it and what they do, here is a syntax example"
 would be far more helpful, as it doesn't leave anything out. I'm still
 not sure if it's possible to return what row the active cell is on.

Ruby interacts with Excel using an OLE automation object, which is more
of a Microsoft thing than a Ruby thing.  It is documented here:

http://msdn.microsoft.com/en-us/library/aa272268(office.11).aspx

 I don't quite follow your first example since I am barely familiar
 with ruby syntax (though it appears to be similar to java) what is the
 |s| ?

This is part of block syntax.  Look it up in the Pickaxe, in the
chapter called "Containers, Blocks, and Iterators".

 I'm guessing span just looks for span tags?

Yes.

 I'm also guessing that (:id, /ctl/) looks for any span tag with an id
 matching /ctl/ ? (this is where im not following you as much)

Yes.

 what does the : in your example do?

It references the id symbol.  Without going into too much detail,
span(:id, /ctl/) is more efficient than span('id', /ctl/) due to the way
Ruby allocates memory for strings.  Don't worry too much about this,
just use it (and don't mix up symbols and strings).

 what exactly is the second
 argument doing, what are the slashes?

The slashes denote a regular expression, which means that it will match
any span whose id attribute contains 'ctl'.  You can compare this to
span(:id, 'ctl'), which will match only the span whose id attribute is
exactly equal to 'ctl'.
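
The difference can be seen in plain Ruby without a browser. This sketch only mimics the matching rule described above on a list of hypothetical ids; it does not call Watir itself.

```ruby
# Hypothetical ids, modelled on the ones in this thread.
ids = [
  'ctl',
  'rptCourses_ctl00_rptItems_ctl00_lblItemTxtTitle',
  'rptCourses_ctl00_rptItems_ctl01_lblItemTxtTitle',
  'pageHeader'
]

# A regular expression matches when the pattern occurs anywhere in the id.
regex_hits = ids.select { |id| id =~ /ctl/ }

# A plain string only matches an id that is exactly equal to it.
string_hits = ids.select { |id| id == 'ctl' }

# A tighter pattern narrows the match down to the title spans only.
title_hits = ids.grep(/\Arpt\w+_ctl\d\d_lblItemTxtTitle\z/)

p regex_hits.length
p string_hits
p title_hits.length
```

A pattern as loose as /ctl/ matches most ids on a page like this one, so the tighter form is probably the one to use for the book titles.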

 and what does the .text at the end do?

It's a method call that returns the text inside the span tag.

 Sorry for being rather dense, but I have barely dealt with web
 programming before. I've spent my life doing C++, Java, and BASIC, so
 I'm pretty much trying to stumble into a final product as gracefully
 as I can.

Don't overthink it.  With Ruby, you're still dealing with objects,
classes and methods, so your experience with OOP concepts should help
you.


HTH,
Anna



[wtr-general] Re: Pulling hair out on screen scraping

2009-01-03 Thread Charley Baker

It can be a bit overwhelming to learn Ruby and various libraries at the same
time. I'd recommend taking a look at the Pickaxe book:
http://whytheluckystiff.net/ruby/pickaxe/   just to get some general
familiarity. There are other Ruby tutorials online as well as some good
books - The Ruby Way, Everyday Scripting, O'Reilly's Ruby book.
succ!, as you mention below, is a Ruby core method. Gotapi also has a good
searchable reference to the Ruby standard API: http://www.gotapi.com/html (click
on the Ruby Standard Packages). The Pickaxe book from the link above also has
an index of the core API, many with examples.
Here's a link to the Watir rdocs in case you might find that useful.
http://wtr.rubyforge.org/rdoc/ and a link to supported elements(though
openqa is down right now):
http://wiki.openqa.org/display/WTR/Methods+supported+by+Element

Strange that the hpricot site is down now as well.

Another useful way to learn how to use libraries in Ruby is by taking a look
at their unit tests. Watir has a large number of unit tests, hpricot has
some too. They're located under your ruby install directory in gems.

Ruby comes with a few documentation systems: ri and rdoc. For the gems you
have installed locally you can see all of the rdocs by going to the command
line, type:
gem server
Then browse to http://localhost:8808
ri can also be used from the command line:
ri String::succ!

Additional responses inline:



On Sat, Jan 3, 2009 at 10:31 AM, Bissquitt bissqu...@gmail.com wrote:


 Regarding documentation, I read the Tutorial all the way through, but
 it only hit on a few specific examples, leaving out other commands
 altogether. I've visited MANY Ruby and Watir sites and never once saw
 the .span command (does it just search for span tags? guess I'll
 google it after this post). I never even found a site listing all the
 Watir commands (see http://us.php.net/manual/en/function.abs.php as an
 example). In addition, there are SO MANY tutorials and such online that
 are all very poorly done that it makes finding a good one via Google a
 needle-in-a-haystack scenario, i.e. "oh great, you showed me that
 specific command, but showed me nothing about how that command works,
 so unless I want to use it exactly the way you used it, it's useless".
 My example here is the "Ruby on Windows" site. If I google for
 anything regarding Ruby and Excel, I either get that site or another
 site that just provides me a link to that site, and am forced to make
 do with that site in order to teach myself how to interact with
 Excel. The site itself lists a BUNCH of examples but leaves it up to
 you to try and pick apart the syntax to understand what it is doing.
 For example:

 line = '1'
 while worksheet.Range("a#{line}")['Value']
   line.succ!
 end
 #line now holds row number of first empty row

 What on earth does .succ! do? It never tells me. The site, and most
 that I've seen, are written not to target new people and tutor them but
 to target advanced users with more of a "so here's a cool way to approach
 the problem" approach. A simple "ok, here is the Excel class, here
 are the commands in it and what they do, here is a syntax example"
 would be far more helpful, as it doesn't leave anything out. I'm still
 not sure if it's possible to return what row the active cell is on.


Excel is a strange one. :) Agreed that most sites assume a basic familiarity
with Ruby, and with the links above you should be able to get into it fairly
quickly. Accessing Excel is done through its COM interface, so one of the
best sources of documentation is actually the Excel VBA Microsoft help file.
There's a link to the standalone version of it somewhere on the internets if
you don't have it installed. There are some Excel libraries on our wiki as
well as a project on Rubyforge called Rasta which uses Excel. You can browse
through the source code for those.



 ...Which is when I decided to ask actual people and ended up here.
 (thanks again btw)


 ...After that long-winded response: I was trying to use Watir to
 scrape the page because I was having issues with the JavaScript
 not being executed before the scrape (when I did it in PHP) and
 figured that a driven web browser would be sure to get it...hence
 Watir.


Yep, makes sense. Watir is great at testing heavy js sites, ajaxy stuff and
the generated DOM instead of the page source.




 The reason my example was not using Watir is that I was unable to
 find any documentation on how to do what I needed. I saw
 browser.links and browser.table, but those were the only 2 I found;
 there was no "here is a list of the commands", as I mentioned above.
 Consequently, I found even less on hpricot, since all I get is a 404 on
 its main site, and every other site links to it, so whether or not it
 was documented is irrelevant. All I have to work with is trying to
 piece together other people's code and work with it.

 I don't quite follow your first example since I am barely familiar
 with ruby syntax (though it appears to be similar to
