On Thu, Nov 26, 2009 at 10:01 PM, Robert lzw <[email protected]> wrote:
> Hello folks,
>
> I want to build a SQL database based on data from web pages such as this
> one:
> http://tubic.tju.edu.cn/deg/information.php?ac=DEG10010001
>
> Since the data would be collected from thousands of such links, I want
> to write code for doing it automatically. Can anyone suggest how to
> deal with the following tasks?
>
> (1) With the above link, how can I store the corresponding data to the
> SQL database, assuming the database has identical fields (Access
> Number, Gene Name, etc.) as the above link?
>
> (2) After processing the above link, how can I open a new link
> automatically, for example,
> http://tubic.tju.edu.cn/deg/information.php?ac=DEG10010002, and do
> the same thing as in step (1)?
>
> (3) How can I repeat steps (1) and (2) for all the pages I want to handle?
>
> Any suggestions and recommendation of framework, book and online
> source for doing it would be highly appreciated.

I'd say Wt is not the right tool here. You'd be better off using
something designed to parse or scrape web pages.

If you want to use C++, I'd use Qt:
- QNetworkAccessManager and QUrl to access and/or download the pages
- QtWebKit from Qt 4.6 (particularly
http://doc.trolltech.com/4.6-snapshot/qwebelement.html , which is new
in Qt 4.6) to parse the data
- QtSql to insert data in a database
- You can implement a thread pool (use QThread) and parallelize
fetching and processing.

Other options are Ruby, using scrAPI or Hpricot to easily parse the
web pages, or Python. There are plenty of other options.
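
For the Python route, the three tasks in the question (store one page,
move to the next access number, repeat) could be sketched roughly as
below, using only the standard library. This is a minimal sketch under
several assumptions, not a tested scraper for that site: it assumes each
field appears as a label cell followed by a value cell in an HTML table,
that the labels are literally "Access Number" and "Gene Name", and that
the last four digits of the "ac" parameter are a running counter. The
names parse_fields, crawl, and the genes table are made up for
illustration.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

# Assumption: only the last four digits of the access number change.
BASE_URL = "http://tubic.tju.edu.cn/deg/information.php?ac=DEG1001{:04d}"


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def parse_fields(html):
    """Return {label: value}, assuming cells alternate label, value."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    return dict(zip(cells[0::2], cells[1::2]))


def fetch(url):
    """Download one page and return it as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(conn, first, last, fetch_page=fetch):
    """Fetch pages first..last (inclusive) and store two fields of each."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "access_number TEXT PRIMARY KEY, gene_name TEXT)"
    )
    for number in range(first, last + 1):
        fields = parse_fields(fetch_page(BASE_URL.format(number)))
        conn.execute(
            "INSERT OR REPLACE INTO genes VALUES (?, ?)",
            (fields.get("Access Number"), fields.get("Gene Name")),
        )
    conn.commit()
```

A run over the first ten pages would then look like
crawl(sqlite3.connect("deg.db"), 1, 10). Passing fetch_page as a
parameter keeps the parsing and storage testable without network access,
and a real crawl of thousands of pages should probably pause between
requests and handle download errors.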

-- 
Pau Garcia i Quiles
http://www.elpauer.org
(Due to my workload, I may need 10 days to answer)

_______________________________________________
witty-interest mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/witty-interest