On approximately 1/8/2004 9:04 AM, came the following characters from
the keyboard of #SHUCHI MITTAL#:
Hi all
Since everyone here is a Perl expert and I'm a total newbie, I would be very grateful if someone could help me out with my doubts.
Indeed everyone here is a perl expert... but this list is focused on use
of Win32::GUI. So when you have broader questions, rather than GUI
related questions, using a different forum is appropriate, and you will
get a much broader range of experts to give you help.
If you are working on Win32, I'd suggest joining and submitting this
type of question to [EMAIL PROTECTED]
I am doing a project to develop a student-professor system, including databases etc. To start off, I need lots of professor data from the websites of various educational institutions (for populating my database). To extract this data and get started, I decided to use Perl, since its text extraction capabilities are well known.
The problem is that all these sites have totally different HTML formats and structures, and differ in how the info for each prof is listed, so I can't seem to come up with generic Perl code to extract this data and put it in text files on my local hard disk. Therefore I think I'll need to use regexes and pattern matching to do the task, but I'm not sure how to go about it. I wrote some code that works with www.ntu.edu.sg/sce/staffacad.asp, but it is way too specific and doesn't work with any other staff sites!
I need to do the following:
1. Visit the base site of any institute and extract professor information, which includes NAME, EMAIL, DEGREE, RESEARCH INTERESTS and PUBLICATIONS RELEASED.
2. For publications, the listing appears either via a link on the prof's homepage or as a chunk of data under a heading such as "PUBLICATIONS". I think I can get the data if it's via a link, but I don't know how to extract that exact chunk from the middle of a page.
3. All this info should be extracted to external text files.
I can manage if someone just helps me with snippets of code to get started with the extraction, i.e. accurate extraction of information from any random site of an institution which has profs listed.
For example, some sites are www.ntu.edu.sg/sce/staffacad.asp ,
http://www.ntu.edu.sg/eee/people/, http://www.ie.cuhk.edu.hk/index.php?id=6,
http://www.ntu.edu.sg/mpe/admin/staff.asp
I'd greatly appreciate any help in any direction... I'm totally lost here. Please feel free to ask if you have any doubts regarding my question!
shuchi
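That said, for item 2 in the list above (pulling out the chunk of text under a "PUBLICATIONS" heading), here is a minimal sketch in core Perl, with no CPAN modules. It assumes the heading is marked up as <h1>..<h6>, <b> or <strong>, and that the section runs until the next <hN> tag. Real staff pages will often break these assumptions, so treat it as a starting point only. The sub name extract_section, the sample HTML and the output file name publications.txt are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the plain text that follows a heading such as "Publications",
# up to the next <hN> tag or the end of the page. Returns undef (in
# scalar context) when no such heading is found.
# Assumption: the heading sits in <h1>..<h6>, <b> or <strong>.
sub extract_section {
    my ($html, $title) = @_;

    return unless $html =~ m{
        <(h[1-6]|b|strong)\b[^>]*>    # opening heading-like tag
        \s* \Q$title\E \s* :? \s*     # heading text, optional colon
        </\1\s*>                      # matching closing tag
        (.*?)                         # section body, as short as possible
        (?= <h[1-6]\b | \z )          # stop before next heading or at end
    }xis;

    my $body = $2;
    $body =~ s{<br\s*/?>|</(?:p|li|div|tr)\s*>}{\n}gi;  # block ends become newlines
    $body =~ s{<[^>]+>}{}g;                              # drop the remaining tags
    $body =~ s{&nbsp;}{ }g;                              # decode two common entities
    $body =~ s{&amp;}{&}g;

    # Trim every line and discard the empty ones.
    my @lines;
    for my $raw (split /\n/, $body) {
        my $line = $raw;
        $line =~ s/^\s+|\s+$//g;
        push @lines, $line if length $line;
    }
    return join "\n", @lines;
}

# Hypothetical sample page, standing in for a fetched staff page.
my $html = <<'HTML';
<html><body>
<h2>Research Interests</h2>
<p>Computer networks</p>
<h2>Publications</h2>
<ul>
<li>A. Author, "First paper", 2002.</li>
<li>A. Author, "Second paper", 2003.</li>
</ul>
<h2>Contact</h2>
<p>someone@example.edu</p>
</body></html>
HTML

my $text = extract_section($html, 'Publications');
if (defined $text) {
    # Item 3: save the extracted chunk to a text file.
    open(my $out, '>', 'publications.txt')
        or die "Cannot write publications.txt: $!";
    print {$out} "$text\n";
    close($out) or die "Cannot close publications.txt: $!";
}
```

Regexes like this get fragile quickly on arbitrary HTML. A real parser from CPAN, such as HTML::TreeBuilder or HTML::TokeParser, is likely to hold up better across differently structured sites, and LWP::Simple's get() can fetch the pages in the first place.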
_______________________________________________
Perl-Win32-GUI-Users mailing list
Perl-Win32-GUI-Users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perl-win32-gui-users
--
Glenn -- http://nevcal.com/
===========================
The best part about procrastination is that you are never bored,
because you have all kinds of things that you should be doing.