from:"Devin Weaver"

Re: html file to cvs

2006-09-06 Thread Devin Weaver

I don't fully understand what you mean by a cvs file: whether that
refers to a Concurrent Versions System (CVS) file or you meant a
comma-separated values (CSV) file. Based on the sample output, I'm
assuming a CSV file using semicolons.


I chose Perl as the Swiss Army knife of scripting languages and was able
to whip up a parser in about fifteen minutes. Attached is what I came up with.


I left the loading of multiple files to the student. I used mainly
regular expressions, so in theory it could be ported to Vim script, but
this type of parsing is better suited to a scripting language than to
an editor.
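
For what it's worth, the multi-file loading could be sketched roughly like this. It is only an illustration and not part of the attached script: the helper names `extract_records` and `process_files` and the file name `multi.pl` are made up, the patterns mirror the ones the script uses (with the trailing-punctuation strip written as a single character class), and it assumes every record has a 'Source:' line followed by an 'Addresses:' line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The parentheses make the month capture group 1, so the year is group 2.
my $months_pat = "(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)";

# Hypothetical helper: turn the lines of one HTML file into [year, univ] pairs.
sub extract_records
{
    my @lines = @_;
    my @records;
    my $year = "";
    foreach my $line (@lines)
    {
        if ($line =~ /Source:/i)
        {
            $year = $2 if $line =~ /$months_pat\s+[0-9]+\s+([0-9]+)/i;
        }
        elsif ($line =~ /Addresses:/i && $line =~ /<a(\s.+?)?>(.+?)<\/a>/i)
        {
            my $univ = $2;
            $univ =~ s/^\s+//;        # leading whitespace
            $univ =~ s/[\s,;]+$//;    # trailing whitespace, commas, semicolons
            $univ =~ s/&amp;/&/gi;    # HTML entity back to a plain ampersand
            push @records, [$year, $univ];
        }
    }
    return @records;
}

# Hypothetical helper: parse every named file and write its records to $out.
# Returns the number of records written.
sub process_files
{
    my ($out, $delim, @files) = @_;
    my $count = 0;
    foreach my $file (@files)
    {
        open(my $fd, "<", $file) or die "Could not open $file: $!";
        my @lines = <$fd>;
        close $fd;
        foreach my $rec (extract_records(@lines))
        {
            # Same layout as the single-file script: year;univ;
            print {$out} join($delim, @$rec), $delim, "\n";
            $count++;
        }
    }
    return $count;
}

# Does nothing unless file names are given on the command line.
process_files(\*STDOUT, ";", @ARGV) if @ARGV;
```

It would be run as something like `perl multi.pl portal_*.htm > out.csv`, relying on the shell to expand the wildcard.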


Hope this gives some inspiration.

On Sep 6, 2006, at 06:14, Nikolaos A. Patsopoulos wrote:
> I have a huge pack of html files (1000) and I want to extract some
> info on cvs files.


#!/usr/bin/perl

# Very simple script to parse a specifically styled HTML document and output a
# delimited (CSV-style) file.
#
# The following are the settings. Pick what you need. Using command line
# arguments is left for the student.

$file = "portal_002.htm";
$output = "out.csv";
$csv_delim = ';';
$quiet = 0; # set this to 1 to stop debug output

# The parentheses make the month capture group 1 wherever this is interpolated.
$months_pat = "(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)";

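##
# Print a debug message, optionally tagged with a line number, unless $quiet.
sub msg
{
    my $str = shift;
    my $line_no = shift;

    if (!$quiet)
    {
        print $str;
        if (defined $line_no && $line_no ne "")
        {
            print " (line: $line_no)";
        }
        print "\n";
    }
}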

$line_no = 0; # used to track the line number.
# "or" rather than "||" so that die fires when open itself fails.
open(FD, "<", $file) or die "Could not open file: $!";
open(OUT, ">", $output) or die "Unable to open output file: $!";
while ($line = <FD>)
{
    $line_no++;
    if ($line =~ /Source:/i)
    {
        # Group 1 is the month, group 2 is the year.
        if ($line =~ /$months_pat\s+[0-9]+\s+([0-9]+)/i)
        {
            $year = $2;
        }
        msg("Found 'Source:'; Year = $year", $line_no);
    }
    elsif ($line =~ /Addresses:/i)
    {
        # Group 1 is the optional tag attributes, group 2 is the link text.
        if ($line =~ /<a(\s.+?)?>(.+?)<\/a>/i)
        {
            $univ = $2;
        }
        $univ =~ s/^\s+//;
        # strip trailing whitespace, commas and semicolons
        $univ =~ s/[\s,;]+$//;
        # turn the HTML &amp; entity back into a plain &
        $univ =~ s/&amp;/&/gi;
        msg("  Child Found 'Addresses:'; Univ = $univ", $line_no);
        # Since this should be the end of the record write to file.
        print OUT "$year$csv_delim$univ$csv_delim\n";
    }
}
close OUT;
close FD;
msg("Done. (Parsed $line_no lines) CSV output to $output", "");
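
The command-line handling that the header comment leaves to the student might look something like the sketch below. It uses the core Getopt::Long module. The option names (--file, --output, --delim, --quiet) and the `parse_args` helper are made up for illustration, with the defaults copied from the settings at the top of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: read the settings from an argument list, falling back
# to the defaults that are hard-coded in the script.
sub parse_args
{
    my @args = @_;
    my %opt = (
        file   => "portal_002.htm",
        output => "out.csv",
        delim  => ";",
        quiet  => 0,
    );
    GetOptionsFromArray(
        \@args,
        "file=s"   => \$opt{file},
        "output=s" => \$opt{output},
        "delim=s"  => \$opt{delim},
        "quiet"    => \$opt{quiet},
    ) or die "Usage: $0 [--file IN] [--output OUT] [--delim CHAR] [--quiet]\n";
    return %opt;
}

my %opt = parse_args(@ARGV);
print "Reading $opt{file}, writing $opt{output}\n" unless $opt{quiet};
```

The values in `%opt` would then stand in for `$file`, `$output`, `$csv_delim` and `$quiet`.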

Re: Subversion Access

2006-09-05 Thread Devin Weaver

I still can't figure out what it was; I even tried upgrading my
router's firmware. Anyway, I worked around it as Bill suggested by
repeating the procedure (I think it was two or three times) and it finished.


Thanks.
