On Sun, 4 Apr 2010, drew wymore wrote:

> Thanks Rich and Michael. I'll give the Perl a shot and see what
> happens. As far as the data layout goes, it's 5 columns with roughly 1100
> rows; the column I'm interested in has a variable number of words per
> entry but doesn't exceed a couple hundred words.
>
> I did enable fulltext searching within MySQL, which works fine for
> searching but doesn't give me the flexibility I'm looking for to
> actually just get a count of unique words. I did find something in PHP
> that is supposed to work, but it's barfing on the array that's being
> returned by the MySQL query.
>
> Drew-

If you want a routine that will correctly parse a line from a CSV file
into a list of individual fields, try the following. You will have to
translate it into your language du jour (this is Common Lisp). It
correctly handles quoted fields that contain commas within the text.

ppcre (CL-PPCRE) is a regular-expression library for Common Lisp. If you
use Perl, you can easily use the native regex support. All I'm doing there
is stripping carriage-return characters (#\Return). On a Unix box, chomp()
alone is not sufficient: it removes the trailing newline but leaves the
carriage return in place. Remember that this function handles a single
line of input. If there is a carriage return in there, it's because the
file came from a Windows machine (CRLF line endings). You'll see it as ^M
at the end of the line in Emacs.
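To see why the carriage return needs separate handling, here is a minimal Python sketch (Python is used only for illustration; the strip_line_ending helper is a hypothetical name). Removing only the trailing newline, which is what chomp() does by default, leaves behind the \r that a Windows file carries. Python's own text-mode reading normally translates \r\n to \n, so the string literal below stands in for a line read without that translation.

```python
def strip_line_ending(line: str) -> str:
    """Remove a trailing LF or CRLF (a CR-aware chomp)."""
    # Strip the '\n' first, then any '\r' that a Windows (CRLF) file left behind.
    return line.rstrip("\n").rstrip("\r")


windows_line = "apple,banana,cherry\r\n"

# Removing only the newline leaves the carriage return (the ^M Emacs shows).
only_lf_removed = windows_line.rstrip("\n")
print(repr(only_lf_removed))
print(repr(strip_line_ending(windows_line)))
```

Unlike the Lisp code, which removes every #\Return in the line, this sketch only strips a trailing one; either choice works for ordinary CRLF files.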

(defun parse-csv-line (line)
   "Parses a single line of CSV input into a list of string fields."
   (let ((quoted-string-mode nil)
         (line-list ())
         (field-collector ""))
     (loop for this-char across (ppcre:regex-replace-all #\Return line "") do
          (cond ((equal this-char #\")
                 (setf quoted-string-mode (not quoted-string-mode)))
                ((and (equal this-char #\,) (not quoted-string-mode))
                 (push field-collector line-list)
                 (setf field-collector ""))
                (t
                 (setf field-collector (format nil "~a~a" field-collector this-char)))))
     ;; get the field after the last comma
     (push field-collector line-list)
     (nreverse line-list)))
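For anyone translating, here is a minimal Python sketch of the same state machine (the name parse_csv_line is just a direct rendering of the Lisp function's name): a double quote toggles "quoted mode" and is dropped, and a comma ends a field only outside quotes.

```python
from typing import List


def parse_csv_line(line: str) -> List[str]:
    """Split one CSV line into fields, honoring double-quoted fields."""
    fields: List[str] = []
    current: List[str] = []        # characters of the field being built
    in_quotes = False
    for ch in line.replace("\r", ""):   # drop CRs, like the regex-replace-all call
        if ch == '"':
            in_quotes = not in_quotes   # quote toggles mode; the quote itself is dropped
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))     # the field after the last comma
    return fields
```

For example, parse_csv_line('1,"Smith, John",hello world\r') returns ['1', 'Smith, John', 'hello world']. Collecting characters in a list and joining once avoids re-copying the whole field on every character, as the format call in the Lisp version does. Like the original, it does not treat a doubled quote ("") inside a quoted field as a literal quote, and it cannot handle a quoted field that spans lines.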

This function is just a piece of a complete CSV reading and parsing
solution. You'll need to open a file, read in a line at a time, and
hand the entire line as one long string to this function. All that is
easy to do in Perl; that's what Perl is made for. It might look like
this:

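open(my $fh, '<', 'somefile.csv')
     or die("Could not open somefile.csv for reading: $!");
while (my $line = <$fh>) {
     chomp($line);
     $line =~ s/\r\z//;    # strip the CR left by Windows (CRLF) line endings
     # parseCsv() is a hypothetical name for your Perl translation of parse-csv-line
     my @lineList = parseCsv($line);
     # ... do stuff with @lineList ...
}
close($fh);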
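Since the original goal is a count of unique words in one column, here is a minimal Python sketch of the whole pipeline. It swaps in the standard library's csv module in place of a hand-written parser (csv also copes with doubled quotes and quoted fields that span lines). The function name, the column index, skipping a header row, lowercasing, and treating a "word" as a run of \w characters are all assumptions; adjust to taste.

```python
import csv
import io
import re
from collections import Counter


def count_unique_words(lines, column, skip_header=True):
    """Count word frequencies in one column of CSV input.

    `lines` is any iterable of text lines (e.g. an open file); `column` is a
    zero-based field index. Both the name and the defaults are hypothetical.
    """
    counts = Counter()
    reader = csv.reader(lines)
    if skip_header:
        next(reader, None)                  # assumption: first row is a header
    for row in reader:
        if len(row) > column:               # skip short or blank rows
            # Assumption: words are runs of letters/digits/underscores,
            # compared case-insensitively.
            counts.update(re.findall(r"\w+", row[column].lower()))
    return counts


sample = io.StringIO(
    'id,name,notes\r\n'
    '1,"Smith, John","The cat sat"\r\n'
    '2,"Doe, Jane","the dog sat, too"\r\n'
)
word_counts = count_unique_words(sample, column=2)
print(len(word_counts))                     # number of distinct words
print(word_counts.most_common(3))
```

With a real file, open it with newline="" as the csv documentation recommends, e.g. `with open("somefile.csv", newline="") as f: count_unique_words(f, 2)`.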

Carlos
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
