On 10/8/2011 3:58 PM, Charles Curley wrote: > On Sat, 8 Oct 2011 13:19:43 -0600 > "S. Dale Morrey"<[email protected]> wrote: > >> Ok I'll admit it, I suck at regex. >> Unfortunately, I now have the task of importing a customers catalog >> price list into a database and I'm not sure where to begin. >> I really think a simple regex could convert the whole thing to a CSV >> and I could then import the CSV directly. >> >> Here is an example of what I'm looking at. >> Item Item Description Page # Retail Member Wholesale Pkg >> 12137 HOLIDAY PENGUIN FIGURINES 335 9.95 4.25 5.95 1 PR >> >> As you can see all the fields are separated by whitespace, but then >> again so are the individual words in the description. > I wonder what kinds of white space? For example, are the fields > delimited by tabs, so that the item description field can contain > spaces. If so, a simple split on \t would do you nicely. > > Perl has an excellent CSV library, which would make exporting to CSV > very easy. > If you have a properly formatted csv file coming in cpan has some great tools but i've yet to see one from a client. I do this kind of thing often and i usually end up doing this. I don't know your db defs so i left that part out. This is perl btw so if you were to add a dbi connection above and alter the names you could run this on any linux powered terminal. Assuming you have the dbi library from cpan. |open FILE, ">filename.txt" or print $!;| while(<FILE>){ my @line = split(/\W/,$_); $db->do("insert into tablename (column names) values ('$line[0]','$line[1]'...)"); }| close FILE;
|If \W is too aggressive \s is a good alternative but if they use tabs and you know they are using it then \t|. from what you sent it does look like they are using tabs to separate columns based on the description having more than one space in it. So you will probably end up splitting on \t. you may need to concatenate some values on the insert. | /* PLUG: http://plug.org, #utah on irc.freenode.net Unsubscribe: http://plug.org/mailman/options/plug Don't fear the penguin. */
