On Sun, Apr 4, 2010 at 12:31 PM, Michael Rasmussen <[email protected]> wrote:
>
> On Sun, Apr 04, 2010 at 12:10:03PM -0700, drew wymore wrote:
>> I have a large data set that is being exported from an Oracle DB,
>> unfortunately I can't work with the data directly in Oracle or this
>> wouldn't be a problem. I can export it as CSV and work with it.
>> ... I don't really care which language I
>> do it in and whether I do it directly from csv or a database source
>> other than Oracle (because I can't).
>>
>> Any clue sticks, ideas or links to something that might help me solve
>> this problem appreciated.
>
> With apologies to Randal...
>
> Assume you export to CSV and, for the purposes of this simple example there
> are no text fields that have commas embedded.
>
> And if the data of interest is in the third column:
>
> 3,14,word,blah,blech,bz
> 4,18,term,more,stuff
>
> then:
>
> perl -ne '@F=split /,/; $words{$F[2]}++;
>   END{ foreach $word (sort { $words{$a} <=> $words{$b} } keys %words)
>   { print "$word\t$words{$word}\n"; } } ' file_of_data.csv
>
> Assuming you want it sorted by word frequency.
>
> Disclaimer: I'm at my in-laws for easter dinner and didn't test that.
> I'm reasonably sure that it's close enough that any gaps will serve
> as an exercise for the reader.
>
> --
> Michael Rasmussen, Portland Oregon
> Trading kilograms for kilometers since 2003
> Be appropriate && Follow your curiosity
> http://www.jamhome.us/
> The Fortune Cookie Fortune today is:
> At once it struck me what quality went to form a man of achievement,
> especially in literature, and which Shakespeare possessed so enormously
> -- I mean negative capability, that is, when a man is capable of being
> in uncertainties, mysteries, doubts, without any irritable reaching
> after fact and reason.
> -- John Keats
> _______________________________________________
> PLUG mailing list
> [email protected]
> http://lists.pdxlinux.org/mailman/listinfo/plug
>
Thanks Rich and Michael. I'll give the Perl a shot and see what happens.

As for the data layout: it's 5 columns with roughly 1100 rows. The column
I'm interested in has a variable number of words per entry, but no entry
exceeds a couple hundred words.

I did enable fulltext searching within MySQL, which works fine for
searching but doesn't give me the flexibility I'm looking for to just
get a count of unique words. I did find something in PHP that is supposed
to work, but it's barfing on the array that's being returned by the MySQL
query.

Drew-
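
For what it's worth, since the language doesn't matter, here is a minimal
sketch in Python of the unique-word count described above. It is a sketch
under assumptions, not a tested solution for the real export: it assumes
the column of interest is the third one (index 2, as in Michael's example),
that words should be compared case-insensitively, and that a "word" is a
run of letters, digits, or apostrophes. The function name count_words and
the sample rows are made up for illustration. Unlike a plain split on
commas, the csv module copes with quoted fields that contain commas.

```python
import csv
import io
import re
from collections import Counter


def count_words(csv_file, column=2):
    """Count words in one column of an open CSV file, ignoring case.

    `column` is zero-based; 2 (the third column) is an assumption.
    """
    counts = Counter()
    for row in csv.reader(csv_file):
        if len(row) <= column:
            continue  # skip blank or short rows
        # Assumed definition of a word: letters, digits, apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", row[column].lower()))
    return counts


if __name__ == "__main__":
    # Made-up sample data standing in for the real 5-column export.
    sample = io.StringIO(
        '1,a,"The quick fox, the lazy dog",x,y\n'
        '2,b,"Quick thinking",x,y\n'
    )
    counts = count_words(sample)
    print(len(counts), "unique words")
    # Most frequent words first, tab-separated like the Perl one-liner.
    for word, n in counts.most_common():
        print(f"{word}\t{n}")
```

To run it against a real file, open the export with
open("export.csv", newline="") (the filename is hypothetical) and pass the
file object to count_words, adjusting column to wherever the text lives.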
