On Sun, Apr 04, 2010 at 12:10:03PM -0700, drew wymore wrote:
> I have a large data set that is being exported from an Oracle DB,
> unfortunately I can't work with the data directly in Oracle or this
> wouldn't be a problem. I can export it as CSV and work with it.
> ... I don't really care which language I
> do it in and whether I do it directly from csv or a database source
> other than Oracle (because I can't).
>
> Any clue sticks, ideas or links to something that might help me solve
> this problem appreciated.
With apologies to Randal...
Assume you export to CSV and, for the purposes of this simple example, that
no text fields have embedded commas.
If the data of interest is in the third column:
3,14,word,blah,blech,bz
4,18,term,more,stuff
then:
perl -ne '@F = split /,/; $words{$F[2]}++;
  END { foreach $word (sort { $words{$a} <=> $words{$b} } keys %words)
    { print "$word\t$words{$word}\n"; } }' file_of_data.csv
That assumes you want the output sorted by word frequency, lowest count first.
Disclaimer: I'm at my in-laws' for Easter dinner and didn't test that.
I'm reasonably sure that it's close enough that any gaps will serve
as an exercise for the reader.
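
If some text fields do turn out to have embedded commas, a plain split on
"," will miscount. Since you said the language doesn't matter, here is a
rough sketch of the same count in Python, using its standard csv module to
handle quoted fields. The function names (count_column, count_csv_file) and
the zero-based column default of 2 are my own placeholders, not anything
from your data, and I'm assuming the export quotes such fields with double
quotes.

```python
import csv
import sys
from collections import Counter


def count_column(rows, column=2):
    """Count how often each value appears in a zero-based column.

    `rows` is any iterable of already-parsed rows (lists of strings),
    for example a csv.reader object.
    """
    counts = Counter()
    for row in rows:
        if len(row) > column:  # skip blank or short rows
            counts[row[column]] += 1
    return counts


def count_csv_file(path, column=2):
    """Hypothetical helper: parse a CSV file and count one column."""
    with open(path, newline="") as handle:
        return count_column(csv.reader(handle), column)


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = count_csv_file(sys.argv[1])
    # Sort by frequency, lowest count first, like the one-liner above.
    for word, n in sorted(counts.items(), key=lambda item: item[1]):
        print(f"{word}\t{n}")
```

Saved under whatever name you like, it would be run with the CSV file as
its only argument, and it prints one word and its count per line,
tab-separated.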
--
Michael Rasmussen, Portland Oregon
Trading kilograms for kilometers since 2003
Be appropriate && Follow your curiosity
http://www.jamhome.us/