Re: [PLUG] Data extraction

Michael Rasmussen Sun, 04 Apr 2010 12:31:55 -0700

On Sun, Apr 04, 2010 at 12:10:03PM -0700, drew wymore wrote:
> I have a large data set that is being exported from an Oracle DB,
> unfortunately I can't work with the data directly in Oracle or this
> wouldn't be a problem. I can export it as CSV and work with it. 
> ... I don't really care which language I
> do it in and whether I do it directly from csv or a database source
> other than Oracle (because I can't).
> 
> Any clue sticks, ideas or links to something that might help me solve
> this problem appreciated.


With apologies to Randal...

Assume you export to CSV and, for the purposes of this simple example, that
there are no text fields with embedded commas.
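If that assumption doesn't hold, splitting on /,/ will shift the columns. This happens when text fields are wrapped in double quotes because they contain commas. The sub below is a minimal sketch in core Perl. It assumes RFC 4180-style quoting, where a literal quote inside a quoted field is written as "". The name split_csv_line is made up for this example, and it does not handle a quoted field that continues onto the next line. For real work, the CPAN module Text::CSV is the usual choice.

```perl
use strict;
use warnings;

# Split one CSV record into fields (hypothetical helper, not a library call).
# A field is either a run of characters containing no comma or double quote,
# or a double-quoted string in which "" stands for one literal quote.
# Parsing stops at the first field that matches neither form, returning the
# fields found so far. Works on one physical line at a time only.
sub split_csv_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the line ending (Unix or DOS)
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,"]*))(,|\z)/g) {
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # un-double embedded quotes
            push @fields, $quoted;
        }
        else {
            push @fields, $plain;
        }
        last if $sep eq '';         # matched end of line, no more fields
    }
    return @fields;
}

my @f = split_csv_line(qq{3,14,"word, with comma",blah\n});
print "$f[2]\n";    # prints: word, with comma
```

Unlike a bare split, an apostrophe in an unquoted field (don't) is treated as an ordinary character.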

And if the data of interest is in the third column:

  3,14,word,blah,blech,bz
  4,18,term,more,stuff

then:

  perl -ne '@F = split /,/; $words{$F[2]}++;
    END { foreach $word (sort { $words{$a} <=> $words{$b} } keys %words)
      { print "$word\t$words{$word}\n"; } }' file_of_data.csv

That assumes you want the output sorted by word frequency, least frequent
first; swap $a and $b in the sort block to get the most frequent first.

Disclaimer:  I'm at my in-laws' for Easter dinner and didn't test that.
I'm reasonably sure that it's close enough that any gaps will serve
as an exercise for the reader.
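For anyone who would rather keep this as a script, here is a sketch of the same counting logic. It is split into two hypothetical subs, count_column and by_frequency, so each piece can be checked on small input. Like the one-liner, it assumes no quoted fields with embedded commas and sorts least frequent first. The alphabetical tie-break and the third sample row are made-up additions for illustration.

```perl
use strict;
use warnings;

# Count how often each value appears in column $col (0-based) of simple,
# comma-separated lines that contain no quoted or embedded commas.
sub count_column {
    my ($lines, $col) = @_;
    my %count;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my @fields = split /,/, $copy, -1;   # -1 keeps trailing empty fields
        next if $col > $#fields;             # skip lines that are too short
        $count{ $fields[$col] }++;
    }
    return \%count;
}

# Return the keys ordered by ascending count, with ties broken
# alphabetically so the output order is stable from run to run.
sub by_frequency {
    my ($count) = @_;
    return sort { $count->{$a} <=> $count->{$b} or $a cmp $b } keys %$count;
}

# The first two rows are the example above; the third is made up.
my @sample = ("3,14,word,blah,blech,bz\n", "4,18,term,more,stuff\n", "5,2,word,x\n");
my $counts = count_column(\@sample, 2);
print "$_\t$counts->{$_}\n" for by_frequency($counts);
```

To run it on the real export, read the file's lines into an array (for example `my @lines = <$fh>;` after an `open`) and pass that instead of @sample.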

-- 
      Michael Rasmussen, Portland Oregon  
  Trading kilograms for kilometers since 2003
    Be appropriate && Follow your curiosity
          http://www.jamhome.us/
The Fortune Cookie Fortune today is:
At once it struck me what quality went to form a man of achievement,
especially in literature, and which Shakespeare possessed so enormously
-- I mean negative capability, that is, when a man is capable of being
in uncertainties, mysteries, doubts, without any irritable reaching
after fact and reason.
                -- John Keats
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
