A user reported cases of overflow in Fisher's exact test in the Ngram
Statistics Package (both the left and right variations), where the
reported value comes out greater than 1. My own conclusion is that
rounding error is at work here, since we sum a potentially large number
of hypergeometric probabilities to arrive at these values. So it's not an
alarming situation, but certainly one that needs to be fixed. Some
specific cases of overflow:

Right Fisher output:

934064
cat:cc<>h_position_direction:-<>2 1.0490 1728 20006 169317
h_role:object<>relative_position:3<>3 1.0050 144 68362 15501
h_cat:jj<>h_role:locative<>4 1.0032 511 48419 35842
h_group_type:na<>cat:pos<>5 1.0000 14 59756 8709
h_group_type:na<>cat:sym<>5 1.0000 38 59756 6910

Left Fisher output:

934064
cat:nn<>h_relative_position:3<>1 1.0916 801 301347 1890
h_group_type:np<>role:predeterminer<>2 1.0390 1133 445387 1135
group_type:na<>leafp:na<>3 1.0000 1 1 1
group_type:na<>h_leafp:na<>3 1.0000 1 1 59756
h_group_type:na<>leafp:na<>3 1.0000 1 59756 1
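
To make the rounding issue concrete, here is a minimal sketch in Python
(not the package's actual Perl code) of one way to sum the hypergeometric
probabilities so the result cannot exceed 1. The function names are
hypothetical. It assumes the columns above are n11, n1p, np1 and that the
first line of each listing is the total count npp. Each term is computed
in log space, the terms are added with a compensated sum, and the total
is clamped at 1.0.

```python
import math

def log_hypergeom(n11, n1p, np1, npp):
    """Log-probability of one 2x2 table with fixed marginals (hypothetical helper)."""
    n12 = n1p - n11
    n21 = np1 - n11
    n22 = npp - n1p - np1 + n11
    lg = math.lgamma  # lgamma(x + 1) == log(x!)
    return (lg(n1p + 1) + lg(npp - n1p + 1) + lg(np1 + 1) + lg(npp - np1 + 1)
            - lg(npp + 1) - lg(n11 + 1) - lg(n12 + 1) - lg(n21 + 1) - lg(n22 + 1))

def fisher_left(n11, n1p, np1, npp):
    """P(X <= n11): sum over tables with a joint count of n11 or less."""
    lowest = max(0, n1p + np1 - npp)  # smallest joint count the marginals allow
    terms = [math.exp(log_hypergeom(k, n1p, np1, npp))
             for k in range(lowest, n11 + 1)]
    # fsum limits accumulated rounding error; min() guards against any residue
    return min(1.0, math.fsum(terms))

def fisher_right(n11, n1p, np1, npp):
    """P(X >= n11): sum over tables with a joint count of n11 or more."""
    highest = min(n1p, np1)  # largest joint count the marginals allow
    terms = [math.exp(log_hypergeom(k, n1p, np1, npp))
             for k in range(n11, highest + 1)]
    return min(1.0, math.fsum(terms))

if __name__ == "__main__":
    # Counts taken from the first row of each listing above
    print(fisher_right(1728, 20006, 169317, 934064))
    print(fisher_left(801, 301347, 1890, 934064))
```

Clamping only hides the symptom, so this is a stopgap under the
assumptions stated above; whether it would account for the size of the
overshoot in the listings is something I have not verified.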

The good news is that the Ngram Statistics Package is in line for a
long-overdue facelift that will commence in August. A number of
long-pending issues will be resolved at that time, and some enhancements
and new features will be added. As we get closer to starting that work,
I'll post our list of reported problems to make sure we have caught
everything. And of course, please feel free to let us know of any other
questions or concerns.

Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse




 