Also, here is what the name finder found after being trained on 20000 lines of text in the format I just sent you. The attached file contains the results of running the trained model on the same 200000 lines (without the SGML tags, of course). I do know that this is discouraged!

See attachment.

On 08/02/12 17:09, Joern Kottmann wrote:
On Wed, Feb 8, 2012 at 5:56 PM, Jim - FooBar(); <jimpil1...@gmail.com> wrote:

Ah, OK, I see what you mean. But then again, if it recognised it as a mere
token, it would not throw "IncompatibleFormat" exceptions; it would rather skip
it as a token that is not of interest, wouldn't it? I don't have any patches to
send you; I just think that not including spaces in the SGML tag is a wiser
approach. Unless, of course, you're extracting the SGML tags via regex. The
truth is I've not looked at the source, but I would expect you to use some
sort of XML-ish means to extract the SGML tags. If your parser uses regex,
then I'm sure you have your reasons for including the spaces. Anyway, this is
a very small problem for me, because I can indeed sort it out manually. My big
problem still remains!
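The manual fix mentioned above could be automated with a small preprocessing pass that puts a space on both sides of every tag before training. What follows is a minimal sketch that assumes the <START:type> ... <END> tag format. The class name TagSpacingSketch and the method normalize are hypothetical, and none of this is part of OpenNLP:

```java
import java.util.regex.Pattern;

public class TagSpacingSketch {
    // Hypothetical pattern: matches <START>, <START:type> and <END> tags.
    private static final Pattern TAG = Pattern.compile("<START(?::[^:>\\s]*)?>|<END>");

    /**
     * Surrounds every tag with spaces, then collapses runs of whitespace.
     * Apply it to one line at a time, because collapsing would also merge newlines.
     */
    public static String normalize(String line) {
        String spaced = TAG.matcher(line).replaceAll(" $0 "); // $0 = the whole matched tag
        return spaced.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Prints: took <START:drug> Atorvastatin <END> daily
        System.out.println(normalize("took <START:drug>Atorvastatin<END> daily"));
    }
}
```

Lines that are already spaced correctly come out unchanged, apart from redundant whitespace being collapsed.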

The code splits the input string by line and then by whitespace. Each
resulting part then either matches one of our start and end tags or it does not.
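The following is a minimal Java sketch of the line-then-whitespace splitting described above. It assumes the <START:type> ... <END> tag format. The class TagSplitSketch, the method parseLine, the start-tag regex and the error messages are all hypothetical, so this is not OpenNLP's actual source. It does show why spacing matters under this approach. A fully unspaced "<START:drug>Atorvastatin<END>" matches neither tag and is treated as an ordinary token. A half-spaced "<START:drug>Atorvastatin <END>" leaves a stray <END>, and in this sketch that stray tag triggers an error:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSplitSketch {
    // Hypothetical start-tag pattern: <START> or <START:type>.
    private static final Pattern START = Pattern.compile("<START(?::([^:>\\s]*))?>");
    private static final String END = "<END>";

    /** Returns the names tagged in one line, formatted as "type:name". */
    public static List<String> parseLine(String line) {
        List<String> names = new ArrayList<>();
        String type = null;
        StringBuilder current = null; // non-null while inside a <START> ... <END> pair
        for (String part : line.trim().split("\\s+")) {
            Matcher m = START.matcher(part);
            if (m.matches()) {
                if (current != null) throw new IllegalArgumentException("Nested <START> tag: " + line);
                type = m.group(1) == null ? "default" : m.group(1);
                current = new StringBuilder();
            } else if (part.equals(END)) {
                if (current == null) throw new IllegalArgumentException("<END> without <START>: " + line);
                names.add(type + ":" + current);
                current = null;
            } else if (current != null) {
                if (current.length() > 0) current.append(' ');
                current.append(part); // token belongs to the current name
            } else {
                // Ordinary token outside any tag pair; skipped here.
                // An unspaced "<START:drug>Atorvastatin<END>" also ends up in this branch.
            }
        }
        if (current != null) throw new IllegalArgumentException("Unclosed <START> tag: " + line);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("Patients took <START:drug> Atorvastatin <END> daily ."));
        System.out.println(parseLine("Patients took <START:drug>Atorvastatin<END> daily ."));
    }
}
```

The first call in main finds one name and the second finds none, because exact whole-token matching after whitespace splitting cannot see tags that are glued to a word.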



Anyway, I'll stop bugging you. The fact that you tried to help means a lot,
and if I sort everything out I'll certainly post what the problem was for
future users.


We are also interested in why it does not work for you; we usually use this
kind of experience to improve OpenNLP.

Would it be possible for you to show us a sample of your training data?
Maybe one paper.

Jörn


(("Pfizer")
 ("1")
 ("L-glutamine")
 ("Acetic")
 ("Fluorescein")
 ("NADH")
 ("AMPA")
 ("Toluene")
 ("L-tyrosine" "N")
 ("Domoic")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("N-α-methylhistamine")
 ("Tamoxifen")
 ("NADH")
 ("Tamoxifen")
 ("Tamoxifen")
 ("Tamoxifen")
 ("Tamoxifen")
 ("Tamoxifen")
 ("Tamoxifen")
 ("Ampicillin" "Erythromycin")
 ("Domoic")
 ("Methamphetamine")
 ("Domoic")
 ("AMPA")
 ("AMPA")
 ("Domoic")
 ("AMPA")
 ("Vancomycin")
 ("Gentamicin")
 ("Domoic")
 ("Domoic")
 ("Domoic")
 ("Domoic")
 ("Virginia")
 ("BZD")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("L-Methionine" "Atorvastatin" "L-methionine")
 ("L-Methionine" "L-methionine")
 ("L-Methionine" "Acetylcholine")
 ("Atorvastatin")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin" "L-methionine")
 ("Atorvastatin")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin")
 ("L-Methionine")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("Atorvastatin" "Simvastatin")
 ("Atorvastatin")
 ("Atorvastatin")
 ("L-Methionine")
 ("Atorvastatin")
 ("Atorvastatin" "Simvastatin")
 ("Atorvastatin" "L-Methionine")
 ("L-Methionine")
 ("L-Methionine")
 ("Atorvastatin" "L-Methionine")
 ("L-Methionine")
 ("L-Methionine")
 ("Atorvastatin")
 ("Atorvastatin")
 ("Atorvastatin" "L-Methionine")
 ("A")
 ("However")
 ("NADH")
 ("Salmon")
 ("Calcium")
 ("Verapamil")
 ("Brimonidine")
 ("Cisplatin")
 ("Gemfibrozil")
 ("Ibuprofen")
 ("Ibuprofen")
 ("Ibuprofen")
 ("Adenosine")
 ("L-glutamine")
 ("L-glutamine")
 ("Biotin")
 ("Nalidixic")
 ("Mannitol")
 ("Domoic")
 ("AMPA")
 ("AMPA")
 ("Gilz")
 ("AMPA")
 ("E")
 ("C")
 ("Filgrastim")
 ("Filgrastim")
 ("Filgrastim")
 ("Filgrastim")
 ("interferon")
 ("Streptomycin")
 ("glucose-6-phosphate")
 ("glucose-6-phosphate")
 ("Bromfenac")
 ("5")
 ("L-carnitine")
 ("L-carnitine")
 ("L-carnitine")
 ("Doxorubicin")
 ("glucose-6-phosphate")
 ("Glutathione")
 ("Atorvastatin")
 ("Haloperidol")
 ("Haloperidol" "Aripiprazole" "Clozapine" "Olanzapine" "Risperidone")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Aripiprazole" "Clozapine" "Haloperidol" "Olanzapine" "Risperidone")
 ("Haloperidol")
 ("Risperidone" "Clozapine")
 ("Risperidone")
 ("Risperidone")
 ("Haloperidol" "Olanzapine")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Haloperidol")
 ("Risperidone")
 ("Haloperidol" "Olanzapine")
 ("Haloperidol")
 ("Haloperidol")
 ("Diazepam")
 ("Batimastat")
 ("MMPIs")
 ("3")
 ("Indomethacin")
 ("Indomethacin")
 ("Diclofenac")
 ("Ketorolac")
 ("Nepafenac")
 ("Tolmetin")
 ("Ibuprofen")
 ("Ketoprofen")
 ("Naproxen")
 ("Naproxen")
 ("Naproxen")
 ("Oxaprozin")
 ("Suprofen")
 ("Piroxicam")
 ("Ergotamine")
 ("Lysergic")
 ("N-demethylation" "O-demethylation")
 ("Sitagliptin")
 ("Fig")
 ("Mephenytoin")
 ("IgE")
 ("Trabectedin")
 ("Trabectedin")
 ("Calcium")
 ("Calcium")
 ("Guanosine")
 ("Glutathione")
 ("Glutathione")
 ("Urea")
 ("Urea")
 ("AMPA")
 ("Ibuprofen")
 ("Indole")
 ("Acetic")
 ("Hymenialdisine")
 ("Nalidixic")
 ("Calyculin")
 ("Vitamin")
 ("Vitamin")
 ("Vitamin")
 ("Bacitracin")
 ("Ticlopidine")
 ("Montelukast")
 ("Montelukast")
 ("Ticlopidine")
 ("Ticlopidine")
 ("Ticlopidine")
 ("Ticlopidine")
 ("Montelukast")
 ("Morphine")
 ("3")
 ("NADH")
 ("LDH-A")
 ("NADH")
 ("NADH")
 ("Platensimycin")
 ("O")
 ("Nitric")
 ("L-tryptophan" "L-leucine")
 ("Ethanol")
 ("D-glutamic")
 ("L-arginine")
 ("Acetylcysteine")
 ("Tramadol")
 ("Glycerol")
 ("Cholesterol")
 ("The")
 ("Trehalose-6-Phosphate")
 ("BA-TPQ")
 ("NADH")
 ("Epigallocatechin")
 ("PSTs")
 ("PPARs")
 ("Clofibrate")
 ("Clofibrate")
 ("Clofibrate")
 ("Clofibrate")
 ("Clofibrate")
 ("Clofibrate")
 ("NADH-dependent")
 ("Ethanol")
 ("Thiopental")
 ("Ethanol")
 ("Diazepam")
 ("Cholesterol")
 ("Naproxen")
 ("Naproxen")
 ("Naproxen")
 ("Naproxen")
 ("Digitoxin")
 ("Naproxen")
 ("Glutathione")
 ("Gemcitabine")
 ("Gemcitabine")
 ("Ac-DEVD-CHO")
 ("Adenosine")
 ("N-ethylmaleimide")
 ("Guanosine" "Calcium")
 ("Glycine")
 ("Glycine")
 ("Colchicine")
 ("Colchicine")
 ("Cisplatin")
 ("Morphine")
 ("Baclofen")
 ("Heme")
 ("Reserpine")
 ("Reserpine")
 ("Cardioxane")
 ("Daunorubicin")
 ("Methotrexate")
 ("Vincristine")
 ("Pro")
 ("Dau")
 ("Ethanol")
 ("Ethanol")
 ("Ethanol")
 ("Hesperidin")
 ("Hesperidin")
 ("Hesperidin")
 ("Hesperidin")
 ("Hesperidin")
 ("Magnesium")
 ("Magnesium")
 ("Journal")
 ("Fluorescein")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("AMPA")
 ("Indomethacin")
 ("Heme")
 ("Simvastatin")
 ("NADH")
 ("NADH"))
