On 2012-01-14 01:39, Jimmy O'Regan wrote:
2012/1/13 Dominique Pellé <dominique.pe...@gmail.com>:
I see that Ukrainian tokenizer uses a MySpell
(UkrainianMyspellTagger class) which reads and
parses a text file dist/resource/uk/ukrainian.dict
of 1,841,900 bytes, so that's not fast. It is
On 2012-01-26 19:02, Mike Unwalla wrote:
Hello All,
I want to ignore a warning if a particular multi-word term is in a
sentence. For example, I want to find 'deploy', except if the term
'drogue chute' is in the sentence.
'Warn for misused word, except in correct context'
On 2012-02-04 16:14, Wojciech Górnaś wrote:
Hello!
Looking for a programmer to help me create a simple spelling rule for
Polish.
There is a set of words that are supposed to be spelled together but are
often misspelled as separate words. Such mistakes are relatively common
and standard
On 2012-02-04 19:54, Daniel Naber wrote:
On Saturday, 4 February 2012, Wojciech Górnaś wrote:
I thought about a tool that would allow adding
new pairs of words to one general rule. Do you think that's possible?
You could write a Java rule, i.e. a class that extends the Rule class and
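The core of such a Java rule is just a lookup of adjacent token pairs against a table of joined spellings. A minimal, self-contained sketch of that logic (class and method names are invented here; this is not the actual LanguageTool Rule API, and the two Polish pairs are only illustrative entries):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinedWordsChecker {

    // Word pairs that should be written as a single word (illustrative entries).
    private static final Map<String, String> JOINED = new HashMap<>();
    static {
        JOINED.put("na prawdę", "naprawdę");
        JOINED.put("z nad", "znad");
    }

    // Scans adjacent token pairs and returns the suggested joined spellings.
    public static List<String> findSuggestions(String[] tokens) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            String pair = tokens[i].toLowerCase() + " " + tokens[i + 1].toLowerCase();
            String joined = JOINED.get(pair);
            if (joined != null) {
                suggestions.add(joined);
            }
        }
        return suggestions;
    }
}
```

New pairs then become one-line additions to the table, which is the "one general rule" idea from the question.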
On 2012-02-19 15:30, Daniel Naber wrote:
Hi,
I just tried using LT with LibreOffice 3.5 and it didn't work for English.
This seems to be caused by the embedded grammar checker (Lightproof). Does
anybody know whether it's still possible to activate LT for English?
Interesting, it seems
On 2012-02-19 21:07, Daniel Naber wrote:
On Sunday, 19 February 2012, Daniel Naber wrote:
Interesting, it seems that LO 3.5rc3 works with LT here in English,
even when Lightproof is activated. Weird.
Did you check the pattern rules? A lot of the Lightproof rules come from
LT so it
On 2012-02-26 17:34, Dominique Pellé wrote:
Daniel Naber wrote:
On Sunday, 26 February 2012, Dominique Pellé wrote:
Is there a way to tell
LanguageTool
that all the rules in a rulegroup are mutually exclusive?
I don't think so. But why limit this to rulegroups? What about iterating
On 2012-04-24 21:13, Serkan Kaba wrote:
There's also http://kenai.com/projects/jmyspell which is a pure Java
implementation though it seems to be discontinued.
This one does not include the features of hunspell that we need for
languages
On 2012-04-24 18:53, Daniel Naber wrote:
On Tuesday, 24 April 2012, Yakov Reztsov wrote:
Or use our fsa dictionary?
That won't work well for languages with compounds, like German. I'm all for
including the JNI version of hunspell, as the license seems to be okay.
+1
For converters
Hi again,
On 2012-04-24 17:50, Yakov Reztsov wrote:
Maybe try dren-dk / HunspellJNA?
http://dren.dk/hunspell.html
https://github.com/dren-dk/HunspellJNA
This code is licensed under LGPL/GPL/MPL.
I tried it yesterday. It works quite nicely but hunspell seems to
perform very slowly
On 2012-04-28 14:35, Daniel Naber wrote:
On Saturday, 28 April 2012, Marcin Miłkowski wrote:
I tried it yesterday. It works quite nicely but hunspell seems to
perform very slowly compared to our fsa engine.
Are you sure it's not a problem of slow initialization or so? Can you post
some
On 2012-04-29 02:12, Daniel Naber wrote:
On Saturday, 28 April 2012, Marcin Miłkowski wrote:
My impression was that it's slow because I could see lines showing up
on the console. I mean on every call, there was a System.err printout,
and I could see lines printing.
Could you maybe
On 2012-04-29 21:20, Daniel Naber wrote:
On Sunday, 29 April 2012, Marcin Miłkowski wrote:
However, checking if the word is misspelled takes virtually no time.
It's only generating suggestions that seems to bog us down.
We could wait until a user actually requests the suggestions
Hi all,
I created a wiki page for people who want to use LT as a grammar checker
for (La)TeX:
http://languagetool.wikidot.com/checking-la-tex-with-languagetool
I could add LyX as a solution, but it is much trickier to use on
different platforms.
Best regards,
Marcin
in TexStudio, so hunspell complains about „OK” (which is in
Polish smart quotes). You should add all kinds of quotes to your word
tokenizer.
Best,
Marcin
Benito
On 05/07/12 19:20, Marcin Miłkowski wrote:
Hi all,
I created a wiki page for people who want to use LT as a grammar checker
for (La)TeX
this construct, you might add support for it at least as an option.
All in all, however, it's great to have LT in a major TeX editor! I will
add the info to the wiki.
Marcin
Benito
On 05/07/12 23:21, Marcin Miłkowski wrote:
Hi Benito,
On 2012-05-07 20:23, Benito van der Zander
it might be a good idea to look at how
they do it.
Marcin
Benito
On 05/09/12 10:29, Marcin Miłkowski wrote:
On 2012-05-09 00:33, Marcin Miłkowski wrote:
(but even if \cite is correctly removed, there remains an additional
space before the
point which LT may detect as error. And if I
On 2012-05-16 22:28, gulp21 wrote:
As much as I hate passing the buck, I'm afraid writing such a rule is
beyond my (pretty much non-existent) Java skills.
I planned to write a Java rule for it, but I'm rather busy at the
moment. Unless somebody is quicker than me, or has a better idea,
Hi all,
I just added preliminary hunspell support to LanguageTool. This is not
yet ready for production, as essential parts are missing. But the
infrastructure and some files are already there.
Now, I didn't add all spelling dictionaries to the source - there's only
one for Polish right now
On 2012-05-18 17:01, Ruud Baars wrote:
Marcin,
I must have missed some info. What is the Hunspell integration meant to
do exactly?
Find spelling mistakes. We have a Java rule that uses Hunspell for this
purpose now.
Marcin
Ruud
On 18-05-12 15:10, Marcin Miłkowski wrote:
Hi all
for Dutch in a Dutch
Language Union sponsored project).
OK, that means you'd need a special class for Dutch.
Marcin
Ruud
On 18-05-12 17:27, Marcin Miłkowski wrote:
On 2012-05-18 17:01, Ruud Baars wrote:
Marcin,
I must have missed some info. What is the Hunspell integration meant
the option -r through the library.
Marcin
Ruud
On 18-05-12 19:17, Daniel Naber wrote:
On Friday, 18 May 2012, Marcin Miłkowski wrote:
Hi Marcin,
Now, I didn't add all spelling dictionaries to the source - there's
only one for Polish right now as I needed at least one for testing
On 2012-05-18 19:17, Daniel Naber wrote:
On Friday, 18 May 2012, Marcin Miłkowski wrote:
Hi Marcin,
Now, I didn't add all spelling dictionaries to the source - there's
only one for Polish right now as I needed at least one for testing.
The dictionaries should go to resources
On 2012-05-18 22:57, Daniel Naber wrote:
On Friday, 18 May 2012, Marcin Miłkowski wrote:
thanks for adding Hunspell support! What about moving the hunspell
dict to another sub directory hunspell in our sources to make more
obvious that this is just copied from a different project?
Well
link
to the installed dictionaries in LO/AOO without reinstalling them?
lp, m.
2012/5/18 Marcin Miłkowski <list-addr...@wp.pl>:
On 2012-05-18 22:57, Daniel Naber wrote:
On Friday, 18 May 2012, Marcin Miłkowski wrote:
thanks for adding Hunspell support! What about moving the hunspell
dict
, Marcin Miłkowski wrote:
thanks for adding Hunspell support! What about moving the hunspell
dict to another sub directory hunspell in our sources to make more
obvious that this is just copied from a different project?
Well, you mean adding hunspell under resources?
I'd suggest using
src/resource
On 2012-05-19 11:13, Dominique Pellé wrote:
Marcin Miłkowski <list-addr...@wp.pl> wrote:
Hi all,
I just added preliminary hunspell support to LanguageTool. This is not
yet ready for production, as essential parts are missing. But the
infrastructure and some files are already there.
Hi
Hi again,
I've just split the builds, and it should work now out of the box for
Polish, also under JNLP (if you care to compile it, the online version
on our web page is not up-to-date - we usually wait for a stable release).
Daniel - it seems that your script for managing the snapshots
On 2012-05-21 19:41, Daniel Naber wrote:
I could add stemming and analyzing options - I think it's interesting to
include it for analyzing compounds in German or Dutch.
We already have our own component that can do that for German
(jwordsplitter).
When reading the GSoC ideas on the wiki,
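Dictionary-based compound splitting of the kind jwordsplitter does for German can be sketched naively as follows (this illustrates the idea only; it is not jwordsplitter's actual algorithm or API, and the dictionary is supplied by the caller):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Naive compound splitter: tries every prefix that is a dictionary word
// and recursively splits the remainder.
public class CompoundSplitter {

    private final Set<String> words;

    public CompoundSplitter(Set<String> words) {
        this.words = words;
    }

    // Returns the parts if the whole word decomposes into dictionary words
    // (a word wholly in the dictionary yields a single part), or an empty
    // list if no full decomposition exists.
    public List<String> split(String word) {
        for (int i = 1; i <= word.length(); i++) {
            String head = word.substring(0, i);
            if (words.contains(head)) {
                if (i == word.length()) {
                    List<String> result = new ArrayList<>();
                    result.add(head);
                    return result;
                }
                List<String> rest = split(word.substring(i));
                if (!rest.isEmpty()) {
                    rest.add(0, head);
                    return rest;
                }
            }
        }
        return new ArrayList<>();
    }
}
```

Real splitters add interfix handling (e.g. the German "s" in "Arbeitsamt") and prefer linguistically plausible splits rather than the first one found.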
On 2012-05-21 20:35, Daniel Naber wrote:
But is there anyone that does not want hunspell for a language? I mean
maintainers should be able to decide if they want it or not.
The problem is that some languages are not really maintained I guess... I
also think that all languages should get
On 2012-05-21 23:41, Daniel Naber wrote:
On Monday, 21 May 2012, Marcin Miłkowski wrote:
Unfortunately, there is no way to disable rules when using the
server, and this is a major drawback in this case
The reason was probably that there once was only one LT object in the http
server
On 2012-05-22 00:24, Marcin Miłkowski wrote:
My preliminary results (I will send them
later when they are ready in a formatted form) are that the hunspell
rule is around 25 times (!) slower than all other rules.
I profiled the rules. The data were correct, and you can see them on a
graph
Hello,
On 2012-05-22 09:19, Mike Unwalla wrote:
Hello,
I don't know the implications of having hunspell, therefore, possibly, my
comment is nonsense.
I don't want LT to flag a word unless I tell LT to flag that word. If
hunspell causes LT to flag a word, then hunspell will cause me
On 2012-05-20 11:19, Jaume Ortolà i Font wrote:
I am already familiar with the methods for inflecting tokens. Now, in
order to make a suggestion, I need the lemma from one token and some
information of the POS tag of another token. In fact, this is rarely
necessary because this information
On 2012-05-21 20:06, Marcin Miłkowski wrote:
I guess if every maintainer needs to add the files, a lot of languages won't
have support for spell checking in 1.8... Is this something you could do
for all languages? How large will the LT zip become then anyway?
I could, of course
On 2012-05-22 23:43, Daniel Naber wrote:
On Tuesday, 22 May 2012, Marcin Miłkowski wrote:
It depends on how you are using LT: if from the command-line, then you
have no problem. If it's used as a web server, you might have a
problem, but I'm trying to figure out a good solution.
Two
On 2012-05-24 00:29, Daniel Naber wrote:
On Wednesday, 23 May 2012, Jan Schreiber wrote:
I agree with Daniel that that sounds suspiciously like too many clicks,
but otoh the 1300+ German rules are very difficult to navigate through
in the GUI without folding. The French ruleset is even
On 2012-05-27 11:35, Jan Schreiber wrote:
However, when I use LanguageTool in the command line, I don't see
anything about the URL information:
For all I know, the URL is displayed in LibreOffice only.
I just added it to the commandline interface.
BTW, we might need to add a new
On 2012-05-28 08:33, Ruud Baars wrote:
In Dutch, there are lots of errors because of typing the space too early
or late.
Spell checking gets most of them, because one of the words is incorrect
then. But suggestions are worthless.
Even worse, it happens that both words are technically
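One way to attack such misplaced-space errors is to check whether joining two adjacent tokens yields a dictionary word. A self-contained sketch of that idea (class name and dictionary interface are illustrative, not LT code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: when a space may have been typed too early or too late,
// the concatenation of two neighbouring tokens is often the intended word.
public class SpaceErrorChecker {

    private final Set<String> dictionary;

    public SpaceErrorChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Suggests the joined form for every adjacent pair whose concatenation
    // is a known word. Note we suggest the join even when both halves are
    // themselves valid words, since (as the thread notes) both parts can be
    // "technically" correct on their own.
    public List<String> suggestJoins(String[] tokens) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            String joined = tokens[i] + tokens[i + 1];
            if (dictionary.contains(joined)) {
                suggestions.add(joined);
            }
        }
        return suggestions;
    }
}
```

A production version would rank such suggestions above plain edit-distance candidates, which is exactly where the stock spell-checker suggestions fall short.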
That requires some additions to the AnalyzedToken and multiple other
places. But agreed, very useful. Will think of it.
On 28-05-2012 13:24, Jaume Ortolà i Font <jaumeort...@gmail.com>
wrote:
I have the same difficulty with the disambiguation rules in Catalan. The
proposed solution would
On 2012-05-30 00:07, Dominique Pellé wrote:
Dominique Pellé <dominique.pe...@gmail.com> wrote:
Marcin Miłkowski wrote:
On 2012-05-27 11:35, Jan Schreiber wrote:
However, when I use LanguageTool in the command line, I don't see
anything about the URL information:
For all
On 2012-05-30 11:38, Ruud Baars wrote:
Sounds good. I think the smart compound detection of Hunspell would make it
possible to generate a word list.
Hard to get all these words checked and tagged though.
I will see. Takes more time than available.
I will take care of the process of
Hi again,
Most of the false alarms, by the way, on the first page, are not caused
by this rule but by inappropriate POS tagging or tokenization. I will
have a look.
I just (hopefully) fixed most of the false alarms. Daniel, could you
re-run the English rules on Wikipedia on our community
Hi,
On 2012-06-01 11:47, Jaume Ortolà i Font wrote:
Hi,
When using 'unification', I sometimes find 'out of bounds' errors, and I
think this could be easily prevented. The next rule and
other similar rules seem to work fine with the examples. But when I test
it with the Wikipedia corpus,
, Marcin Miłkowski wrote:
I guess if every maintainer needs to add the files, a lot of languages
won't
have support for spell checking in 1.8... Is this something you could do
for all languages? How large will the LT zip become then anyway?
I could, of course. This is almost trivially easy. But I
On 2012-06-01 17:22, Daniel Naber wrote:
On Friday, 1 June 2012, Marcin Miłkowski wrote:
Daniel, could you
re-run the English rules on Wikipedia on our community site for me to
see if it really helped? If not, it'll be wiser to disable the rule by
default.
done:
http
On 2012-06-02 13:08, Daniel Naber wrote:
On Saturday, 2 June 2012, Marcin Miłkowski wrote:
I also had a look at the results on the Brown corpus; most of them are false
alarms. It's best to disable the rule by default.
I just did. Thanks for hunting down the false alarms.
I fixed a lot of false
Hi all,
Now it is possible to suppress misspelled suggestions altogether in XML
rules by applying the attribute suppress_misspelled="yes" on the
suggestion element AND on the match element. If only the match
element has this attribute set to yes, then the suggestion is
displayed, but no content of
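Schematically, such a rule could look like the fragment below (the rule id, pattern, and replacement are invented; only the placement of the suppress_misspelled attribute follows the description above):

```xml
<!-- Schematic fragment: with suppress_misspelled="yes" on both the
     suggestion and the match element, the whole suggestion is dropped
     whenever the generated replacement is itself a misspelling. -->
<rule id="EXAMPLE_ID" name="Example rule">
  <pattern>
    <token regexp="yes">foo.*</token>
  </pattern>
  <message>Did you mean
    <suggestion suppress_misspelled="yes">
      <match no="1" suppress_misspelled="yes"
             regexp_match="foo" regexp_replace="bar"/>
    </suggestion>?
  </message>
</rule>
```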
On 2012-06-04 11:49, Dominique Pellé wrote:
Marcin Miłkowski <list-addr...@wp.pl> wrote:
Bad news: I just tried the Breton version of LanguageTool with
Hunspell. Something is wrong: it highlights as mistakes all
words containing apostrophes ' or dashes -. Such words are
very frequent in
On 2012-06-04 19:16, Daniel Naber wrote:
On Monday, 4 June 2012, Marcin Miłkowski wrote:
There is no command line for the hunspell library. It's not executed as a
standalone program.
Ubuntu comes with such a program:
hunspell -d src/resource/nl/hunspell/nl_NL /tmp/test.txt
It also
On 2012-06-05 12:48, Jaume Ortolà i Font wrote:
Hi,
In Catalan there is exactly the same problem you described here.
2012/6/4 Marcin Miłkowski <list-addr...@wp.pl>
On 2012-06-03 23:53, Dominique Pellé wrote:
Bad news: I just tried the Breton
On 2012-06-06 09:22, Dominique Pellé wrote:
Marcin Miłkowski <list-addr...@wp.pl> wrote:
Well, this was my main reason for not adding the dictionary: I did not
know whether the US version or the GB version should be the default. If
we default to British or American English, we should, IMHO,
On 2012-06-05 23:01, Daniel Naber wrote:
On Monday, 4 June 2012, Jaume Ortolà i Font wrote:
I don't fully understand the Unifier.java code. The relation between
negated and non-negated unification seems straightforward, but the
results are bizarre. Any help?
I cannot directly help, but
On 2012-06-07 20:08, Piotr wrote:
Thanks Marcin. Can I have both versions of Java on the same computer?
After all I will point LO to just one of them.
Yes, you can.
Regards,
Marcin
On Thu, Jun 7, 2012 at 7:55 PM, Marcin Miłkowski <list-addr...@wp.pl> wrote
On 2012-06-13 08:29, Juan Martorell wrote:
When debugging in command line mode, if the sentence matches a rule then
you get no disambiguator log. Is it easy to change that behaviour?
You should get it anyway, below POS tags. Are you using -v?
Regards,
Marcin
Hi all,
we need to make configuration simpler. While I like Daniel's idea to
suppress the config dialog altogether and display only disabled rules at
the bottom of the dialog box, I think we could have something better.
Now, if you look at most word processors, they offer a simplified view
of
Hi all,
I performed some further tests on Hunspell Rule.
First, I disabled suggestions for Polish. This gave a huge speedup: from
56 sentences per second, I got around 770 sentences per second.
Then, I tested English (American Dictionary). With suggestions, I got 71
sentences per second,
On 2012-06-14 17:19, Daniel Naber wrote:
On Thursday, 14 June 2012, Marcin Miłkowski wrote:
we need to make configuration simpler. While I like Daniel's idea to
suppress the config dialog altogether and display only disabled rules
at the bottom of the dialog box, I think we could have
On 2012-06-12 22:16, Daniel Naber wrote:
Hi,
we don't need the hunspell rule in OO/LO and it doesn't seem to be active -
so that's okay. Is it only inactive because the language is detected as
English as opposed to American English etc? Wouldn't it be more robust
to explicitly disable
On 2012-06-15 09:31, Ruud Baars wrote:
Marcin,
I am assuming Hunspell is only called when a word is 'UNKNOWN', right?
If not, that might be a change worth trying.
No. It's called every time. The problem is that for many languages, the
taggers do contain rare forms.
Note that the
Hi,
when compiling the dictionary using morfologik-tools, you simply use -f
cfsa2 (just add it to the fsa_build command call).
I will update the wiki soon.
Regards,
Marcin
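Assuming the standalone morfologik-tools jar, the call could look roughly like this (the jar name/version and the input/output option names are from memory and may differ; only the -f cfsa2 flag comes from the message above):

```shell
# Build a CFSA2-encoded dictionary instead of the default FSA5 format.
# Jar name/version and the -i/-o option names are illustrative.
java -jar morfologik-tools-1.5.2-standalone.jar fsa_build \
  -f cfsa2 -i polish_words.txt -o polish.dict
```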
On 2012-06-21 07:34, Dominique Pellé wrote:
Hi
I see that some dictionaries have been re-encoded
using cfsa2 to
Hi all,
considering that hunspell is so slow and that for many languages, we
already have wordlists (for example, there are wordlists for all
variants of English, and all aspell dictionaries may be converted to
word lists), we might replace all hunspell dictionaries with
morfologik-speller
Ruud,
what about aspell-nl? Is that the same? Or bigger?
Marcin
On 2012-06-22 12:34, R.J. Baars wrote:
There is a words list for Dutch at www.opentaal.org. Relatively small,
just 450.000 words, including proper names, but it is the best open one
there is.
To be exact:
. It works quite fast here.
Marcin
On 2012-06-22 17:18, Jan Schreiber wrote:
We could try my wordlist (1.3 million word forms) for German if that
speeds up the process:
http://sourceforge.net/projects/germandict/
--Jan
On 22.06.2012 11:09, Marcin Miłkowski wrote:
Hi all
OK, got it. Now, the file contains numerous .txt files and one .csv
file. I understand that the complete speller dictionary should contain
the words from all .txt files?
And - to be sure - do you want me to replace hunspell for NL with the
wordlist-based dictionary? I'm not sure if we have any
On 2012-06-22 22:43, Daniel Naber wrote:
On Friday, 22 June 2012, Dominique Pellé wrote:
One way to speed this up is to cache the last N misspelled
words and their suggestions. So when we encounter the same
misspelled word again, we can quickly find in cache its
suggestions without
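The suggested cache of the last N misspelled words maps naturally onto an access-ordered LinkedHashMap, which evicts the least recently used entry for us. A sketch of the idea (class and method names are illustrative, not LanguageTool code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Caches the suggestions for the last N misspelled words, so a repeated
// misspelling skips the expensive suggestion generation entirely.
public class SuggestionCache {

    private final Map<String, List<String>> cache;

    public SuggestionCache(final int maxEntries) {
        // accessOrder=true makes iteration order = recency of use, so
        // removeEldestEntry evicts the least recently used word.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns cached suggestions, or null on a cache miss.
    public List<String> get(String misspelledWord) {
        return cache.get(misspelledWord);
    }

    public void put(String misspelledWord, List<String> suggestions) {
        cache.put(misspelledWord, suggestions);
    }
}
```

Since real texts repeat their misspellings (names, jargon), even a small N should give a noticeable hit rate.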
it for German, it's irrelevant anyway
Regards,
Marcin
2012/6/23 Daniel Naber <list2...@danielnaber.de>
On Saturday, 23 June 2012, Marcin Miłkowski wrote:
Sounds like a bug to me. Do you get any suggestion for, say, ser gutt?
I got some. M.
I see now: it only works for words with one error
the relative word frequency might help in suggesting the words in
the correct order.
Yes, but for this, I'd need a frequency list. A wikipedia-based one
might be skewed because of the high level of occurrence of proper names.
Marcin
Ruud
On 23-06-12 12:42, Marcin Miłkowski wrote:
Yes
On 2012-06-23 20:51, Jan Schreiber wrote:
I'm fine with that, if it works fast enough. Hunspell is no doubt the
best free spell checker around for German.
Just a thought - why not have hunspell for checking and
MorfologikSpeller (with edit distance = 2) for suggestions? It's a bit
of a
On 2012-06-24 10:49, Daniel Naber wrote:
On Sunday, 24 June 2012, Marcin Miłkowski wrote:
Just a thought - why not have hunspell for checking and
MorfologikSpeller (with edit distance = 2) for suggestions?
Thought about this too. What would the MorfologikSpeller then use
On 2012-06-24 13:59, Daniel Naber wrote:
On Sunday, 24 June 2012, Marcin Miłkowski wrote:
Okay. For me, having no suggestions for some languages seems okay for
1.8, but if you want to do that, go ahead.
You mean without the trick to do the compounding suggestions as well? Or
including
Hi all,
shouldn't we also make SimpleReplaceRule a spelling rule and display its
results underlined in red?
Regards,
Marcin
On 2012-06-30 22:01, Daniel Naber wrote:
On Saturday, 30 June 2012, Marcin Miłkowski wrote:
it works for me under linux when I move
jar href=hunspell-linux-i386.jar/
Works for me, too - please everybody test whether the Start LanguageTool
link on http://www.languagetool.org works
tokenization. You suggested that we use it for
word tokenization.
I understand there are consequences all over the tool range.
No, not at all.
Marcin
Ruud
On 01-07-12 18:59, Marcin Miłkowski wrote:
Hi Dominique, and all,
I have started conversion of the Breton speller to the MorfologikSpeller
On 2012-07-01 22:53, Dominique Pellé wrote:
Marcin Miłkowski <list-addr...@wp.pl> wrote:
Hi Dominique, and all,
I have started conversion of the Breton speller to the MorfologikSpeller
format, and thanks to the new support of UTF-8, it was successful. But
there is one outstanding issue
Hi Nathan,
On 2012-07-04 14:47, Nathan Wells wrote:
LibreOffice LibO Master 3.6 incorporates automatic line breaking for the
Khmer language using the latest version of ICU (see:
http://bugs.icu-project.org/trac/ticket/8329)
I'm confused. Do you mean *line* breaking or *word* breaking?
On 2012-08-10 21:34, Daniel Naber wrote:
Hi,
a bug report[1] complains about incorrect position information and
concludes that it's caused by filtering XML. Does anybody know why we should
filter XML? I tend to agree with the report that this filtering should happen
outside of
Hello,
On 2012-08-13 12:17, Mike Unwalla wrote:
Hello,
Examples of characters that cause tokenization: space . ! { } [ /
Examples of characters that do not cause tokenization: # $ % ^ _ + = * @ ~
I looked on languagetool.wikidot.com and on http://www.languagetool.org/, but
I did
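Which characters trigger tokenization comes down to a delimiter set. A toy word tokenizer showing the effect, including typographic quotes like the Polish „ ” mentioned earlier in the TexStudio thread (the delimiter list below is illustrative, not LanguageTool's actual set):

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleWordTokenizer {

    // ASCII punctuation that splits words, plus typographic quote characters.
    // Characters absent from this string (e.g. # $ % ^ _ + = * @ ~) do not split.
    private static final String DELIMITERS = " \t\n.,;:!?(){}[]/\"'„”“`|";

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (DELIMITERS.indexOf(ch) >= 0) {
                // Delimiter: flush the word collected so far, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(ch);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With smart quotes in the set, „OK” tokenizes to the bare word OK, so the speller no longer sees the quotes as part of the word.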
On 2012-08-13 20:57, Daniel Naber wrote:
On Mon 13.08.2012, 20:07:26, Dominique Pellé wrote:
I would also split words with at least the backticks ` and pipe |.
I don't really see disadvantages in not splitting at those characters.
I don't see a problem with that either, so feel free to make
On 2012-09-04 23:31, Daniel Naber wrote:
On 04.09.2012, 10:38:55 Jaume Ortolà i Font wrote:
Hi Jaume,
After the path changes, I made a clean checkout. I'm using Windows 7 and
Eclipse and I have found some problems:
thanks, both issues should be fixed now. Please let me know if there's
On 2012-09-17 22:22, Daniel Naber wrote:
Hi,
I'm glad to announce that we have three very nice proposals for a new
LanguageTool logo!
See them at
http://www.danielnaber.de/tmp/lt-logo-proposals.png
Note: please do not spread or use these logos yet: as long as we haven't
made a
On 2012-09-18 23:06, Daniel Naber wrote:
On 18.09.2012, 20:05:24 gulp21 wrote:
Proposal #1 reminds me of a map api. Proposal #2B is a bit overcrowded,
thus I think that proposal #2A is the best one. But I also agree with
Nathan that it looks more like a logo for a translation tool.
As
Hi,
On 2012-09-30 20:00, Mauro Condarelli wrote:
Hi, thanks.
Comments below.
On 30/09/2012 11:52, Marcin Miłkowski wrote:
On 2012-09-30 05:05, Mauro Condarelli wrote:
Hi,
I'm coding my test app with LT and I'm almost ready to become productive.
I need a deeper insight in two
On 2012-10-01 00:09, Daniel Naber wrote:
On 30.09.2012, 11:31:54 Marcin Miłkowski wrote:
I'm not sure about morfologik* libraries. There might be an unreleased
version in our code (I fixed a bug with UTF-8 but 1.5.4 was not released
yet).
Could we release the Maven version of LT with
On 2012-10-02 12:56, Mauro Condarelli wrote:
On Sunday, 30 September 2012 20:30:17, Marcin Miłkowski wrote:
I would also like to know if it's possible to add further information to
the words (e.g.: pointers to synonyms and antonyms).
This is not part-of-speech information, so it does
On 2012-10-22 21:11, Dominique Pellé wrote:
Daniel Naber <list2...@danielnaber.de> wrote:
On 16.10.2012, 16:52:26 Ruud Baars wrote:
By the way, I read the instruction in the wiki, but these are quite
complex (for me).
I think you can ignore everything not related to Java - both exporting
On 2012-11-15 22:58, Daniel Naber wrote:
Hi,
the load test I did for our HTTPS service showed that we have kind of a
performance problem for languages that have a lot of rules. Testing a
random German text with 125 sentences takes 2 seconds on my machine on
average. About half of that
On 2012-11-16 09:28, R.J. Baars wrote:
Hi,
the load test I did for our HTTPS service showed that we have kind of a
performance problem for languages that have a lot of rules. Testing a
random German text with 125 sentences takes 2 seconds on my machine on
average. About half of that time
On 2012-11-16 10:10, R.J. Baars wrote:
Hi,
the load test I did for our HTTPS service showed that we have kind of a
performance problem for languages that have a lot of rules. Testing a
random German text with 125 sentences takes 2 seconds on my machine on
average. About half of that time
On 2012-11-16 16:17, Dominique Pellé wrote:
Marcin Miłkowski <list-addr...@wp.pl> wrote:
Hm, just a stupid question, but did you remember that loading rules is
also quite expensive? We have to parse XML and compile regexes, and read
from disk.
I'm hoping that XML parsing and loading other
On 2012-11-16 23:51, Daniel Naber wrote:
On 16.11.2012, 15:24:16 Marcin Miłkowski wrote:
Hm, just a stupid question, but did you remember that loading rules is
also quite expensive? We have to parse XML and compile regexes, and read
from disk.
Loading rules takes about 130ms once
On 2012-11-17 23:03, Daniel Naber wrote:
On 17.11.2012, 17:17:56 Marcin Miłkowski wrote:
Do you have stats per call of the method? It may be more informative.
See attached file, is that what you mean? It's after checking three or so
German documents.
This is what I mean.
The method
On 2012-11-19 01:12, Dominique Pellé wrote:
Marcin Miłkowski <list-addr...@wp.pl> wrote:
On 2012-11-18 22:48, Daniel Naber wrote:
On 18.11.2012, 21:33:32 Marcin Miłkowski wrote:
The easiest way to see whether this could speed things up at all is
change the line 162
On 2012-11-19 11:39, Dominique Pellé wrote:
Marcin Miłkowski wrote:
In Breton at least, I sometimes experience a combinatorial
explosion of rules in order to implement what I want. Of
course, many rules probably slow things down.
I don't think that having extra 16 rules changes
On 2012-11-19 18:35, Daniel Naber wrote:
On 19.11.2012, 10:39:33 Marcin Miłkowski wrote:
Yeah, but to reuse the pattern, we would have to build a finite-state
machine in memory (or on disk) first. This is far from trivial because
we would have to flatly encode all features (token, pos
On 2012-11-21 23:28, Dominique Pellé wrote:
Hi
I see that my script to produce the French POS tag dictionary
still uses morfologik 1.4.0. I just tried to update it to use
morfologik 1.5.2.
The script creates the dictionary french.dict without error, but the created
dictionary with
On 2012-11-22 08:14, Dominique Pellé wrote:
Jan Schreiber wrote:
Ruud Baars wrote:
Suppose there is a slow/expensive rule, that almost never matches
to warn
about some low-priority issue; maybe it is worth switching it off
then.
That looks like a good idea