I am currently checking the output of all the rules on the 20GB corpus;
some rules work perfectly, others less so (though they are hard to tweak).
The result will be a major update, I guess...
Ruud
On 2014-08-18 17:18, R.J. Baars wrote:
> I was able to test, and removed 2 of my additions to make it work.
Thanks, I have committed it. It will be part of the daily builds
tonight:
https://languagetool.org/download/snapshots/?C=M;O=D
Regards
Daniel
On 2014-08-18 16:56, R.J. Baars wrote:
> But how can I test it if it is not in the runtime version?
It's inside libs/languagetool-core.jar, which is just a ZIP file. Unzip
it, edit segment.srx, re-zip it and test it.
Regards
Daniel
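The unzip / edit / re-zip cycle can also be scripted. Below is a minimal sketch using only Python's standard `zipfile` module. The entry name `org/languagetool/resource/segment.srx` is an assumption inferred from the source path `languagetool-core/src/main/resources/...` (build tools normally copy `src/main/resources` into the jar root), so check it against the jar's file listing first. The helpers `read_jar_entry` and `replace_jar_entry` are hypothetical names, not part of LanguageTool.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Assumption: resources under src/main/resources end up at the jar root,
# so segment.srx should sit at this path inside languagetool-core.jar.
SRX_ENTRY = "org/languagetool/resource/segment.srx"


def read_jar_entry(jar_path: str, entry: str = SRX_ENTRY) -> str:
    """Return one text file from the jar (a jar is just a ZIP archive)."""
    with zipfile.ZipFile(jar_path) as zf:
        return zf.read(entry).decode("utf-8")


def replace_jar_entry(jar_path: str, new_text: str, entry: str = SRX_ENTRY) -> None:
    """Rebuild the jar with `entry` replaced; every other entry is copied as-is.

    ZIP files cannot be edited in place, so a new archive is written next to
    the original and then moved over it.
    """
    jar = Path(jar_path)
    with tempfile.NamedTemporaryFile(suffix=".jar", dir=jar.parent, delete=False) as tmp:
        tmp_path = Path(tmp.name)
    try:
        with zipfile.ZipFile(jar) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            if entry not in src.namelist():
                raise KeyError(f"{entry} not found in {jar}")
            for info in src.infolist():
                data = new_text.encode("utf-8") if info.filename == entry else src.read(info)
                dst.writestr(info, data)  # reuses each entry's name and compression type
        shutil.move(str(tmp_path), str(jar))
    except BaseException:
        tmp_path.unlink(missing_ok=True)  # leave the original jar untouched on failure
        raise
```

In practice you would call `read_jar_entry("libs/languagetool-core.jar")`, edit the returned text, write it back with `replace_jar_entry`, and then restart LanguageTool to test. Keep a copy of the original jar in case the edited SRX file turns out to be invalid.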
---
I am not qualified to edit source code; I'm just not a programmer.
Unfortunately, the SRX file is not split per language either.
I found the source on GitHub (which I don't really understand), so I will
be able to adjust it and send it to you.
But how can I test it if it is not in the runtime version?
Ruud
On 2014-08-18 16:16, R.J. Baars wrote:
> There is an adjustment to make in the sentence splitter. But where did the
> .srx go?
It's at
languagetool-core/src/main/resources/org/languagetool/resource/segment.srx
> Could this be added to the Dutch srx rules?
Sure, could
Same applies to [0-9]{1,2}[-]pers.
Ruud
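To see why "milj." and "4-pers." need exception rules, here is a toy model of SRX evaluation: at each position the rules are tried in order, and the first rule whose beforebreak matches the text ending there and whose afterbreak matches the text starting there decides. Rules with break="no" therefore act as exceptions only if they come before the general break rule. This is a sketch, not LanguageTool's segmenter, and the rule set is hypothetical rather than the actual Dutch section of segment.srx; the dot in the pers. pattern is escaped so it only matches a literal period.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SrxRule:
    """One SRX-style rule; is_break=False makes it an exception (break="no")."""
    is_break: bool
    before: str  # must match the text ending at the candidate position
    after: str   # must match the text starting at the candidate position


def split(text: str, rules: list[SrxRule]) -> list[str]:
    """Toy SRX evaluator: at each position, the first matching rule decides."""
    compiled = [(r.is_break, re.compile(f"(?:{r.before})\\Z"), re.compile(r.after))
                for r in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # search(text, 0, pos) behaves like searching text[:pos]
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins, whether it breaks or not
    segments.append(text[start:])
    return segments


# Hypothetical Dutch exceptions; they must come before the general break rule.
DUTCH_EXCEPTIONS = [
    SrxRule(False, r"\bmilj\.", r"\s"),
    SrxRule(False, r"\b[0-9]{1,2}-pers\.", r"\s"),
]
# Hypothetical general rule: break after ., ? or ! followed by whitespace.
BREAK_RULE = SrxRule(True, r"[.?!]", r"\s")

text = "Het kost 3 milj. euro. Een 4-pers. kamer is vrij."
print(split(text, DUTCH_EXCEPTIONS + [BREAK_RULE]))
# ['Het kost 3 milj. euro.', ' Een 4-pers. kamer is vrij.']
print(split(text, [BREAK_RULE]))
# ['Het kost 3 milj.', ' euro.', ' Een 4-pers.', ' kamer is vrij.']
```

If the two exceptions were listed after `BREAK_RULE`, they would never fire, because the break rule already matches at the same positions.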
There is an adjustment to make in the sentence splitter. But where did the
.srx go?
I detected an abbreviation that is commonly used but is currently treated as a
sentence end:
milj.
Could this be added to the Dutch srx rules?
Ruud
On 2014-04-12 11:45, Marcin Miłkowski wrote:
> Of course, we could make it possible to use another .srx file but then a
> new language module would be incompatible with others, and more work
> would be needed to integrate it. Do we want it?
There's now a new class LocalSRX
On 2014-04-12 09:55, Daniel Naber wrote:
On 2014-04-12 09:34, Marcin Miłkowski wrote:
> SRX file can be easily edited and we will happily accept all patches,
> also for languages without complete support in LT. Where's the problem?
Today, you can extend the Language class and have a Regex-based
tokenizer with your
On 2014-04-11 22:16, Daniel Naber wrote:
Hi,
the following languages have been switched to use an SRX-based sentence
tokenizer so we use the same approach for all languages and not a
mixture of different methods:
Asturian, Italian, Lithuanian, Malayalam, Swedish, Tagalog
I don't speak these languages so I cannot properly tes
Most srx-compliant software uses a single file for all languages, AFAIK.
Regards, Marcin
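In a single SRX file, the maprules section is what selects rules per language: each languagemap pairs a languagepattern with a named languagerule. The sketch below shows that lookup with Python's standard library on a made-up two-language file; the "Dutch" and "Default" rule sets are hypothetical, not copied from LanguageTool's segment.srx. It assumes the SRX 2.0 namespace `http://www.lisa.org/srx20`, that languagepattern must match the whole language code, and that with cascade="yes" every matching map contributes rules in order while otherwise only the first match is used.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

SRX_NS = {"srx": "http://www.lisa.org/srx20"}


def rules_for_language(srx_xml: str, lang_code: str) -> list[tuple[bool, str, str]]:
    """Return (is_break, beforebreak, afterbreak) tuples that apply to lang_code."""
    root = ET.fromstring(srx_xml)
    header = root.find("srx:header", SRX_NS)
    # Treat a missing cascade attribute as "no" (only the first matching map is used).
    cascade = header is not None and header.get("cascade") == "yes"
    by_name = {lr.get("languagerulename"): lr
               for lr in root.iterfind(".//srx:languagerule", SRX_NS)}
    rules = []
    for lang_map in root.iterfind(".//srx:languagemap", SRX_NS):
        # Assumption: languagepattern must match the whole language code.
        if not re.fullmatch(lang_map.get("languagepattern"), lang_code):
            continue
        for rule in by_name[lang_map.get("languagerulename")].iterfind("srx:rule", SRX_NS):
            rules.append((
                rule.get("break", "yes") == "yes",
                rule.findtext("srx:beforebreak", default="", namespaces=SRX_NS),
                rule.findtext("srx:afterbreak", default="", namespaces=SRX_NS),
            ))
        if not cascade:
            break
    return rules


# Hypothetical sample file: Dutch exceptions first, then a catch-all default.
SAMPLE_SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="yes"/>
  <body>
    <languagerules>
      <languagerule languagerulename="Dutch">
        <rule break="no"><beforebreak>\bmilj\.</beforebreak><afterbreak>\s</afterbreak></rule>
      </languagerule>
      <languagerule languagerulename="Default">
        <rule break="yes"><beforebreak>[.?!]</beforebreak><afterbreak>\s</afterbreak></rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern="nl.*" languagerulename="Dutch"/>
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>"""

print(rules_for_language(SAMPLE_SRX, "nl"))
# [(False, '\\bmilj\\.', '\\s'), (True, '[.?!]', '\\s')]
print(rules_for_language(SAMPLE_SRX, "de"))
# [(True, '[.?!]', '\\s')]
```

With cascading, a language-specific rule set only needs its exceptions; the shared default rules are appended after them.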
On 02-05-2013 09:08, "Daniel Naber" wrote:
On 01.05.2013, 12:18:41 Andriy Rysin wrote:
> P.S. BTW, wouldn't it make sense to split segment.srx by language
> modules?
Absolutely. This isn't very high on my personal TODO list though, so any
help/patches are welcome.
Regards
Daniel
--
http://www.danielnaber.de
---
Thanks, that helped!
Andriy
On 05/01/2013 02:54 PM, Piotr wrote:
Maybe the part after the \. should be in the afterbreak element?
Regards,
Piotr
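Piotr's hint works because SRX evaluates rules at candidate break positions: a rule's beforebreak must match the text ending at that position and its afterbreak the text starting there. A minimal sketch, assuming the candidate break in "Т.Шевченко" is the position right after the period and using a deliberately simplified, hypothetical pattern (a single capital initial followed by a capitalised surname), since Andriy's actual rule is only partly shown and was meant to become more complex.

```python
import re

# Hypothetical simplified pattern: one Ukrainian capital letter.
UPPER = "[А-ЯІЇЄҐ]"


def rule_matches_at(text: str, pos: int, before: str, after: str) -> bool:
    """True if an SRX-style rule fires at the candidate break position pos:
    `before` must match the text ending at pos, `after` the text starting at pos."""
    before_ok = re.search(f"(?:{before})\\Z", text[:pos]) is not None
    after_ok = re.match(after, text[pos:]) is not None
    return before_ok and after_ok


text = "Т.Шевченко"
pos = text.index(".") + 1  # the candidate break sits right after the period

# Whole pattern in beforebreak: it would have to end at pos, but the surname
# letter lies after pos, so the no-break rule never fires here.
print(rule_matches_at(text, pos, rf"\b{UPPER}\.{UPPER}", ""))  # False
# Part after the \. moved into afterbreak, as suggested: the rule fires.
print(rule_matches_at(text, pos, rf"\b{UPPER}\.", UPPER))  # True
```

In SRX terms, the second call corresponds to a break="no" rule with `\b[А-ЯІЇЄҐ]\.` in beforebreak and `[А-ЯІЇЄҐ]` in afterbreak.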
Hi all
I need a bit of help with the SRX sentence tokenizer. I've added this rule to
prevent a sentence split between an abbreviated first name and a surname,
e.g. "Т.Шевченко", which is common in texts.
The rule will need to be a bit more complex but I am trying something
simple first.
\b[А-ЯІЇЄҐ]\.[А-Я