Hi Stephen,

If you are referring to the analysis-nori package, I made an attempt to port it here: https://github.com/NightOwl888/lucenenet/commits/feature/analysis-nori
However, there were a couple of issues with it.

1. It isn't clear whether System.Decimal is a close enough match to java.math.BigDecimal to fully support all of the functionality.
2. It has dependencies on org.apache.lucene.util.fst, which had been redesigned in Lucene 6.x.

The latter proved to make it difficult to get all the tests to pass because at a low level the numbers being returned from FST didn’t correspond to Lucene, so there is no way to tell where the paths diverge that cause the test failures.

Our current plan is to leave this on the table until we get to at least Lucene 6.x, but we really didn't expect there to be much demand for it.

If you are interested in completing the port and contributing it back to the Lucene.NET project, I suspect that porting the org.apache.lucene.util.fst namespace from the corresponding version and putting it into the Lucene.Net.Analysis.Nori project would be the best option for getting it working, but it has to be done in such a way not to cause conflicts with the existing FST namespace. I believe this was ported from 8.2.0 based on the date it was done (unfortunately, this seems to be missing the documentation headers of which version it was ported from), but this will require some analysis of the history of the analyzer to see how much it actually changed over time, as it may not make any difference.

Reviewing the history of Lucene in Git may help you understand where (and if) Lucene diverges from Solr for the Nori analyzer. Generally speaking, the behavior of analyzers doesn't change over time unless a bug is found or some update to the language of the analyzer needs to be accounted for.

Lucene had a major design change in version 4.x where it grew in size by a factor of 10, which is why porting it has been ongoing from September 2014 until present. There is a plan to do the upgrade to the latest version of Lucene after the 4.8.0 stable release is done.
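
To make the first issue above more concrete: java.math.BigDecimal is arbitrary precision, while System.Decimal holds a 96-bit integer with a scale of 0 to 28. The Java sketch below is only an illustration of that range gap. The class DecimalRangeSketch and the helper fitsInDotNetDecimal are hypothetical names, not part of Lucene or Lucene.NET, and whether the gap matters for the values Nori actually handles is not settled here.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalRangeSketch {
    // System.Decimal stores an unsigned 96-bit integer (plus a sign) and a scale of 0..28.
    static final BigInteger MAX_UNSCALED =
            BigInteger.ONE.shiftLeft(96).subtract(BigInteger.ONE);
    static final int MAX_SCALE = 28;

    /**
     * Hypothetical helper: returns true if the decimal text could be held by a
     * System.Decimal without changing its numeric value.
     */
    public static boolean fitsInDotNetDecimal(String text) {
        // Drop trailing zeros so that 1.000 is treated the same as 1.
        BigDecimal value = new BigDecimal(text).stripTrailingZeros();
        if (value.scale() < 0) {
            // Expand forms such as 1E+3 to 1000 so the unscaled value is the full integer.
            value = value.setScale(0);
        }
        return value.scale() <= MAX_SCALE
                && value.unscaledValue().abs().compareTo(MAX_UNSCALED) <= 0;
    }

    public static void main(String[] args) {
        String[] samples = {
            "79228162514264337593543950335", // 2^96 - 1, the largest System.Decimal
            "79228162514264337593543950336", // one past that limit
            "1e30",                          // fine for BigDecimal, too large for System.Decimal
            "1e-29"                          // needs a scale of 29, one more than allowed
        };
        for (String sample : samples) {
            System.out.println(sample + " fits: " + fitsInDotNetDecimal(sample));
        }
    }
}
```

A check like this could be run over the numeric inputs used in the Nori tests to see whether any of them fall outside what System.Decimal can hold.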

Anyway, let me know if you are interested in doing the work to port this. The branch I pointed to is very stale and will require several updates to the project structure to integrate it with the current state of the repository.

Thanks,
Shad Storhaug (NightOwl888)
Project Chairperson - Apache Lucene.NET

-----Original Message-----
From: Stephen Lewis Bianamara <stephen.bianam...@gmail.com>
Sent: Tuesday, January 4, 2022 2:43 AM
To: user@lucenenet.apache.org
Subject: Re: Korean analyzer?

Thanks Ron, that clarifies things. I hadn't realized the Lucenenet version was a reflection of the Lucene version it was porting.

To that end, I'm wondering about the overall trajectory of Lucenenet. It looks to me as a newbie to this project that development was not very active for a while but is picking up with the 4.8.0 port. Does it look like things will be active more generally now?

To say more about why I'm asking -- I'm looking to leverage Lucenenet for an application, perhaps with a somewhat nonstandard requirement. My application using lucenenet would be used as a search service for certain "small data" queries. In general there are also "big data" queries coming from a solr engine, and the two should behave the same on "similar" data. The solr engine is regularly kept up to date and so I'm concerned about having a fork in the versions between an older version of lucene and the latest solr in the two realms. This is also why the dedicated KoreanAnalyzer would be preferred, as this is what is leveraged within the solr arm of the application.

Does it seem to you that the lucenenet project is likely to "catch up" to the latest lucene versions and then keep up with them in general (with some latency of course)?

Thanks,
Stephen

On Mon, Jan 3, 2022 at 11:04 AM Ron Clabo <roncl...@giftoasis.com> wrote:

> LuceneNET 4.8 Beta contains a CJKAnalyzer that is for use with
> Chinese, Japanese and Korean.
> You can find docs here:
> https://lucenenet.apache.org/docs/4.8.0-beta00015/api/analysis-common/Lucene.Net.Analysis.Cjk.html
> and source code here:
> https://github.com/apache/lucenenet/blob/Lucene.Net_4_8_0_beta00015/src/Lucene.Net.Analysis.Common/Analysis/Cjk/CJKAnalyzer.cs
>
> I see that there is a dedicated Korean Analyzer called KoreanAnalyzer
> that was added to the Java Lucene version 8.8.1 but that analyzer is
> of course not available in LuceneNET 4.8 as it didn't exist in the 4.8
> version of Java Lucene. You can find the java source code for that analyzer
> here:
> https://github.com/apache/lucene/commit/e851b89cbeb1f55edc0f2c1276e2ae812eca2643#diff-8c2c6507d8d4e26c1399cdafebd1580820a298d2fa078aa0f952fdb3dc22a537
>
> It might be possible to port that analyzer to Lucene 4.8 if for some
> reason the CJKAnalyzer isn't good enough for your intended use but
> you'd need to do that port yourself.
>
> -Ron
>
> -----Original Message-----
> From: Stephen Lewis Bianamara [mailto:stephen.bianam...@gmail.com]
> Sent: Monday, January 3, 2022 1:31 PM
> To: user@lucenenet.apache.org
> Subject: Korean analyzer?
>
> Hi LuceneNet Community,
>
> I've been experimenting with the Lucene.NET 4.8.0 betas and they are
> working great for me. One question I have is whether there is a plan
> to have the korean analyzer implemented? I couldn't find it and I
> didn't find it when I searched open issues
> <https://github.com/apache/lucenenet/issues?q=is%3Aissue+is%3Aopen+korean>.
>
> Thanks!
> Stephen