[ANNOUNCE] Apache Lucene 9.11.1 released

2024-06-27 Thread Ignacio Vera
The Lucene PMC is pleased to announce the release of Apache Lucene 9.11.1.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for
nearly any application that requires full-text search, especially
cross-platform.

This patch release contains bug fixes that are highlighted below. The
release is available for immediate download at:

https://lucene.apache.org/core/downloads.html

Lucene 9.11.1 Release Highlights

Bug fixes

* Fix performance regression in NumericComparator.
* Remove intra-merge parallelism for everything except HNSW graph merges.
* Fix bug that prevented adding a parent field to an index with no fields.
* Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter by
unordered matches.
* StringValueFacetCounts stops throwing NPE when faceting over an
empty match-set.

Further details of changes are available in the change log available
at: http://lucene.apache.org/core/9_11_1/changes/Changes.html.

Please report any feedback to the mailing lists
(http://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also applies to Maven access.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



[ANNOUNCE] Apache Lucene 9.4.1 released

2022-10-24 Thread Ignacio Vera
The Lucene PMC is pleased to announce the release of Apache Lucene 9.4.1.


Apache Lucene is a high-performance, full-featured search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires structured search, full-text search, faceting,
nearest-neighbor search across high-dimensionality vectors, spell
correction or query suggestions.


This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:


  


### Lucene 9.4.1 Release Highlights:


 * When reading large segments, the kNN vectors format could fail with a
validation error, preventing further writes or searches on the index. This
bug is now fixed. Only version 9.4.0 was affected, so it is recommended to
skip 9.4.0 if you are using kNN vectors.


Please read CHANGES.txt for a full list of changes:


  



Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror. This also applies to Maven access.


[ANNOUNCE] Apache Lucene 9.3.0 released

2022-07-29 Thread Ignacio Vera
29 July 2022 - Apache Lucene™ 9.3.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 9.3.0.

Apache Lucene is a high-performance, full-featured search engine library
written entirely in Java. It is a technology suitable for nearly any
application that requires structured search, full-text search, faceting,
nearest-neighbor search across high-dimensionality vectors, spell
correction or query suggestions.

This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:
https://lucene.apache.org/core/downloads.html

Lucene 9.3.0 Release Highlights:
• Merge on full flush is now enabled by default with a timeout of
500ms.
• Add getAllChildren functionality to facets.
• Add facetsets module for high-dimensional (hyper-rectangle)
faceting.
• Top-level two-clause disjunctions sorted by score now use the
block-max MAXSCORE algorithm.
• When running KnnVectorQuery with a filter, reuse the cached
filter bit set.

Please read CHANGES.txt for a full list of new features and changes:

https://lucene.apache.org/core/9_3_0/changes/Changes.html

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror. This also applies to Maven access.


Re: Slower document retrieval in 8.7.0 comparing to 7.5.0

2021-01-23 Thread Ignacio Vera
Hi!

This slowdown is expected, see LUCENE-9477 & LUCENE-9486. The trade-off here
is index size vs fetch time: we have introduced a more aggressive compression
strategy for stored fields at the cost of a small increase in fetch times. In
your example, you can see that the index size has been reduced by around 20%.

If your workflow depends on those fetch times, you can always override the
stored field format through a filter codec and supply your own compression
parameters.
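
To make the trade-off concrete, here is a small stand-alone sketch. It does
not use Lucene at all: java.util.zip.Deflater stands in for the real codec,
and the class name, the synthetic documents and the block sizes are all made
up for illustration. The only assumption carried over is that stored
documents are compressed together in blocks, so larger blocks shrink the
total size but a single fetch then has to inflate a larger block.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class BlockCompressionSketch {

  // Hypothetical stored document; real documents would come from the index.
  static byte[] syntheticDoc(int id) {
    String json = "{\"id\":" + id + ",\"title\":\"Document number " + id
        + "\",\"body\":\"some stored field content shared by many documents\"}";
    return json.getBytes(StandardCharsets.UTF_8);
  }

  // Concatenates documents [start, end) into one uncompressed block.
  static byte[] block(int start, int end) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int id = start; id < end; id++) {
      byte[] doc = syntheticDoc(id);
      out.write(doc, 0, doc.length);
    }
    return out.toByteArray();
  }

  // Compresses the data with DEFLATE and returns the compressed length.
  static int compressedLength(byte[] data) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    deflater.setInput(data);
    deflater.finish();
    byte[] buffer = new byte[4096];
    int total = 0;
    while (!deflater.finished()) {
      total += deflater.deflate(buffer);
    }
    deflater.end();
    return total;
  }

  // Total size on disk when docsPerBlock documents are compressed together.
  static long totalCompressedBytes(int numDocs, int docsPerBlock) {
    long total = 0;
    for (int start = 0; start < numDocs; start += docsPerBlock) {
      int end = Math.min(numDocs, start + docsPerBlock);
      total += compressedLength(block(start, end));
    }
    return total;
  }

  // Bytes that must be inflated to read one document out of a full block.
  static int uncompressedBlockBytes(int docsPerBlock) {
    return block(0, docsPerBlock).length;
  }

  public static void main(String[] args) {
    for (int docsPerBlock : new int[] {1, 8, 64}) {
      System.out.println(docsPerBlock + " docs/block: index bytes="
          + totalCompressedBytes(1000, docsPerBlock)
          + ", bytes inflated per fetch=" + uncompressedBlockBytes(docsPerBlock));
    }
  }
}
```

Running it should show the index size going down and the per-fetch work going
up as the block grows, which is the same direction of change discussed in
this thread.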

Cheers,

Ignacio




On Sat, Jan 23, 2021 at 8:36 AM Rob Audenaerde 
wrote:

> I did some testing for you :)
>
> I modified your code to run in a JMH benchmark; and changed the number of
> retrieved docs to 1000 out of 1M in the index. This is what I got:
>
> Lucene 7.5
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  37.147 ± 6.218  ops/s
>
> Lucene 8.7
> Benchmark                                 Mode  Cnt   Score   Error  Units
> DocRetrievalBenchmark.retrieveDocuments  thrpt    4  18.680 ± 5.755  ops/s
>
> This is much in line with your observations, (lucene 8.7 seems almost twice
> as slow) so something is going on when running out-of-the-box.
>
> The code can be found : (not really beautiful, but gets the job done. If
> you want to switch lucene-versions, edit the pom and make sure to set the
> proper index version)
> https://gist.github.com/d2a-raudenaerde/93a490e5b0d17b2fa88862473429aeb3
>
> JMH details:
> # JMH version: 1.21
> # VM version: JDK 11.0.9.1, OpenJDK 64-Bit Server VM,
> 11.0.9.1+1-Ubuntu-0ubuntu1.20.04
> # VM invoker: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
> # VM options: -Xms2G -Xmx2G
> # Warmup: 2 iterations, 10 s each
> # Measurement: 4 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: org.audenaerde.lucene.DocRetrievalBenchmark.retrieveDocuments
>
>
> On Fri, Jan 22, 2021 at 4:22 PM Martynas L  wrote:
>
> > Just played with my reading sample. I do not have a goal to show the exact
> > numbers, but it is a fact that document retrieval IndexSearcher.doc(int) is
> > much slower.
> > All our performance tests showed performance degradation after changing to
> > 8.7.0, even without measurement we can "see/feel" the operations involving
> > documents retrieval became slower.
> >
> >
> >
> > On Fri, Jan 22, 2021 at 4:48 PM Rob Audenaerde  >
> > wrote:
> >
> > > Hi Martynas
> > >
> > > How did you measure that?
> > >
> > > I ask, because writing a good benchmark is not an easy task, since there
> > > are so many factors (class loading times, JIT effects, etc). You should use
> > > Java Microbenchmark Harness or similar; and set up a random document
> > > retrieval task, with warm-up etc.
> > >
> > > (I'm not aware of any big slowdowns, but as you see them, the best way is
> > > to build a robust benchmark and then start comparing)
> > >
> > > -Rob
> > >
> > >
> > > On Fri, Jan 22, 2021 at 3:43 PM Martynas L 
> > wrote:
> > >
> > > > Even retrieving single document 8.7.0 is more than x2 slower
> > > >
> > > > On Fri, Jan 22, 2021 at 2:28 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > > > dceccarel...@bloomberg.net> wrote:
> > > >
> > > > > >  I think it will be similar ratio retrieving any number of documents.
> > > > >
> > > > > I'm not sure this is true, if you retrieve a huge amount of documents you
> > > > > might cause troubles to the GC.
> > > > >
> > > > > From: java-user@lucene.apache.org At: 01/22/21 12:11:19 To:
> > > > > java-user@lucene.apache.org
> > > > > Subject: Re: Slower document retrieval in 8.7.0 comparing to 7.5.0
> > > > >
> > > > > The accent should not be on retrieved documents number, but on the duration
> > > > > ratio - 8.7.0 is 3 times slower. I think it will be similar ratio
> > > > > retrieving any number of documents.
> > > > >
> > > > > On Fri, Jan 22, 2021 at 1:39 PM Rob Audenaerde <rob.audenae...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Martynas,
> > > > > >
> > > > > > In your sample code you are retrieving all (1 million!) documents from the
> > > > > > index, that surely is not a good match for lucene  :)
> > > > > >
> > > > > > Is that a good reflection of your use-case?
> > > > > >
> > > > > > On Fri, Jan 22, 2021 at 9:52 AM Martynas L <martynas@gmail.com> wrote:
> > > > > >
> > > > > > >  Please see the sample at
> > > > > > > https://drive.google.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE
> > > > > > >
> > > > > > > IndexGenerator - creates a dummy index.
> > > > > > > IndexReader - retrieves documents - duration time with 7.5.0 version is
> > > > > > > ~2s, while ~6s with 8.7.0
> > > > > > >
> > > > > > > Regards,
> > > > > > > Martynas

Re: Tessellator failure for a certain polygon

2020-10-20 Thread Ignacio Vera
Hello,

Yes, indeed that looks like a bug. Please create a JIRA issue so we can track
it.

Thank you!

On Mon, Oct 19, 2020 at 9:07 PM Yuri Vishnevsky 
wrote:

> Hello,
>
> This bug was discovered while using ElasticSearch (checked with versions
> 7.6.2 and 7.9.2).
> But I've created an isolated test case just for Lucene:
> https://github.com/apache/lucene-solr/pull/2006/files
>
> The unit test fails with "java.lang.IllegalArgumentException: Unable to
> Tessellate shape".
>
> The polygon contains two holes that share the same vertex and one more
> standalone hole.
> Removing any of them makes the unit test pass.
>
> Changing the least significant digit in any coordinate of the "common
> vertex" in any of two first holes, so that these vertices become different
> in each hole - also makes unit test pass.
>
> This looks like a bug to me, so should I create an issue for this in JIRA?
>
> Regards,
> Yuri.
>


Re: Indexing & Searching Geometries ( MultiLine & MultiPolygon )

2020-10-02 Thread Ignacio Vera
Hello!

Let's consider polygons. I imagine you are doing something like this to
index one polygon:

Polygon polygon = ...; // the polygon to index
Document document = new Document();
// The polygon is tessellated into triangles, one indexable field per triangle.
Field[] fields = LatLonShape.createIndexableFields(FIELDNAME, polygon);
for (Field f : fields) {
  document.add(f);
}

A multipolygon is just an array of polygons, so you can easily do:

Polygon[] multiPolygon = ...; // the polygons that make up the multipolygon
Document document = new Document();
for (Polygon polygon : multiPolygon) {
  // All fields go into the same document under the same field name.
  Field[] fields = LatLonShape.createIndexableFields(FIELDNAME, polygon);
  for (Field f : fields) {
    document.add(f);
  }
}

Hope it makes sense,

I.

On Fri, Oct 2, 2020 at 1:48 PM thturk  wrote:

> Hello Forum,
>
> I want to create a reverse geocoding service for a data set that contains
> Points, Lines, MultiLines, Polygons and MultiPolygons. So far I have been
> able to index Points, Lines and Polygons and to create search queries for
> those indexes, but I cannot work out how to index the other geometry
> types. Are there any documents or code examples for Lucene spatial indexing?
> I have seen Component2D, but as I understand it that only supports in-memory
> search.
>
> Lucene 8.6.0
> Jdk 12
>


Re: [VOTE] Lucene logo contest, here we go again

2020-09-01 Thread Ignacio Vera
here is my vote:

A1, A2 (binding)

On Tue, Sep 1, 2020 at 1:49 PM Aurélien MAZOYER 
wrote:

> Hi community,
>
> I would vote for A1.
>
> Aurélien
>
>
>
> On Tue, 1 Sep 2020 at 13:26, Andrzej Białecki  wrote:
>
> > A1, D (binding)
> >
> > > On 1 Sep 2020, at 02:26, Ryan Ernst  wrote:
> > >
> > > Dear Lucene and Solr developers!
> > >
> > > In February a contest was started to design a new logo for Lucene
> > > [jira-issue]. The initial attempt [first-vote] to call a vote resulted in
> > > some confusion on the rules, as well as the request for one additional
> > > submission. I would like to call a new vote, now with more explicit
> > > instructions on how to vote.
> > >
> > > *Please read the following rules carefully* before submitting your vote.
> > >
> > > *Who can vote?*
> > >
> > > Anyone is welcome to cast a vote in support of their favorite
> > > submission(s). Note that only PMC members' votes are binding. If you are a
> > > PMC member, please indicate with your vote that the vote is binding, to
> > > ease collection of votes. In tallying the votes, I will attempt to verify
> > > only those marked as binding.
> > >
> > >
> > > *How do I vote?*
> > > Votes can be cast simply by replying to this email. It is a ranked-choice
> > > vote [rank-choice-voting]. Multiple selections may be made, where the order
> > > of preference must be specified. If an entry gets more than half the votes,
> > > it is the winner. Otherwise, the entry with the lowest number of votes is
> > > removed, and the votes are retallied, taking into account the next
> > > preferred entry for those whose first entry was removed. This process
> > > repeats until there is a winner.
> > >
> > > The entries are broken up by variants, since some entries have multiple
> > > color or style variations. The entry identifiers are first a capital
> > > letter, followed by a variation id (described with each entry below), if
> > > applicable. As an example, if you prefer variant 1 of entry A, followed by
> > > variant 2 of entry A, variant 3 of entry C, entry D, and lastly variant 4e
> > > of entry B, the following should be in your reply:
> > >
> > > (binding)
> > > vote: A1, A2, C3, D, B4e
> > >
> > > *Entries*
> > >
> > > The entries are as follows:
> > >
> > > A. Submitted by Dustin Haver. This entry has two variants, A1 and A2.
> > >
> > > [A1]
> > > https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png
> > > [A2]
> > > https://issues.apache.org/jira/secure/attachment/12997172/LuceneLogo.png
> > >
> > > B. Submitted by Stamatis Zampetakis. This has several variants. Within the
> > > linked entry there are 7 patterns and 7 color palettes. Any vote for B
> > > should contain the pattern number followed by the lowercase letter of the
> > > color palette. For example, B3e or B1a.
> > >
> > > [B]
> > > https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf
> > >
> > > C. Submitted by Baris Kazar. This entry has 8 variants.
> > >
> > > [C1]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo1_full.pdf
> > > [C2]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo2_full.pdf
> > > [C3]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo3_full.pdf
> > > [C4]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo4_full.pdf
> > > [C5]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo5_full.pdf
> > > [C6]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo6_full.pdf
> > > [C7]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo7_full.pdf
> > > [C8]
> > > https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo8_full.pdf
> > >
> > > D. The current Lucene logo.
> > >
> > > [D]
> > > https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png
> > >
> > > Please vote for one of the above choices. This vote will close one week
> > > from today, Mon, Sept 7, 2020 at 11:59PM.
> > >
> > > Thanks!
> > >
> > > [jira-issue] https://issues.apache.org/jira/browse/LUCENE-9221
> > > [first-vote]
> > > http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e
> > > [rank-choice-voting]
> > > https://en.wikipedia.org/wiki/Instant-runoff_voting


[ANNOUNCE] Apache Lucene 8.6.2 released

2020-09-01 Thread Ignacio Vera
01 September 2020, Apache Lucene™ 8.6.2 available

The Lucene PMC is pleased to announce the release of Apache Lucene 8.6.2.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains one bug fix. The release is available for immediate
download at:

https://lucene.apache.org/core/downloads.html

Lucene 8.6.2 Bug Fixes:

LUCENE-9478: IndexWriter leaked about 500 bytes of heap space for each
full-flush, getReader or commit. This was a regression in 8.6.0.

Further details of changes are available in the change log available at:
https://lucene.apache.org/core/8_6_2/changes/Changes.html


Please report any feedback to the mailing lists
(https://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.


Re: [VOTE] Lucene logo contest

2020-06-16 Thread Ignacio Vera Sequeiros
PMC vote:  option A

On Wed, Jun 17, 2020 at 7:36 AM Jeroen Lauwers 
wrote:

> A. Definitely.
>
> Sent from my phone
>
> > On 17 Jun 2020, at 03:46, Jason Gerlowski wrote:
> >
> > Option "A"
> >
> >> On Tue, Jun 16, 2020 at 8:37 PM Man with No Name
> >>  wrote:
> >>
> >> A, clean and modern.
> >>
> >>> On Mon, Jun 15, 2020 at 6:08 PM Ryan Ernst  wrote:
> >>>
> >>> Dear Lucene and Solr developers!
> >>>
> >>> In February a contest was started to design a new logo for Lucene [1].
> >>> That contest concluded, and I am now (admittedly a little late!) calling a
> >>> vote.
> >>>
> >>> The entries are labeled as follows:
> >>>
> >>> A. Submitted by Dustin Haver [2]
> >>>
> >>> B. Submitted by Stamatis Zampetakis [3] Note that this has several
> >>> variants. Within the linked entry there are 7 patterns and 7 color
> >>> palettes. Any vote for B should contain the pattern number, like B1 or B3.
> >>> If a B variant wins, we will have a followup vote on the color palette.
> >>>
> >>> C. The current Lucene logo [4]
> >>>
> >>> Please vote for one of the three (or nine depending on your
> >>> perspective!) above choices. Note that anyone in the Lucene+Solr community
> >>> is invited to express their opinion, though only Lucene+Solr PMC cast
> >>> binding votes (indicate non-binding votes in your reply, please). This vote
> >>> will close one week from today, Mon, June 22, 2020.
> >>>
> >>> Thanks!
> >>>
> >>> [1] https://issues.apache.org/jira/browse/LUCENE-9221
> >>> [2]
> >>> https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png
> >>> [3]
> >>> https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf
> >>> [4]
> >>> https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png
> >>


Re: Tessellate exception in Elasticsearch

2020-06-04 Thread Ignacio Vera
I think this is not a Lucene issue. Elasticsearch geo_shape only supports
(and assumes) polygons in the WGS-84 reference system.
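
A quick sanity check along these lines can catch this before indexing. This
is only a sketch: the class and method names are made up, and it simply
tests whether a coordinate pair fits the longitude/latitude ranges in
degrees. It is not how Elasticsearch or Lucene validate input. The first
pair is the first vertex of the polygon quoted below, which is in
EPSG:31370 (a projected system in metres); the second pair is an
illustrative point in degrees.

```java
final class Wgs84RangeCheck {

  // True when the pair fits longitude [-180, 180] and latitude [-90, 90].
  // NaN fails every comparison, so it is rejected as well.
  static boolean looksLikeWgs84(double lon, double lat) {
    return lon >= -180.0 && lon <= 180.0 && lat >= -90.0 && lat <= 90.0;
  }

  public static void main(String[] args) {
    // First vertex of the quoted polygon (EPSG:31370, metres).
    System.out.println(looksLikeWgs84(171044.231002, 175818.094268)); // false
    // Illustrative longitude/latitude pair in degrees.
    System.out.println(looksLikeWgs84(4.35, 50.85)); // true
  }
}
```

Coordinates that fail this check need to be reprojected to WGS-84 before
they are sent to a geo_shape field.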

On Thu, Jun 4, 2020 at 1:38 PM Claeys Wouter 
wrote:

> Hi,
>
> This is the original polygon:
>
> {
>   "crs": {
>     "type": "name",
>     "properties": {
>       "name": "urn:ogc:def:crs:EPSG::31370"
>     }
>   },
>   "type": "MultiPolygon",
>   "coordinates": [
>     [
>       [
>         [171044.231002, 175818.094268],
>         [170996.799514, 175850.678652],
>         [170957.441562, 175877.716668],
>         [170946.243418, 175861.052668],
>         [170935.531674, 175845.112572],
>         [170923.57865, 175827.325308],
>         [170906.675354, 175802.171388],
>         [170886.642266, 175772.360124],
>         [170886.478554, 175772.116476],
>         [170951.311002, 175727.607548],
>         [171026.378266, 175676.072188],
>         [171098.875162, 175780.555004],
>         [171090.729754, 175786.150716],
>         [171044.231002, 175818.094268]
>       ]
>     ]
>   ]
> }
>
> Thanks!
>
> 
> From: Ignacio Vera Sequeiros
> Sent: Thursday, 4 June 2020 12:24
> To: java-user@lucene.apache.org
> Subject: Re: Tessellate exception in Elasticsearch
>
> Hi,
>
> I think your polygon has intersecting edges but it is difficult to
> reproduce with that output. Could you provide the original polygon you are
> trying to index?
>
> Thanks!
>
> On Thu, Jun 4, 2020 at 11:30 AM Claeys Wouter  >
> wrote:
>
> > Hi,
> >
> > This is an error which we get in Elasticsearch when trying to index
> > geo_shape fields but it seems this can be narrowed down to a problem in
> > Lucene. We can reproduce the problem with ES 6.8.x and ES 7.7.x. This is
> > the error we are getting:
> >
> > Caused by: java.lang.IllegalArgumentException: Unable to Tessellate shape
> > [[9.3213479767, -3.20048586995] [-17.71666808212,
> > -42.558438073] [-1.052667893531, -53.756582018]
> > [14.88742787267, -64.4683260198] [32.67469200505,
> > -76.4213498962] [57.8286121207, -93.3246459366]
> > [87.639876011, -113.3577339016] [87.883524044,
> > -113.5214459883] [47.6075479995, -48.6889979781]
> > [-3.927812089984, 26.37826614264] [79.4449960563,
> > 98.8751621119] [73.8492839626, 90.7297539996]
> > [41.9057321381, 44.2310018589] [9.3213479767,
> > -3.20048586995] ]. Possible malformed shape detected.
> > at
> > org.apache.lucene.geo.Tessellator.tessellate(Tessellator.java:114)
> > ~[lucene-sandbox-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 -
> > noble - 2020-04-21 10:31:55]
> > at
> > org.apache.lucene.document.LatLonShape.createIndexableFields(LatLonShape.java:73)
> > ~[lucene-sandbox-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 -
> > noble - 2020-04-21 10:31:55]
> > at
> > org.elasticsearch.index.mapper.GeoShapeFieldMapper.indexShape(GeoShapeFieldMapper.java:146)
> > ~[elasticsearch-6.8.9.jar:6.8.9]
> >
> > This is a very basic geometry. Could someone please explain why this shape
> > is invalid?
> >
> >
> >
> >
> > Thanks in advance,
> >
> > Wouter Claeys
> >
>


Re: Tessellate exception in Elasticsearch

2020-06-04 Thread Ignacio Vera Sequeiros
Hi,

I think your polygon has intersecting edges but it is difficult to
reproduce with that output. Could you provide the original polygon you are
trying to index?

Thanks!

On Thu, Jun 4, 2020 at 11:30 AM Claeys Wouter 
wrote:

> Hi,
>
> This is an error which we get in Elasticsearch when trying to index
> geo_shape fields but it seems this can be narrowed down to a problem in
> Lucene. We can reproduce the problem with ES 6.8.x and ES 7.7.x. This is
> the error we are getting:
>
> Caused by: java.lang.IllegalArgumentException: Unable to Tessellate shape
> [[9.3213479767, -3.20048586995] [-17.71666808212,
> -42.558438073] [-1.052667893531, -53.756582018]
> [14.88742787267, -64.4683260198] [32.67469200505,
> -76.4213498962] [57.8286121207, -93.3246459366]
> [87.639876011, -113.3577339016] [87.883524044,
> -113.5214459883] [47.6075479995, -48.6889979781]
> [-3.927812089984, 26.37826614264] [79.4449960563,
> 98.8751621119] [73.8492839626, 90.7297539996]
> [41.9057321381, 44.2310018589] [9.3213479767,
> -3.20048586995] ]. Possible malformed shape detected.
> at
> org.apache.lucene.geo.Tessellator.tessellate(Tessellator.java:114)
> ~[lucene-sandbox-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 -
> noble - 2020-04-21 10:31:55]
> at
> org.apache.lucene.document.LatLonShape.createIndexableFields(LatLonShape.java:73)
> ~[lucene-sandbox-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 -
> noble - 2020-04-21 10:31:55]
> at
> org.elasticsearch.index.mapper.GeoShapeFieldMapper.indexShape(GeoShapeFieldMapper.java:146)
> ~[elasticsearch-6.8.9.jar:6.8.9]
>
> This is a very basic geometry. Could someone please explain why this shape
> is invalid?
>
>
>
>
> Thanks in advance,
>
> Wouter Claeys
>


[ANNOUNCE] Apache Lucene 8.5.1 released

2020-04-16 Thread Ignacio Vera
## 16 April 2020, Apache Lucene™ 8.5.1 available


The Lucene PMC is pleased to announce the release of Apache Lucene 8.5.1.


Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.


This release contains one bug fix. The release is available for immediate
download at:


  


### Lucene 8.5.1 Bug Fixes:

LUCENE-9300: Index corruption with doc values updates and addIndexes.

Please read CHANGES.txt for a full list of changes:


  


Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror. This also applies to Maven access.


[ANNOUNCE] Apache Lucene 8.2.0 released

2019-07-26 Thread Ignacio Vera
## 26 July 2019, Apache Lucene™ 8.2.0 available


The Lucene PMC is pleased to announce the release of Apache Lucene 8.2.0.


Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.


This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:


  


### Lucene 8.2.0 Release Highlights:


 API Changes:


  * Intervals queries have been moved from the sandbox to the queries module.


 New Features


  * New XYShape Field and Queries for indexing and querying general
cartesian geometries.

  * Snowball stemmer/analyzer for the Estonian language.

  * Provide a FeatureSortField to allow sorting search hits by descending
value of a feature.

  * Add new KoreanNumberFilter that can convert Hangul characters to numbers
and process decimal points.

  * Add doc-value support to range fields.

  * Add monitor subproject (previously Luwak monitoring library) that
allows a stream of documents to be matched against a set of registered
queries in an efficient manner.

  * Add a numeric range query in sandbox that takes advantage of index
sorting.


 Optimizations


  * Use exponential search instead of binary search in
IntArrayDocIdSet#advance method.

  * Use incoming thread for execution if IndexSearcher has an executor. Now
caller threads execute at least one search on an index even if there is an
executor provided to minimize thread context switching.

  * New storing strategy for BKD tree leaves with low cardinality that can
lower storage costs and can be used at search time to speed up queries.

  * Load frequencies lazily only when needed in BlockDocsEnum and
BlockImpactsEverythingEnum.

  * Phrase queries now leverage impacts.
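
For readers unfamiliar with the exponential search mentioned in the first
optimization above, the following is a generic sketch of the technique on a
sorted array of doc IDs. It is not the code of IntArrayDocIdSet; the class
name, method name and signature are made up for illustration, and the array
is assumed to be strictly increasing. The idea is to double the step until
the target is overshot and then binary-search only the last interval, so the
cost depends on how far the target is from the current position rather than
on the array length.

```java
import java.util.Arrays;

final class ExponentialSearchSketch {

  /**
   * Returns the index of the first element >= target in docs[from, to),
   * or to if every element in that range is smaller than target.
   */
  static int advance(int[] docs, int from, int to, int target) {
    if (from >= to) {
      return to;
    }
    // Gallop: double the step until we overshoot the target or the range end.
    int bound = 1;
    while (from + bound < to && docs[from + bound] < target) {
      bound *= 2;
    }
    // The answer now lies between the previous probe and the current one.
    int lo = from + bound / 2;
    int hi = Math.min(from + bound + 1, to);
    int idx = Arrays.binarySearch(docs, lo, hi, target);
    // A negative result encodes the insertion point as -(insertionPoint) - 1.
    return idx >= 0 ? idx : -1 - idx;
  }

  public static void main(String[] args) {
    int[] docs = {2, 5, 8, 13, 21, 34, 55, 89};
    System.out.println(advance(docs, 0, docs.length, 13)); // 3
    System.out.println(advance(docs, 0, docs.length, 14)); // 4
    System.out.println(advance(docs, 0, docs.length, 100)); // 8
  }
}
```

When successive targets are close to the current position, as is common when
iterators advance in lockstep, only a few probes are needed before the final
short binary search.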


Please read CHANGES.txt for a full list of new features and changes:


  


Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases. It is possible that the mirror you are using may not
have replicated the release yet. If that is the case, please try another
mirror. This also applies to Maven access.
This also applies to Maven access.