[GitHub] [lucene-jira-archive] stanislawosinski commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


stanislawosinski commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1202064626

   Hi Tomoko,
   
   I added my GitHub user name at id.apache.org yesterday, this is what I see
   when I log in again:
   
   [image: image.png]
   
   I don't see Apache in my list of GitHub organizations yet though. Is there
   anything else I need to do?
   
   Thanks,
   
   Stanislaw
   
   On Tue, Aug 2, 2022 at 7:18 AM Tomoko Uchida ***@***.***>
   wrote:
   
   > Hi @sigram, @areek, @ChrisHegarty, @cmoen, @GregBowyer, @hgadre,
   > @chatman, @DaddyWri, @martijnvg, @otisg, @stanislawosinski, @ovalhub
   > and @whoschek:
   >
   > I wanted to let you know that I sent an email (DM) about Lucene's
   > upcoming GitHub issue migration to your @apache.org email address. To
   > make the migration complete, we need your action. Three minutes would be
   > sufficient - please check your inbox.
   > Thank you.
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-jira-archive] mocobeta commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


mocobeta commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1202027907

   Hi @sigram, @areek, @ChrisHegarty, @cmoen, @GregBowyer, @hgadre, @chatman, 
@DaddyWri, @martijnvg, @otisg, @stanislawosinski, @ovalhub and @whoschek:
   
   I wanted to let you know that I sent an email (DM) about Lucene's upcoming 
GitHub issue migration to your `@apache.org` email address. To make the 
migration complete, we need your action. Three minutes would be sufficient - 
please check your inbox.
   Thank you.





[GitHub] [lucene] tang-hi commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


tang-hi commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1202020494

   > > > I will restore `Automaton.java` in this PR. And maybe open a new PR 
to add back `Operations.removeDeadStates(automata);`?
   > > 
   > > 
   > > Except, please remove that misleading `// TODO` comment about switching 
to BFS! It is clearly wrong ;) Thank you for persisting on this.
   > > And also please keep those nice improvements to Lev1 documentation!
   > > I think it's OK to do all of this in this PR, or open a separate one if you 
want.
   > 
   > Done. And I will raise another PR to add back 
`Operations.removeDeadStates(automata)` when I am free.
   
   Oh! I misunderstood what you meant. I thought the method `removeDeadStates` 
itself was lost, but actually it was the call 
`Operations.removeDeadStates(automata)` that was lost. I have already added it 
back in the latest commit.





[GitHub] [lucene] tang-hi commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


tang-hi commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1201974056

   > > I will restore `Automaton.java` in this PR. And maybe open a new PR to 
add back `Operations.removeDeadStates(automata);`?
   > 
   > Except, please remove that misleading `// TODO` comment about switching to 
BFS! It is clearly wrong ;) Thank you for persisting on this.
   > 
   > And also please keep those nice improvements to Lev1 documentation!
   > 
   > I think it's OK to do all of this in this PR, or open a separate one if you 
want.
   
   Done. And I will raise another PR to add back 
`Operations.removeDeadStates(automata)` when I am free.





[GitHub] [lucene-jira-archive] mocobeta closed issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


mocobeta closed issue #102: Missing accounts (committers)
URL: https://github.com/apache/lucene-jira-archive/issues/102





[GitHub] [lucene-jira-archive] mocobeta commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


mocobeta commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1201943380

   I'm closing this. Thanks for your help.





[GitHub] [lucene] msokolov commented on pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-01 Thread GitBox


msokolov commented on PR #1054:
URL: https://github.com/apache/lucene/pull/1054#issuecomment-1201912373

   OK - I pushed a commit which actually adds the byte encoding to the whole 
Fields API - I had missed a few bits there :) So I think this is complete, but 
I still need to do some end-to-end testing





[jira] [Commented] (LUCENE-10671) Lucene

2022-08-01 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573919#comment-17573919
 ] 

Uwe Schindler commented on LUCENE-10671:


We can delete the whole issue.

> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/hnsw
>Affects Versions: 8.11.2
>Reporter: allnewcracksoftwares
>Priority: Minor
>
> [link title|http://example.com]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Deleted] (LUCENE-10671) Lucene

2022-08-01 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler deleted LUCENE-10671:
---





[jira] [Commented] (LUCENE-10671) Lucene

2022-08-01 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573918#comment-17573918
 ] 

Michael Sokolov commented on LUCENE-10671:
--

The "bad" links are still visible in the History tab - I wonder if we can erase 
them from there?




[GitHub] [lucene] nknize commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

2022-08-01 Thread GitBox


nknize commented on code in PR #1017:
URL: https://github.com/apache/lucene/pull/1017#discussion_r934857341


##
lucene/core/src/java/org/apache/lucene/document/ShapeDocValuesField.java:
##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.geo.Geometry;
+import org.apache.lucene.index.DocValuesType;
+import org.apache.lucene.index.IndexableFieldType;
+import org.apache.lucene.search.Query;
+
+/**
+ * A doc values field for {@link LatLonShape} and {@link XYShape} that uses {@link ShapeDocValues}
+ * as the underlying binary doc value format.
+ *
+ * Note that this class cannot be instantiated directly due to different encodings {@link
+ * org.apache.lucene.geo.XYEncodingUtils} and {@link org.apache.lucene.geo.GeoEncodingUtils}
+ *
+ * Concrete Implementations include: {@link LatLonShapeDocValuesField} and {@link
+ * XYShapeDocValuesField}
+ *
+ * @lucene.experimental
+ */
+abstract class ShapeDocValuesField extends Field {

Review Comment:
   Yes, this is a good point. If `ShapeDocValuesField` is not public then the 
calling project will not be able to use it abstractly. I'll make that change.






[jira] [Resolved] (LUCENE-10627) Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10627.
---
Fix Version/s: 9.4
   Resolution: Fixed

> Using ByteBuffersDataInput reduce memory copy on compressing data
> -
>
> Key: LUCENE-10627
> URL: https://issues.apache.org/jira/browse/LUCENE-10627
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/store
>Reporter: LuYunCheng
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Code: [https://github.com/apache/lucene/pull/987]
> I see that when Lucene flushes and merges stored fields, it needs many memory copies:
> {code:java}
> Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable  [0x7f17718db000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
>     at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
>     at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
>  {code}
> When Lucene's *CompressingStoredFieldsWriter* flushes documents, it needs many 
> memory copies.
> With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}:
>  # bufferedDocs.toArrayCopy copies the blocks into one contiguous array for 
> chunk compression
>  # the compressor copies the dict and data into one block buffer
>  # compress
>  # copy the compressed data out
> With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}:
>  # bufferedDocs.toArrayCopy copies the blocks into one contiguous array for 
> chunk compression
>  # compress
>  # copy the compressed data out
>  
> I think we can use -CompositeByteBuf- to reduce temporary memory copies:
>  # we do not have to call *bufferedDocs.toArrayCopy* when we just need 
> contiguous content for chunk compression
>  
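The idea in the list above (avoid concatenating buffered blocks before compressing them) can be illustrated with the JDK alone. The following is a minimal sketch, not the code from the PR: `BlockwiseDeflateDemo`, `compressBlocks`, and `decompress` are hypothetical names, the blocks are plain `byte[]` arrays rather than Lucene's `ByteBuffersDataOutput` blocks, and no preset dictionary is used. It relies only on the fact that `java.util.zip.Deflater` accepts its input incrementally.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical demo class; not part of Lucene. */
final class BlockwiseDeflateDemo {

  /** Compresses the blocks as one logical stream without first copying them into one array. */
  static byte[] compressBlocks(List<byte[]> blocks) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      for (byte[] block : blocks) {
        // Hand each block to the deflater as-is; no concatenated copy is built.
        deflater.setInput(block);
        while (!deflater.needsInput()) {
          int n = deflater.deflate(buf);
          out.write(buf, 0, n);
        }
      }
      // Signal end of input and drain whatever the deflater still holds.
      deflater.finish();
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    } finally {
      deflater.end();
    }
    return out.toByteArray();
  }

  /** Inflates a complete deflate stream; used here only to check the round trip. */
  static byte[] decompress(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalArgumentException("truncated or dictionary-dependent input");
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}
```

The output is still accumulated in a `ByteArrayOutputStream` here for simplicity, so this sketch only removes the input-side concatenation copy, which corresponds to the `toArrayCopy` step in the lists above.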
> I wrote a simple mini benchmark in the test code 
> ([link|https://github.com/apache/lucene/blob/5a406a5c483c7fadaf0e8a5f06732c79ad174d11/lucene/core/src/test/org/apache/lucene/codecs/lucene90/compressing/TestCompressingStoredFieldsFormat.java#L353]):
> *LZ4WithPresetDict run* Capacity: 41943040 bytes, 10 iterations: original 
> elapsed: 5391ms, new elapsed: 5297ms
> *DeflateWithPresetDict run* Capacity: 41943040 bytes, 10 iterations: original 
> elapsed: {*}115ms{*}, new elapsed: {*}12ms{*}
>  
> I also ran runStoredFieldsBenchmark with doc_limit=-1, which shows:
> ||Msec to index||BEST_SPEED ||BEST_COMPRESSION||
> |Baseline|318877.00|606288.00|
> |Candidate|314442.00|604719.00|
>  
> --{-}UPDATE{-}--
>  
>  I tried to *reuse ByteBuffersDataInput* to reduce memory copies, because it 
> can be obtained from ByteBuffersDataOutput.toDataInput, and it could reduce 
> this complexity ([PR|https://github.com/apache/lucene/pull/987]).
> BUT I am not sure whether we can change the input parameter of the Compressor 
> interface's compress method from byte[] to ByteBuffersDataInput. If we change 
> this interface 
> [like|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/Compressor.java#L35],
>  it increases the backport code 
> [like|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L274];
>  however, if we change the interface to use ByteBuffersDataInput, we can 
> optimize memory copies in the code of the different compression algorithms.
> Also, I found we can reduce memory copies further in 
> *{{{}CompressingStoredFieldsWriter.{}}}{{{}copyOneDoc 
> [like|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFi

[GitHub] [lucene-jira-archive] mocobeta commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


mocobeta commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1201509891

   > I think we'd need to ask committers to link their GitHub account with their 
ASF account.
   > https://infra.apache.org/apache-github.html
   
   I sent an email to committers whose GH account is not linked to any ASF 
account yet.





[jira] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10671 ]


Dawid Weiss deleted comment on LUCENE-10671:
--

was (Author: JIRAUSER293699):
https://allnewcracksoftwares.com/typing-master-pro-11-crack-with-serial-keys-download/




[jira] [Commented] (LUCENE-10671) Lucene

2022-08-01 Thread allnewcracksoftwares (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573864#comment-17573864
 ] 

allnewcracksoftwares commented on LUCENE-10671:
---

https://allnewcracksoftwares.com/typing-master-pro-11-crack-with-serial-keys-download/




[jira] [Commented] (LUCENE-10627) Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573859#comment-17573859
 ] 

ASF subversion and git services commented on LUCENE-10627:
--

Commit 2b75fe6d2005785e5214364a0563fdcba5d66c50 in lucene's branch 
refs/heads/branch_9x from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2b75fe6d200 ]

LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data 
(#987)



[jira] [Commented] (LUCENE-10627) Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573851#comment-17573851
 ] 

ASF subversion and git services commented on LUCENE-10627:
--

Commit 34154736c6ed241d7d9d0c6f4a0e6419936490b7 in lucene's branch 
refs/heads/main from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=34154736c6e ]

LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data 
(#987)




[GitHub] [lucene] jpountz merged pull request #987: LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread GitBox


jpountz merged PR #987:
URL: https://github.com/apache/lucene/pull/987





[GitHub] [lucene] mikemccand commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


mikemccand commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1201421088

   > I will restore `Automaton.java` in this PR. And maybe open a new PR to 
add back `Operations.removeDeadStates(automata);`?
   
   Except, please remove that misleading `// TODO` comment about switching to 
BFS!  It is clearly wrong ;)  Thank you for persisting on this.
   
   And also please keep those nice improvements to Lev1 documentation!
   
   I think it's OK to do all of this in this PR, or open a separate one if you want.





[GitHub] [lucene] luyuncheng commented on a diff in pull request #987: LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread GitBox


luyuncheng commented on code in PR #987:
URL: https://github.com/apache/lucene/pull/987#discussion_r934647956


##
lucene/CHANGES.txt:
##
@@ -54,7 +54,8 @@ Improvements
 
 Optimizations
 -
-(No changes)
+
+* LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing 
data. (luyuncheng)

Review Comment:
   > Move it to the 9.4 section?
   
   DONE






[GitHub] [lucene] tang-hi commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


tang-hi commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1201341143

   > I think we should keep the simple `toDot` and not try to coerce graphviz 
in a special way with magic like this. It has a lot of options to tweak its 
output.
   > 
   > The biggest win for making the automata more readable visually is to add 
back the missing removal of dead states in LevenshteinAutomata.java. We should 
do this at the end:
   > 
   > ```
   > automata = Operations.removeDeadStates(automata);
   > ```
   > 
   > It seems this was lost in a refactoring.
   
   I compared the states with and without BFS. It seems that BFS has little 
effect when dead states are included,
   so I think keeping the simple `toDot` is a good idea. 
   I will restore `Automaton.java` in this PR, and maybe open a new PR to add 
back `Operations.removeDeadStates(automata);`?





[GitHub] [lucene-jira-archive] whoschek commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


whoschek commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1201340757

   Hi Uwe, Yeah, that's me.
   Thanks!
   Wolfgang Hoschek
   
   > On Aug 1, 2022, at 3:59 PM, Uwe Schindler ***@***.***> wrote:
   > 
   > 
   > @whoschek is Wolfgang Hoschek. I know him 
from the picture. Also the repositories listed in his account look fine.
   > 
   > Bernhard Messer is not on GitHub. He is no longer active. He works as 
senior/founder at Intrafind Software. His colleague Christoph Goller (same 
company) I meet regularly, but as far as I know he also has no GitHub account. As 
far as I remember, they use GitLab.
   > 
   > Michael Busch was active in Lucene before Git was used. He worked for 
Twitter and they had no public repositories. Nowadays he's not active in Lucene 
anymore and I did not find any account.
   > 
   > The Doron Cohen ones all look wrong.





[GitHub] [lucene-jira-archive] mocobeta commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


mocobeta commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1201315446

   There is one more issue for committers - many committers do not associate 
their GitHub accounts with [the ASF organization](https://github.com/apache), 
which means they have no push access to `apache/lucene` and we cannot "assign" 
them to GitHub issues.





[GitHub] [lucene-jira-archive] mocobeta commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


mocobeta commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1201307643

   > `@whoschek` is Wolfgang Hoschek. 
   
   Thank you @uschindler - added in 
https://github.com/apache/lucene-jira-archive/commit/836f08afbc748a727b0952e3e5243f6699176810.





[jira] [Resolved] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API

2022-08-01 Thread Michael McCandless (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-10216.
-
Fix Version/s: 9.4
   Resolution: Fixed

> Add concurrency to addIndexes(CodecReader…) API
> ---
>
> Key: LUCENE-10216
> URL: https://issues.apache.org/jira/browse/LUCENE-10216
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Vigya Sharma
>Priority: Major
> Fix For: main, 9.4
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> I work at Amazon Product Search, and we use Lucene to power search for the 
> e-commerce platform. I’m working on a project that involves applying 
> metadata+ETL transforms and indexing documents on n different _indexing_ 
> boxes, combining them into a single index on a separate _reducer_ box, and 
> making it available for queries on m different _search_ boxes (replicas). 
> Segments are asynchronously copied from indexers to reducers to searchers as 
> they become available for the next layer to consume.
> I am using the addIndexes API to combine multiple indexes into one on the 
> reducer boxes. Since we also have taxonomy data, we need to remap facet field 
> ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version 
> of this API. The API leverages {{SegmentMerger.merge()}} to create segments 
> with new ordinal values while also merging all provided segments in the 
> process.
> _This is however a blocking call that runs in a single thread._ Until we have 
> written segments with new ordinal values, we cannot copy them to searcher 
> boxes, which increases the time to make documents available for search.
> I was playing around with the API by creating multiple concurrent merges, 
> each with only a single reader, creating a concurrently running 1:1 
> conversion from old segments to new ones (with new ordinal values). We follow 
> this up with non-blocking background merges. This lets us copy the segments 
> to searchers and replicas as soon as they are available, and later replace 
> them with merged segments as background jobs complete. On the Amazon dataset 
> I profiled, this gave us around 2.5 to 3x improvement in addIndexes() time. 
> Each call was given about 5 readers to add on average.
> This might be useful to add to Lucene. We could create another {{addIndexes()}} 
> API with a {{boolean}} flag for concurrency, that internally submits multiple 
> merge jobs (each with a single reader) to the {{ConcurrentMergeScheduler}}, 
> and waits for them to complete before returning.
> While this is doable from outside Lucene by using your own thread pool, starting 
> multiple addIndexes() calls and waiting for them to complete, I felt it needs 
> some understanding of what addIndexes does, why you need to wait on the merge 
> and why it makes sense to pass a single reader in the addIndexes API.
> Out-of-the-box support in Lucene could simplify this for folks with a similar use case.
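
The "one merge job per reader, then wait for all of them" pattern described in the issue can be sketched roughly as follows. This is only an illustration in Python, not Lucene's API: `rewrite_segment` and `add_indexes_concurrently` are hypothetical names, and a plain thread pool stands in for the `ConcurrentMergeScheduler`.

```python
from concurrent.futures import ThreadPoolExecutor


def rewrite_segment(reader):
    """Hypothetical stand-in for a single-reader merge that remaps ordinals."""
    return f"rewritten({reader})"


def add_indexes_concurrently(readers, max_workers=4):
    """Submit one job per reader and block until every job has finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order; a job's exception is re-raised
        # when its result is read while building the list.
        return list(pool.map(rewrite_segment, readers))


# About five readers per call, as in the issue description.
segments = add_indexes_concurrently(["r0", "r1", "r2", "r3", "r4"])
```

Each rewritten segment could be copied onward as soon as its job completes; the blocking call here only mirrors the "waits for them to complete before returning" behaviour proposed in the issue.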



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API

2022-08-01 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573796#comment-17573796
 ] 

Michael McCandless commented on LUCENE-10216:
-

Awesome!  I think we can close this now [~vigyas]?

> Add concurrency to addIndexes(CodecReader…) API
> ---
>
> Key: LUCENE-10216
> URL: https://issues.apache.org/jira/browse/LUCENE-10216
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Vigya Sharma
>Priority: Major
> Fix For: main
>
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> I work at Amazon Product Search, and we use Lucene to power search for the 
> e-commerce platform. I’m working on a project that involves applying 
> metadata+ETL transforms and indexing documents on n different _indexing_ 
> boxes, combining them into a single index on a separate _reducer_ box, and 
> making it available for queries on m different _search_ boxes (replicas). 
> Segments are asynchronously copied from indexers to reducers to searchers as 
> they become available for the next layer to consume.
> I am using the addIndexes API to combine multiple indexes into one on the 
> reducer boxes. Since we also have taxonomy data, we need to remap facet field 
> ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version 
> of this API. The API leverages {{SegmentMerger.merge()}} to create segments 
> with new ordinal values while also merging all provided segments in the 
> process.
> _This is however a blocking call that runs in a single thread._ Until we have 
> written segments with new ordinal values, we cannot copy them to searcher 
> boxes, which increases the time to make documents available for search.
> I was playing around with the API by creating multiple concurrent merges, 
> each with only a single reader, creating a concurrently running 1:1 
> conversion from old segments to new ones (with new ordinal values). We follow 
> this up with non-blocking background merges. This lets us copy the segments 
> to searchers and replicas as soon as they are available, and later replace 
> them with merged segments as background jobs complete. On the Amazon dataset 
> I profiled, this gave us around 2.5 to 3x improvement in addIndexes() time. 
> Each call was given about 5 readers to add on average.
> This might be useful to add to Lucene. We could create another {{addIndexes()}} 
> API with a {{boolean}} flag for concurrency, that internally submits multiple 
> merge jobs (each with a single reader) to the {{ConcurrentMergeScheduler}}, 
> and waits for them to complete before returning.
> While this is doable from outside Lucene by using your own thread pool, starting 
> multiple addIndexes() calls and waiting for them to complete, I felt it needs 
> some understanding of what addIndexes does, why you need to wait on the merge 
> and why it makes sense to pass a single reader in the addIndexes API.
> Out-of-the-box support in Lucene could simplify this for folks with a similar use case.






[GitHub] [lucene] rmuir commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


rmuir commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1201298595

   I think we should keep the simple `toDot` and not try to coerce graphviz in 
a special way with magic like this. It has a lot of options to tweak its output.
   
   The biggest win for making the automata more readable visually is to add back the 
missing removal of dead states in LevenshteinAutomata.java. We should do this at 
the end:
   
   ```
   automata = Operations.removeDeadStates(automata);
   ```
   
   It seems this was lost in a refactoring.
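
For readers unfamiliar with the term, dead-state removal can be sketched as below. This is a sketch of the general idea in Python, not Lucene's implementation of `Operations.removeDeadStates`; the tuple-based transition list and the function names are assumptions made for illustration. A state is treated as live when it is reachable from the initial state and can itself reach an accept state.

```python
from collections import defaultdict


def live_states(transitions, initial, accepting):
    """Return states reachable from `initial` that can also reach an accept state.

    `transitions` is a hypothetical list of (source, label, destination) tuples.
    """
    forward, backward = defaultdict(set), defaultdict(set)
    for src, _label, dst in transitions:
        forward[src].add(dst)
        backward[dst].add(src)

    def reach(starts, edges):
        # Plain depth-first traversal over the given edge map.
        seen, stack = set(starts), list(starts)
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return reach({initial}, forward) & reach(set(accepting), backward)


def remove_dead_states(transitions, initial, accepting):
    """Drop every transition that touches a dead state."""
    live = live_states(transitions, initial, accepting)
    return [t for t in transitions if t[0] in live and t[2] in live]


# State 2 is dead: it is reachable but can never reach the accept state 1.
edges = [(0, "a", 1), (0, "b", 2), (2, "c", 2)]
pruned = remove_dead_states(edges, initial=0, accepting={1})
```

With fewer states and transitions left, the `toDot` output has less to draw, which is why pruning helps readability more than changing the traversal order.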





[GitHub] [lucene-jira-archive] uschindler commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


uschindler commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1201243766

   @whoschek is Wolfgang Hoschek. I know him from the picture. Also the 
repositories listed in his account look fine.
   
   Bernhard Messer is not on GitHub. He is no longer active. He works as 
senior/founder at Intrafind Software. His colleague Christoph Goller (same 
company) I meet regularly, but as far as I know he also has no GitHub account. As 
far as I remember, they use GitLab.
   
   Michael Busch was active in Lucene before Git was used. He worked for 
Twitter and they had no public repositories. Nowadays he's not active in Lucene 
anymore and I did not find any account.
   
   The Doron Cohen ones all look wrong.





[GitHub] [lucene] jpountz commented on a diff in pull request #987: LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread GitBox


jpountz commented on code in PR #987:
URL: https://github.com/apache/lucene/pull/987#discussion_r934557353


##
lucene/CHANGES.txt:
##
@@ -54,7 +54,8 @@ Improvements
 
 Optimizations
 -
-(No changes)
+
+* LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing 
data. (luyuncheng)

Review Comment:
   Move it to the 9.4 section?






[GitHub] [lucene-jira-archive] mocobeta commented on issue #106: Inlined patches don't migrate correctly

2022-08-01 Thread GitBox


mocobeta commented on issue #106:
URL: 
https://github.com/apache/lucene-jira-archive/issues/106#issuecomment-1201224245

   Perhaps we could apply special treatment (e.g. escape all Markdown syntax) 
to old issues from the CVS era? According to this Wikipedia article, Markdown 
was released in 2004, so people are unlikely to have used Markdown together 
with inline patches back then.
   https://en.wikipedia.org/wiki/Markdown
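
A minimal sketch of the "escape all Markdown syntax" idea, assuming a simple backslash-escaping pass is acceptable. The function name and the exact set of escaped characters are assumptions for illustration, not what the migration scripts do.

```python
import re

# Punctuation that commonly triggers Markdown formatting. CommonMark allows
# any ASCII punctuation to be backslash-escaped; this subset is an assumption.
_MD_SPECIAL = re.compile(r"([\\`*_{}\[\]()#+\-!>|~])")


def escape_markdown(text):
    """Backslash-escape Markdown punctuation so the text renders literally."""
    return _MD_SPECIAL.sub(r"\\\1", text)


escaped = escape_markdown("a_b")
```

Applying this only to issues created before some cutoff date would leave newer issues, where Markdown-like syntax may be intentional, untouched.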





[GitHub] [lucene-jira-archive] mikemccand commented on pull request #105: #93: insert extra newline when we see markdown-styled quoting (>) without extra newline after

2022-08-01 Thread GitBox


mikemccand commented on PR #105:
URL: 
https://github.com/apache/lucene-jira-archive/pull/105#issuecomment-1201214903

   NOTE: please don't merge this yet -- it is buggy, screws up inlined diffs, 
etc.  I am still iterating.





[GitHub] [lucene] mikemccand commented on pull request #1051: LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.

2022-08-01 Thread GitBox


mikemccand commented on PR #1051:
URL: https://github.com/apache/lucene/pull/1051#issuecomment-1201213143

   Wooot!





[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-08-01 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573757#comment-17573757
 ] 

Adrien Grand commented on LUCENE-10629:
---

Sure thing, it was an easy fix!

> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.






[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #106: Inlined patches don't migrate correctly

2022-08-01 Thread GitBox


mikemccand opened a new issue, #106:
URL: https://github.com/apache/lucene-jira-archive/issues/106

   While testing my #105 PR for #93, I came across [this poorly migrated 
issue](https://github.com/mocobeta/forks-migration-test/issues/108).
   
   It harks back to the CVS days!!
   
   It contains an inline patch, which is not rendered right, and my change in 
#105 perhaps makes it worse.  I'll try to identify inlined patch files and put 
them in a code block.
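
One possible heuristic for the "identify inlined patch files and put them in a code block" step is sketched below. The prefix lists and the function name are assumptions for illustration; real patches with blank context lines, or prose that happens to start with `--- `, would need more care.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside the example

# Line prefixes that plausibly start a patch (unified, context, or CVS style).
PATCH_START = ("Index: ", "diff ", "--- ", "*** ")
# Line prefixes that plausibly continue a patch once it has started.
PATCH_BODY = PATCH_START + ("+++ ", "@@ ", "+", "-", " ", "!", "=====", "\\ ")


def fence_inline_patch(text):
    """Wrap the first run of patch-looking lines in a fenced code block."""
    lines = text.split("\n")
    start = next(
        (i for i, line in enumerate(lines) if line.startswith(PATCH_START)), None
    )
    if start is None:
        return text  # nothing that looks like a patch header
    end = start
    while end + 1 < len(lines) and lines[end + 1].startswith(PATCH_BODY):
        end += 1
    fenced = [FENCE + "diff"] + lines[start:end + 1] + [FENCE]
    return "\n".join(lines[:start] + fenced + lines[end + 1:])


sample = "Here is my fix:\n--- a/Foo.java\n+++ b/Foo.java\n@@ -1 +1 @@\n-old\n+new\nThanks"
result = fence_inline_patch(sample)
```

Inside a fence, GitHub renders the `+`, `-` and `>` characters literally, so the patch would no longer interact with other Markdown rewriting.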





[GitHub] [lucene-jira-archive] mikemccand opened a new pull request, #105: #93: insert extra newline when we see markdown-styled quoting (>) without extra newline after

2022-08-01 Thread GitBox


mikemccand opened a new pull request, #105:
URL: https://github.com/apache/lucene-jira-archive/pull/105

   This just inserts another newline when it sees what looks like a MD quote 
attempt (`> `) in Jira.
   
   So this:
   
   ```
   > hello
   wow you said hello to me!
   ```
   
   becomes:
   ```
   > hello
   
   wow you said hello to me!
   ```
   
   Rendered by GitHub:
   
   This:
   
   > hello
   wow you said hello to me!
   
   becomes this:
   
   > hello
   
   wow you said hello to me!
   
   It's sort of odd that GitHub MD renders in this way.
   
   I tested on the one issue I saw this on (LUCENE-2328).  I'm also counting 
how often this applies (alters the text) across my `jira-dump` from maybe ~3 
weeks ago.
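
The transformation described in this PR could look roughly like the following. This is a hedged sketch, not the code in the PR: the function name and the exact matching rule (a line starting with `> ` directly followed by a non-blank, non-quote line) are assumptions.

```python
def separate_quotes(text):
    """Insert a blank line after a `> ` quote line that is directly followed
    by a non-blank, non-quote line, so the reply is not folded into the quote."""
    out = []
    lines = text.split("\n")
    for i, line in enumerate(lines):
        out.append(line)
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("> ") and nxt.strip() and not nxt.startswith(">"):
            out.append("")
    return "\n".join(out)


fixed = separate_quotes("> hello\nwow you said hello to me!")
```

Text that already has the blank line, and multi-line quotes, pass through unchanged apart from the separator added after the last quoted line.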





[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-08-01 Thread Shai Erera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573743#comment-17573743
 ] 

Shai Erera commented on LUCENE-10629:
-

Oops, thanks [~jpountz] !

> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.






[jira] [Commented] (LUCENE-10648) Fix TestAssertingPointsFormat.testWithExceptions failure

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573741#comment-17573741
 ] 

ASF subversion and git services commented on LUCENE-10648:
--

Commit 5dd8e9bdc5ae72fc726a98a64bfce5119c77b558 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5dd8e9bdc5a ]

LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#1051)

Use merge policy and merge scheduler to run addIndexes merges.

This is a back port of the following commits from main:
 * LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#633)
 * LUCENE-10648: Fix failures in TestAssertingPointsFormat.testWithExceptions 
(#1012)


> Fix TestAssertingPointsFormat.testWithExceptions failure
> 
>
> Key: LUCENE-10648
> URL: https://issues.apache.org/jira/browse/LUCENE-10648
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Vigya Sharma
>Priority: Major
> Fix For: 10.0 (main)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We are seeing build failures due to 
> TestAssertingPointsFormat.testWithExceptions. I am able to repro this on my 
> box with the random seed. Tracking the issue here.
> Sample Failing Build: 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/6057/






[jira] [Commented] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573740#comment-17573740
 ] 

ASF subversion and git services commented on LUCENE-10216:
--

Commit 5dd8e9bdc5ae72fc726a98a64bfce5119c77b558 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5dd8e9bdc5a ]

LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#1051)

Use merge policy and merge scheduler to run addIndexes merges.

This is a back port of the following commits from main:
 * LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#633)
 * LUCENE-10648: Fix failures in TestAssertingPointsFormat.testWithExceptions 
(#1012)


> Add concurrency to addIndexes(CodecReader…) API
> ---
>
> Key: LUCENE-10216
> URL: https://issues.apache.org/jira/browse/LUCENE-10216
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Vigya Sharma
>Priority: Major
> Fix For: main
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> I work at Amazon Product Search, and we use Lucene to power search for the 
> e-commerce platform. I’m working on a project that involves applying 
> metadata+ETL transforms and indexing documents on n different _indexing_ 
> boxes, combining them into a single index on a separate _reducer_ box, and 
> making it available for queries on m different _search_ boxes (replicas). 
> Segments are asynchronously copied from indexers to reducers to searchers as 
> they become available for the next layer to consume.
> I am using the addIndexes API to combine multiple indexes into one on the 
> reducer boxes. Since we also have taxonomy data, we need to remap facet field 
> ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version 
> of this API. The API leverages {{SegmentMerger.merge()}} to create segments 
> with new ordinal values while also merging all provided segments in the 
> process.
> _This is however a blocking call that runs in a single thread._ Until we have 
> written segments with new ordinal values, we cannot copy them to searcher 
> boxes, which increases the time to make documents available for search.
> I was playing around with the API by creating multiple concurrent merges, 
> each with only a single reader, creating a concurrently running 1:1 
> conversion from old segments to new ones (with new ordinal values). We follow 
> this up with non-blocking background merges. This lets us copy the segments 
> to searchers and replicas as soon as they are available, and later replace 
> them with merged segments as background jobs complete. On the Amazon dataset 
> I profiled, this gave us around 2.5 to 3x improvement in addIndexes() time. 
> Each call was given about 5 readers to add on average.
> This might be useful to add to Lucene. We could create another {{addIndexes()}} 
> API with a {{boolean}} flag for concurrency, that internally submits multiple 
> merge jobs (each with a single reader) to the {{ConcurrentMergeScheduler}}, 
> and waits for them to complete before returning.
> While this is doable from outside Lucene by using your own thread pool, starting 
> multiple addIndexes() calls and waiting for them to complete, I felt it needs 
> some understanding of what addIndexes does, why you need to wait on the merge 
> and why it makes sense to pass a single reader in the addIndexes API.
> Out-of-the-box support in Lucene could simplify this for folks with a similar use case.






[GitHub] [lucene] vigyasharma merged pull request #1051: LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.

2022-08-01 Thread GitBox


vigyasharma merged PR #1051:
URL: https://github.com/apache/lucene/pull/1051





[GitHub] [lucene-site] vigyasharma merged pull request #67: Add Vigya Sharma to the list of committers.

2022-08-01 Thread GitBox


vigyasharma merged PR #67:
URL: https://github.com/apache/lucene-site/pull/67





[GitHub] [lucene] msokolov commented on pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-01 Thread GitBox


msokolov commented on PR #1054:
URL: https://github.com/apache/lucene/pull/1054#issuecomment-1201127574

   ehh - I missed adding bytes-support to KnnVectorField; will follow up 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573728#comment-17573728
 ] 

ASF subversion and git services commented on LUCENE-10629:
--

Commit 18f839bbf408abe8816e0647a06a062f9086fdce in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=18f839bbf40 ]

LUCENE-10629: Fix NullPointerException.

I hit a NPE while running tests. `Weight#scorer` may return `null`, but not
`Scorer#iterator`.


> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573729#comment-17573729
 ] 

ASF subversion and git services commented on LUCENE-10629:
--

Commit 04e4f317cb210158dd10c68ac2b970a688c9ae2c in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=04e4f317cb2 ]

LUCENE-10629: Fix NullPointerException.

I hit a NPE while running tests. `Weight#scorer` may return `null`, but not
`Scorer#iterator`.


> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.






[GitHub] [lucene-jira-archive] mocobeta commented on issue #102: Missing accounts (committers)

2022-08-01 Thread GitBox


mocobeta commented on issue #102:
URL: 
https://github.com/apache/lucene-jira-archive/issues/102#issuecomment-1201093145

   I think I found Stefan Matheis's GH account - https://github.com/steffkes





[GitHub] [lucene] tang-hi commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


tang-hi commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1201077186

   > > @mikemccand I have added the dead state back to the `toDot` method.
   > > I first record all states that need to be checked, then run a BFS from the 
smallest state that hasn't been checked yet, repeating until all states have been visited.
   > > So I didn't add a check that the number of states emitted is equal to the 
number of states in the incoming automaton, because when the loop has finished 
they will be the same.
   > 
   > Great, thank you!
   > 
   > Can you post the resulting `toDot` / Graphviz image of this dead-state 
infested Lev1 automaton?
   
   Here are the Graphviz renderings of Lev1 with transpositions; the input is "abcdedg".
   ## **With BFS**
   
![bfs](https://user-images.githubusercontent.com/72755185/182137598-5359da67-2b84-432a-8984-84811bcb46f1.svg)
   
   ## **Without BFS**
   
![non-bfs](https://user-images.githubusercontent.com/72755185/182137780-979f1304-dfce-48be-9cf4-6c5680235a98.svg)
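
The traversal described in the quoted text can be sketched roughly as follows. This is not Lucene's `Automaton#toDot`; the dictionary-based automaton representation, the function name `to_dot`, and the sample automaton are assumptions made for illustration. The point is that restarting the BFS from the smallest state not yet emitted guarantees that dead (unreachable) states are printed too.

```python
from collections import deque


def to_dot(num_states, transitions, accept=frozenset()):
    """Emit Graphviz DOT for every state, including unreachable ones.

    transitions: dict mapping state -> list of (label, target) pairs.
    """
    lines = ["digraph Automaton {", "  rankdir = LR"]
    pending = set(range(num_states))  # states not yet emitted
    while pending:
        # Start a BFS from the smallest state that has not been emitted.
        start = min(pending)
        pending.discard(start)
        queue = deque([start])
        while queue:
            state = queue.popleft()
            shape = "doublecircle" if state in accept else "circle"
            lines.append(f"  {state} [shape={shape}]")
            for label, target in transitions.get(state, []):
                lines.append(f'  {state} -> {target} [label="{label}"]')
                if target in pending:
                    pending.discard(target)
                    queue.append(target)
    lines.append("}")
    return "\n".join(lines)


# State 3 is a dead state: nothing leads to it and it has no transitions.
dot = to_dot(4, {0: [("a", 1), ("b", 2)], 1: [("c", 2)]}, accept={2})
print(dot)
```

Because every state leaves `pending` exactly once, the number of emitted states equals the number of states in the input when the outer loop ends, which is why no separate count check is needed.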
   
   





[GitHub] [lucene-jira-archive] mocobeta commented on issue #104: Should we regenerate another full export?

2022-08-01 Thread GitBox


mocobeta commented on issue #104:
URL: 
https://github.com/apache/lucene-jira-archive/issues/104#issuecomment-1201026632

   Yes - I plan another full import. I don't think we need to walk through the 
complete migration steps written in #7 again, but at least we can pick the most 
critical parts - conversion and import.





[GitHub] [lucene-jira-archive] mikemccand commented on pull request #103: Add ohrphaned jira usernames

2022-08-01 Thread GitBox


mikemccand commented on PR #103:
URL: 
https://github.com/apache/lucene-jira-archive/pull/103#issuecomment-1201019643

   > > I didn't commit this scratchy code
   > 
   > Oh no! You should commit scratchy code! Progress not perfection. It's an 
awesome start, and future people struggling with Jira -> GitHub migration, 
might want to handle such orphan'd cases too.
   
   OK, I'm trying to smooth a bit of its scratchiness and I'll commit!  It 
makes it easier for me to iterate on these orphan'd usernames.





[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #104: Should we regenerate another full export?

2022-08-01 Thread GitBox


mikemccand opened a new issue, #104:
URL: https://github.com/apache/lucene-jira-archive/issues/104

   I know this is a hassle and adds delay, but, we have fixed a number of 
issues since the last full test export.
   
   And e.g. @msokolov's first comment on this current iteration is just such an 
example.
   
   If we take another iteration we can bring fresh eyes on the latest changes?





[GitHub] [lucene-jira-archive] mikemccand commented on pull request #103: Add ohrphaned jira usernames

2022-08-01 Thread GitBox


mikemccand commented on PR #103:
URL: 
https://github.com/apache/lucene-jira-archive/pull/103#issuecomment-1200996491

   > I didn't commit this scratchy code
   
   Oh no!  You should commit scratchy code!  Progress not perfection.  It's an 
awesome start, and future people struggling with Jira -> GitHub migration, 
might want to handle such orphan'd cases too.





[GitHub] [lucene-jira-archive] mikemccand merged pull request #103: Add ohrphaned jira usernames

2022-08-01 Thread GitBox


mikemccand merged PR #103:
URL: https://github.com/apache/lucene-jira-archive/pull/103





[GitHub] [lucene-jira-archive] mocobeta commented on issue #96: Some user references don't convert?

2022-08-01 Thread GitBox


mocobeta commented on issue #96:
URL: 
https://github.com/apache/lucene-jira-archive/issues/96#issuecomment-1200987407

   @mikemccand Could you take a look at #103?





[GitHub] [lucene-jira-archive] mocobeta opened a new pull request, #103: Add ohrphaned jira usernames

2022-08-01 Thread GitBox


mocobeta opened a new pull request, #103:
URL: https://github.com/apache/lucene-jira-archive/pull/103

   #96 
   
   `mappings-data/ohphan_jira_ids.txt` lists the "orphaned" Jira usernames: 
obsolete usernames (i.e. unknown Jira users) that appear as `[~username]` 
mentions in issue descriptions or comments.
   I also added several mappings to the "verified" account mapping file; I 
didn't find any "new" accounts, but these entries will work as aliases.
   
   These orphaned usernames are detected by the script below. (I didn't commit this 
scratchy code).
   ```python
   from operator import itemgetter
   from pathlib import Path
   import json
   import re
   import itertools
   from collections import defaultdict
   
   from common import (JIRA_DUMP_DIRNAME, MAPPINGS_DATA_DIRNAME,
                       JIRA_USERS_FILENAME, read_jira_users_map)
   from jira_util import REGEX_MENION_TILDE, extract_description, extract_comments
   
   dump_dir = Path(__file__).resolve().parent.parent.joinpath(JIRA_DUMP_DIRNAME)
   mappings_dir = Path(__file__).resolve().parent.parent.joinpath(MAPPINGS_DATA_DIRNAME)
   jira_users_file = mappings_dir.joinpath(JIRA_USERS_FILENAME)
   jira_users = read_jira_users_map(jira_users_file) if jira_users_file.exists() else {}
   
   
   def extract_tilde_mentions(text):
       # findall returns tuples of groups; flatten them and drop empty matches
       mentions = re.findall(REGEX_MENION_TILDE, text)
       mentions = set(filter(lambda x: x != '', itertools.chain.from_iterable(mentions)))
       # strip the surrounding "[~" and "]"
       mentions = [x[2:-1] for x in mentions]
       return mentions
   
   
   # username -> number of issues in which it is mentioned
   orphan_ids = defaultdict(int)
   for dump_file in dump_dir.glob("LUCENE-*.json"):
       mentions = set([])
       with open(dump_file) as fp:
           o = json.load(fp)
           description = extract_description(o)
           mentions.update(extract_tilde_mentions(description))
           comments = extract_comments(o)
           for (_, _, comment, _, _, _) in comments:
               mentions.update(extract_tilde_mentions(comment))
       # count each unknown username once per issue
       for m in mentions:
           if m not in jira_users:
               orphan_ids[m] += 1
   
   orphan_ids = sorted(orphan_ids.items(), key=itemgetter(1), reverse=True)
   for id, count in orphan_ids:
       print(f'{id}\t{count}')
   ```
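
The script above depends on the repository's `common` and `jira_util` modules and on a local Jira dump. As a self-contained illustration of the same counting idea, here is a minimal sketch. The regular expression `MENTION_RE`, the helper `count_orphans`, and the sample data are assumptions made for this example only; the real `REGEX_MENION_TILDE` may match differently.

```python
import re
from collections import Counter

# Hypothetical pattern for Jira-style mentions such as "[~username]".
# The pattern actually used in jira_util may differ.
MENTION_RE = re.compile(r"\[~([^\]\s]+)\]")


def extract_tilde_mentions(text):
    """Return the set of usernames mentioned as [~username] in text."""
    return set(MENTION_RE.findall(text))


def count_orphans(issues, known_users):
    """Count, for each unknown username, the number of issues mentioning it.

    issues: iterable of lists of strings (description plus comments).
    known_users: set of usernames that already have a mapping.
    """
    counts = Counter()
    for texts in issues:
        mentions = set()
        for text in texts:
            mentions |= extract_tilde_mentions(text)
        # Each issue contributes at most 1 per username.
        counts.update(m for m in mentions if m not in known_users)
    return counts.most_common()


# Made-up sample data: three issues, one known user.
sample_issues = [
    ["Thanks [~alice]!", "[~alice] and [~bob] please review."],
    ["cc [~alice]"],
    ["No mentions here."],
]
orphans = count_orphans(sample_issues, known_users={"bob"})
for username, count in orphans:
    print(f"{username}\t{count}")
```

Collecting mentions into a set per issue before counting is what makes the result a count of issues rather than a count of individual mentions.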
   
   





[GitHub] [lucene] mikemccand commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


mikemccand commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1200969287

   > @mikemccand I have added the dead state back to the `toDot` method.
   > I first record all states that need to be checked, then run a BFS from the 
smallest state that hasn't been checked yet, repeating until all states have been visited.
   > So I didn't add a check that the number of states emitted is equal to the 
number of states in the incoming automaton, because when the loop has finished 
they will be the same.
   
   Great, thank you!
   
   Can you post the resulting `toDot` / Graphviz image of this dead-state 
infested Lev1 automaton?





[GitHub] [lucene] luyuncheng commented on pull request #987: LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread GitBox


luyuncheng commented on PR #987:
URL: https://github.com/apache/lucene/pull/987#issuecomment-1200964321

   > Thanks for running these tests, let's remove `readNBytes` and switch to a 
shared byte[] that we call `DataInput#readBytes` on instead.
   
   @jpountz Thanks a lot for reviewing the code. In 
https://github.com/apache/lucene/pull/987/commits/c0d31d3134653bf6009b798925fa2350e5f7ec9c
 I removed `readNBytes` and now use a shared byte[] buffer in 
`DeflateWithPresetDictCompressionMode`. 
   
   I ran more tests to find out why `readNBytes` causes so much memory 
copying. I think it is related to:
   1. `DeflateWithPresetCompressingCodec` chunk size: `1 << 18`
   2. `LZ4WithPresetDictCompressionMode` NUM_SUB_BLOCKS: `10` 
   3. `ByteBuffersDataOutput` BITS_PER_BLOCK
   
   The `chunk size` is larger than the `block size`, so the 
`ByteBuffersDataInput` is always fragmented and the data always has to be copied.
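
The arithmetic behind that last point can be sketched as follows. This is a toy model, not Lucene code: `blocks_touched` and `read_range` are hypothetical helpers, and `BITS_PER_BLOCK = 10` is assumed purely for illustration, since the actual value depends on how the `ByteBuffersDataOutput` is configured. Only the chunk size `1 << 18` comes from the list above.

```python
def blocks_touched(offset, length, bits_per_block):
    """Number of fixed-size blocks spanned by the range [offset, offset + length)."""
    if length == 0:
        return 0
    first = offset >> bits_per_block
    last = (offset + length - 1) >> bits_per_block
    return last - first + 1


def read_range(blocks, offset, length, bits_per_block):
    """Return (data, copied): a zero-copy view if one block suffices, else a copy."""
    block_size = 1 << bits_per_block
    if length == 0:
        return memoryview(b""), False
    if blocks_touched(offset, length, bits_per_block) == 1:
        start = offset & (block_size - 1)
        block = blocks[offset >> bits_per_block]
        return memoryview(block)[start:start + length], False
    # The range crosses a block boundary: gather the pieces into one buffer.
    out = bytearray()
    pos, remaining = offset, length
    while remaining:
        start = pos & (block_size - 1)
        take = min(remaining, block_size - start)
        out += blocks[pos >> bits_per_block][start:start + take]
        pos += take
        remaining -= take
    return bytes(out), True


CHUNK_SIZE = 1 << 18   # chunk size quoted in the list above
BITS_PER_BLOCK = 10    # assumed block size of 1 KiB, for illustration only
spanned = blocks_touched(0, CHUNK_SIZE, BITS_PER_BLOCK)
print(f"a {CHUNK_SIZE}-byte chunk spans {spanned} blocks")
```

Whenever the requested length exceeds the block size, the range cannot fit in a single block, so a reader that needs one contiguous array has to take the copying branch on every chunk.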





[GitHub] [lucene-jira-archive] mikemccand commented on issue #96: Some user references don't convert?

2022-08-01 Thread GitBox


mikemccand commented on issue #96:
URL: 
https://github.com/apache/lucene-jira-archive/issues/96#issuecomment-1200960362

   Thanks @mocobeta -- can we commit this to the main branch?  I can try to add 
mappings for the orphan accounts, and then iterate on the still unmapped cases.





[GitHub] [lucene-jira-archive] mocobeta commented on issue #96: Some user references don't convert?

2022-08-01 Thread GitBox


mocobeta commented on issue #96:
URL: 
https://github.com/apache/lucene-jira-archive/issues/96#issuecomment-1200938019

   I committed a script that counts "orphaned" jira usernames in [this 
branch](https://github.com/apache/lucene-jira-archive/tree/orphan-mentions).
   
   The list is here - usernames are sorted by # of issues where the username 
appears.
   
https://github.com/apache/lucene-jira-archive/blob/orphan-mentions/migration/work/ohphan_jira_ids.txt
   
   For example, `thetaphi` is the most frequent and `steve_rowe` is the second.





[GitHub] [lucene] tang-hi commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


tang-hi commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1200904098

   > When the build fails with checksum failures, it'll display the task used 
to regenerate, for example:
   > 
   > > Checksums mismatch for derived resources; you might have modified a 
generated resource (regenerate task: utilGenPacked):
   > 
   > So you'd regenerate both the files and their checksums with: gradlew 
utilGenPacked
   > 
   > All projects also have a 'regenerate' alias which includes regeneration of 
everything but this may or may not work for you (some regeneration tasks 
require bigger hardware or external resources).
   
   Thanks, I got it!





[jira] [Commented] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573638#comment-17573638
 ] 

Dawid Weiss commented on LUCENE-10671:
--

Spammer.

> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/hnsw
>Affects Versions: 8.11.2
>Reporter: allnewcracksoftwares
>Priority: Minor
>
> [link title|http://example.com]






[jira] [Updated] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-10671:
-
Environment: (was: 
https://allnewcracksoftwares.com/avast-secure-line-vpn-crack-download-with-key-latest-version/)

> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/hnsw
>Affects Versions: 8.11.2
>Reporter: allnewcracksoftwares
>Priority: Minor
>
> [link title|http://example.com]






[jira] [Closed] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss closed LUCENE-10671.


> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/hnsw
>Affects Versions: 8.11.2
> Environment: 
> https://allnewcracksoftwares.com/avast-secure-line-vpn-crack-download-with-key-latest-version/
>Reporter: allnewcracksoftwares
>Priority: Minor
>
> [link title|http://example.com]






[jira] [Resolved] (LUCENE-10671) Lucene

2022-08-01 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-10671.
--
Resolution: Invalid

> Lucene
> --
>
> Key: LUCENE-10671
> URL: https://issues.apache.org/jira/browse/LUCENE-10671
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/hnsw
>Affects Versions: 8.11.2
> Environment: 
> https://allnewcracksoftwares.com/avast-secure-line-vpn-crack-download-with-key-latest-version/
>Reporter: allnewcracksoftwares
>Priority: Minor
>
> [link title|http://example.com]






[GitHub] [lucene] dweiss commented on pull request #1016: LUCENE-10646: Add some comment on LevenshteinAutomata

2022-08-01 Thread GitBox


dweiss commented on PR #1016:
URL: https://github.com/apache/lucene/pull/1016#issuecomment-1200899156

   When the build fails with checksum failures, it'll display the task used to 
regenerate, for example:
   
   > Checksums mismatch for derived resources; you might have modified a 
generated resource (regenerate task: utilGenPacked):
   
   So you'd regenerate both the files and their checksums with: gradlew 
utilGenPacked
   
   All projects also have a 'regenerate' alias which includes regeneration of 
everything but this may or may not work for you (some regeneration tasks 
require bigger hardware or external resources).
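
The general idea behind such a check can be sketched as follows. This is not how the Lucene Gradle build implements it; the helper names, the use of SHA-1, and the temporary file are assumptions for illustration only. A regeneration step records a digest for each derived file, and a later verification step reports any file whose current digest no longer matches.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of(path):
    """Hex SHA-1 digest of a file's contents."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def find_mismatches(expected):
    """expected: dict of path -> recorded digest. Returns the paths that differ."""
    return sorted(str(p) for p, digest in expected.items() if sha1_of(p) != digest)


with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "Generated.java"
    generated.write_text("// generated\n")
    # What a regeneration step would record next to the derived file.
    recorded = {generated: sha1_of(generated)}
    clean = find_mismatches(recorded)

    # A manual edit to the derived resource makes the recorded digest stale.
    generated.write_text("// edited by hand\n")
    mismatches = find_mismatches(recorded)
    print("Checksums mismatch for:", [Path(p).name for p in mismatches])
```

Re-running the regeneration step rewrites both the file and its recorded digest, which is why regenerating clears the failure.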

