[jira] [Commented] (LUCENE-10616) Moving to dictionaries has made stored fields slower at skipping
[ https://issues.apache.org/jira/browse/LUCENE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571217#comment-17571217 ] fang hou commented on LUCENE-10616: --- I think this PR [https://github.com/apache/lucene/pull/1003] is ready for review. As Adrien advised above, this PR changes the {{decompress}} signature to return an {{InputStream}} so that decompression can happen lazily. Rather than returning {{STOP}} from {{StoredFieldVisitor#needsField}} (I tried this, but it may be impossible because of multi-valued fields; see the test case), this PR makes the skip method smarter: it bypasses unneeded compressed blocks by reading their compressed length. For a large unneeded field, this can save a lot of decompression time. This applies to both {{BEST_SPEED}} mode and {{HIGH_COMPRESSION}} mode, so the PR optimizes both modes that use a preset dictionary. Could someone give some feedback? Thanks. cc [~jpountz] > Moving to dictionaries has made stored fields slower at skipping > > > Key: LUCENE-10616 > URL: https://issues.apache.org/jira/browse/LUCENE-10616 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > [~ywelsch] has been digging into a regression of stored fields retrieval that > is caused by LUCENE-9486. > Say your documents have two stored fields, one that is 100B and is stored > first, and the other one that is 100kB, and you are only interested in the > first one. While the idea behind blocks of stored fields is to store multiple > documents in the same block to leverage redundancy across documents, > sometimes documents are larger than the block size. As soon as documents are > larger than 2x the block size, our stored fields format splits such large > documents into multiple blocks, so that you wouldn't need to decompress > everything only to retrieve a couple small fields. 
> Before LUCENE-9486, BEST_SPEED had a block size of 16kB, so only retrieving > the first field value would only need to decompress 16kB of data. With the > move to preset dictionaries in LUCENE-9486 and then LUCENE-9917, we now have > blocks of 80kB, so stored fields would now need to decompress 80kB of data, > 5x more than before. > With dictionaries, our blocks are now split into 10 sub blocks. We happen to > eagerly decompress all sub blocks that intersect with the stored document, > which is why we would decompress 80kB of data, but this is an implementation > detail. It should be possible to decompress these sub blocks lazily so that > we would only decompress those that intersect with one of the field values > that the user is interested in retrieving? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
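The lazy sub-block decompression proposed in the issue description can be sketched in a few lines. This is a conceptual illustration in Python using `zlib`, not Lucene's Java implementation, and it leaves out the preset dictionary entirely. The block layout (a 4-byte length prefix in front of each compressed sub block), the `SUB_BLOCK` size, and the helper names `compress_block` and `read_range` are all assumptions made for the example. The reader uses each stored compressed length to skip sub blocks that do not intersect the requested byte range.

```python
import struct
import zlib

SUB_BLOCK = 8 * 1024  # hypothetical uncompressed sub-block size


def compress_block(data: bytes) -> bytes:
    """Split data into sub blocks and prefix each with its compressed length."""
    out = bytearray()
    for start in range(0, len(data), SUB_BLOCK):
        chunk = zlib.compress(data[start:start + SUB_BLOCK])
        out += struct.pack(">I", len(chunk)) + chunk
    return bytes(out)


def read_range(block: bytes, offset: int, length: int) -> tuple[bytes, int]:
    """Return data[offset:offset + length] and the number of sub blocks decompressed."""
    result = bytearray()
    pos = 0          # read position inside the compressed block
    sub_start = 0    # uncompressed offset of the current sub block
    decompressed = 0
    end = offset + length
    while pos < len(block) and sub_start < end:
        (clen,) = struct.unpack_from(">I", block, pos)
        pos += 4
        if sub_start + SUB_BLOCK > offset:
            # The sub block intersects the requested range: decompress it.
            chunk = zlib.decompress(block[pos:pos + clen])
            decompressed += 1
            lo = max(offset - sub_start, 0)
            hi = min(end - sub_start, len(chunk))
            result += chunk[lo:hi]
        # In either case, advance using only the stored compressed length.
        pos += clen
        sub_start += SUB_BLOCK
    return bytes(result), decompressed
```

With an 80kB block split into ten such sub blocks, reading the first 100 bytes touches a single sub block instead of all ten, which is the saving the issue asks for.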
[GitHub] [lucene] JoeHF commented on pull request #1003: LUCENE-10616: optimizing decompress when only retrieving some fields
JoeHF commented on PR #1003: URL: https://github.com/apache/lucene/pull/1003#issuecomment-1195050814

no obvious regression or perf improvement, guess there are no such cases in benchmark

wikimedium10k:

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value |
|---|---|---|---|---|---|---|
| BrowseRandomLabelTaxoFacets | 569.65 | (7.9%) | 543.58 | (15.4%) | -4.6% (-25% - 20%) | 0.236 |
| Prefix3 | 377.77 | (9.1%) | 368.32 | (6.6%) | -2.5% (-16% - 14%) | 0.321 |
| AndHighMed | 656.18 | (8.2%) | 648.39 | (10.6%) | -1.2% (-18% - 19%) | 0.691 |
| MedIntervalsOrdered | 574.68 | (6.3%) | 567.95 | (9.7%) | -1.2% (-16% - 15%) | 0.651 |
| AndHighLow | 978.77 | (9.3%) | 972.00 | (8.5%) | -0.7% (-16% - 18%) | 0.806 |
| HighSpanNear | 425.66 | (8.3%) | 423.78 | (10.2%) | -0.4% (-17% - 19%) | 0.880 |
| OrHighMed | 656.72 | (8.2%) | 655.28 | (10.5%) | -0.2% (-17% - 20%) | 0.942 |
| LowIntervalsOrdered | 481.42 | (5.2%) | 480.65 | (10.3%) | -0.2% (-14% - 16%) | 0.951 |
| HighPhrase | 500.26 | (7.6%) | 499.86 | (11.4%) | -0.1% (-17% - 20%) | 0.979 |
| Respell | 123.33 | (11.8%) | 123.48 | (10.4%) | 0.1% (-19% - 25%) | 0.973 |
| OrHighHigh | 416.58 | (6.9%) | 417.19 | (9.4%) | 0.1% (-15% - 17%) | 0.955 |
| MedTerm | 2063.41 | (9.5%) | 2069.51 | (11.0%) | 0.3% (-18% - 23%) | 0.928 |
| LowSloppyPhrase | 301.12 | (7.5%) | 303.12 | (12.6%) | 0.7% (-18% - 22%) | 0.840 |
| HighTerm | 1088.05 | (9.8%) | 1102.10 | (14.8%) | 1.3% (-21% - 28%) | 0.745 |
| LowPhrase | 896.10 | (8.4%) | 907.71 | (9.8%) | 1.3% (-15% - 21%) | 0.654 |
| HighSloppyPhrase | 309.31 | (8.1%) | 313.60 | (10.0%) | 1.4% (-15% - 21%) | 0.629 |
| Fuzzy2 | 42.78 | (11.1%) | 43.46 | (12.2%) | 1.6% (-19% - 27%) | 0.665 |
| Wildcard | 315.36 | (9.2%) | 320.46 | (7.7%) | 1.6% (-14% - 20%) | 0.548 |
| MedSpanNear | 520.33 | (6.6%) | 530.21 | (11.6%) | 1.9% (-15% - 21%) | 0.524 |
| HighIntervalsOrdered | 356.49 | (10.3%) | 363.39 | (10.1%) | 1.9% (-16% - 24%) | 0.547 |
| AndHighHigh | 619.32 | (5.9%) | 631.54 | (9.5%) | 2.0% (-12% - 18%) | 0.432 |
| HighTermMonthSort | 1479.95 | (6.0%) | 1509.95 | (11.1%) | 2.0% (-14% - 20%) | 0.472 |
| MedSloppyPhrase | 230.30 | (8.6%) | 235.24 | (10.8%) | 2.1% (-15% - 23%) | 0.488 |
| MedPhrase | 567.04 | (6.2%) | 579.72 | (11.5%) | 2.2% (-14% - 21%) | 0.442 |
| BrowseRandomLabelSSDVFacets | 350.13 | (10.2%) | 358.12 | (16.8%) | 2.3% (-22% - 32%) | 0.604 |
| HighTermDayOfYearSort | 1087.80 | (7.4%) | 1118.61 | (8.5%) | 2.8% (-12% - 20%) | 0.260 |
| LowTerm | 2557.43 | (9.2%) | 2636.37 | (8.9%) | 3.1% (-13% - 23%) | 0.281 |
| LowSpanNear | 795.88 | (9.0%) | 828.70 | (11.1%) | 4.1% (-14% - 26%) | 0.195 |
| PKLookup | 26.79 | (16.3%) | 27.91 | (19.8%) | 4.2% (-27% - 48%) | 0.466 |
| Fuzzy1 | 136.23 | (9.7%) | 142.21 | (16.8%) | 4.4% (-20% - 34%) | 0.312 |
| BrowseMonthTaxoFacets | 801.97 | (17.7%) | 840.43 | (19.1%) | 4.8% (-27% - 50%) | 0.410 |
| IntNRQ | 603.46 | (10.0%) | 636.52 | (7.9%) | 5.5% (-11% - 25%) | 0.054 |
| OrHighLow | 532.25 | (9.0%) | 562.37 | (13.6%) | 5.7% (-15% - 31%) | 0.121 |
| BrowseMonthSSDVFacets | 839.55 | (20.9%) | 894.35 | (22.1%) | 6.5% (-30% - 62%) | 0.337 |
| BrowseDayOfYearTaxoFacets | 784.80 | (16.4%) | 839.36 | (25.1%) | 7.0% (-29% - 58%) | 0.300 |
| BrowseDateTaxoFacets | 849.34 | (17.9%) | 908.75 | (25.8%) | 7.0% (-31% - 61%) | 0.319 |
| BrowseDayOfYearSSDVFacets | 832.41 | (17.6%) | 907.43 | (22.9%) | 9.0% (-26% - 60%) | 0.163 |
| BrowseDateSSDVFacets | 215.63 | (21.1%) | 241.38 | (27.5%) | 11.9% (-30% - 76%) | 0.123 |

wikimedium1m:

| Task | QPS baseline | StdDev | QPS my_modified_version | StdDev | Pct diff | p-value |
|---|---|---|---|---|---|---|
| Respell | 39.67 | (11.8%) | 37.57 | (14.2%) | -5.3% (-27% - 23%) | 0.200 |

BrowseDayOfYear
[GitHub] [lucene] visionarywind opened a new issue, #1048: Why lucene doc id changes after updating or merging?
visionarywind opened a new issue, #1048: URL: https://github.com/apache/lucene/issues/1048 ### Description As far as I know, a Lucene doc ID is an internal ID and cannot be used as an external ID. Why is it designed like this? Could it be designed to be constant? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] nknize commented on pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
nknize commented on PR #1017: URL: https://github.com/apache/lucene/pull/1017#issuecomment-1194649110

Thanks @iverase.

> Why so much hurry with this change? ...it will be nice to have something production ready.

Just a few points here.

1. I don't believe the proposed PR is a change. It's a new field that hasn't existed before despite Elasticsearch carrying its own proprietary implementation for two years and only now proposing it as the preferred approach. I prefer a fresh Lucene focused implementation with input from Elasticsearch perspective. (e.g., geometry centroid and bounding box were added to help Elasticsearch `geo_centroid` and `geo_boundingbox` aggregations but really not needed for the docvalue format).
2. This PR is a move of progress over perfection. I do feel it's important to do the best we can on the first iteration but there will always be needed improvements. We have mechanisms in place to enable us to unleash experimental features for the purpose of receiving feedback from production. The PR is using those mechanisms as intended. No need to iterate to what _we think_ is "production ready".
3. Akin to 2 I'd prefer to avoid waiting another two months+ before the next minor release to get this in the wild and start obtaining that feedback and iterating. Those iterations will improve with more feedback. If we do feel this is cutting it close for 9.3 then I'd prefer merging this PR to main and 9.4 and iterate on this code for 9.4.
4. I am happy to hear Elasticsearch wanting to contribute their implementation but I think it's better to start with a foundation and iterate. We can merge any desirable properties from that Elasticsearch field in follow ups. I think that will strengthen the field as a whole and agree it's a great decision but do not believe it is a requirement before merging the current functioning PR. Again, progress over perfection.

> this way of developing this exciting and complex feature is making things harder.

Harder for who?
> I would like to propose to initially focus on the data structure and once we are happy, we can start integrating the functionality, e.g support for queries and so on.

Query support is already in this PR so I'm not sure what you mean by "start integrating the functionality". Regarding the desire to add a visitor access pattern, that's a nice to have but not a requirement for this PR. I wrote an initial rough implementation (because I also thought it would be nice to mimic the query visitor pattern) but it's an improvement that is easier to add in a follow-up since (as this PR shows) it's not a requirement for supporting the queries in the first iteration. I agree with the bounding box improvements (which, again, could come later) and will add the centroid fix, but that is unrelated to the relation and query logic in this PR which already has parity with the BKD index queries and uses the same test scaffolding.

> Here is our current implementation which can be used as a good starting point.

The current PR already has a starting point so I'm not sure why the proposal to scrap and start from the Elasticsearch proprietary implementation (which could've been proposed two years ago if parity is the concern). I took a quick look at the proposed code and have some differences of opinion on the implementation:

* I didn't see any explicit relation visitors; so it doesn't look like the prototype code includes bounding box relation logic or tests against any BKD index queries to ensure functional parity.
* I didn't look at the details of the serialized format. That shouldn't matter so long as the API is the same and results are correct. This PR now includes a `VERSION` byte to provide a mechanism to change the format and ensure backwards compatibility. This reduces Elasticsearch risk.
* The PR results here are matching the BKD index queries so I'm confident the PR query results are correct. I'll add (or someone can add) the visitor access pattern in a follow up. Again, it's not a requirement for Lucene (even if it is one for Elasticsearch).
* This PR isolates all ShapeDocValues logic (e.g., Readers and Writers) as private logic to the single pkg private abstract `ShapeDocValues` and only exposes field and query instantiation through the public `LatLonShape` and `XYShape` access classes. I prefer this approach (which is consistent w/ the BKD field API) over the prototype that conversely has that split into disparate separate Reader / Writer classes with public abstractions. I think we should keep the abstraction and internal API surface area tidy and be thoughtful about what's exposed (e.g., only pkg-private `ShapeDocValues` and `ShapeDocValueField`. Visitor member class and BaseQuery foundation classes for internal extensions, and `LatLonShape` and `XYShape` static factories for public access). I'll add th
[jira] [Comment Edited] (LUCENE-10592) Should we build HNSW graph on the fly during indexing
[ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570976#comment-17570976 ] Julie Tibshirani edited comment on LUCENE-10592 at 7/25/22 4:10 PM: It looks like this commit gave a nice boost to indexing. From your benchmark results, we expected a small improvement, but this looks even larger: !Screen Shot 2022-07-25 at 9.04.11 AM.png|width=582,height=238! was (Author: julietibs): It looks like this commit gave a nice boost to indexing. From your benchmark results, we expected a small improvement, but this looks even larger: !Screen Shot 2022-07-25 at 9.04.11 AM.png|width=540,height=221! > Should we build HNSW graph on the fly during indexing > - > > Key: LUCENE-10592 > URL: https://issues.apache.org/jira/browse/LUCENE-10592 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.4 > > Attachments: Screen Shot 2022-07-25 at 9.04.11 AM.png > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, when we index vectors for KnnVectorField, we buffer those vectors > in memory and on flush during a segment construction we build an HNSW graph. > As building an HNSW graph is very expensive, this makes flush operation take > a lot of time. This also makes overall indexing performance quite > unpredictable (as the number of flushes are defined by memory used, and the > presence of concurrent searches), e.g. some indexing operations return almost > instantly while others that trigger flush take a lot of time. > Building an HNSW graph on the fly as we index vectors allows to avoid this > problem, and spread a load of HNSW graph construction evenly during indexing. > This will also supersede LUCENE-10194 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing
[ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570976#comment-17570976 ] Julie Tibshirani commented on LUCENE-10592: --- It looks like this commit gave a nice boost to indexing. From your benchmark results, we expected a small improvement, but this looks even larger: !Screen Shot 2022-07-25 at 9.04.11 AM.png! > Should we build HNSW graph on the fly during indexing > - > > Key: LUCENE-10592 > URL: https://issues.apache.org/jira/browse/LUCENE-10592 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.4 > > Attachments: Screen Shot 2022-07-25 at 9.04.11 AM.png > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, when we index vectors for KnnVectorField, we buffer those vectors > in memory and on flush during a segment construction we build an HNSW graph. > As building an HNSW graph is very expensive, this makes flush operation take > a lot of time. This also makes overall indexing performance quite > unpredictable (as the number of flushes are defined by memory used, and the > presence of concurrent searches), e.g. some indexing operations return almost > instantly while others that trigger flush take a lot of time. > Building an HNSW graph on the fly as we index vectors allows to avoid this > problem, and spread a load of HNSW graph construction evenly during indexing. > This will also supersede LUCENE-10194 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10592) Should we build HNSW graph on the fly during indexing
[ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani updated LUCENE-10592: -- Attachment: Screen Shot 2022-07-25 at 9.04.11 AM.png > Should we build HNSW graph on the fly during indexing > - > > Key: LUCENE-10592 > URL: https://issues.apache.org/jira/browse/LUCENE-10592 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.4 > > Attachments: Screen Shot 2022-07-25 at 9.04.11 AM.png > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, when we index vectors for KnnVectorField, we buffer those vectors > in memory and on flush during a segment construction we build an HNSW graph. > As building an HNSW graph is very expensive, this makes flush operation take > a lot of time. This also makes overall indexing performance quite > unpredictable (as the number of flushes are defined by memory used, and the > presence of concurrent searches), e.g. some indexing operations return almost > instantly while others that trigger flush take a lot of time. > Building an HNSW graph on the fly as we index vectors allows to avoid this > problem, and spread a load of HNSW graph construction evenly during indexing. > This will also supersede LUCENE-10194 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-jira-archive] mocobeta merged pull request #81: add my account into maping data
mocobeta merged PR #81: URL: https://github.com/apache/lucene-jira-archive/pull/81
[GitHub] [lucene-jira-archive] mocobeta commented on pull request #81: add my account into maping data
mocobeta commented on PR #81: URL: https://github.com/apache/lucene-jira-archive/pull/81#issuecomment-1194161000 Thanks @tang-hi
[GitHub] [lucene-jira-archive] mikemccand commented on a diff in pull request #80: #79: include parent issue link
mikemccand commented on code in PR #80: URL: https://github.com/apache/lucene-jira-archive/pull/80#discussion_r928980448

## migration/src/jira_util.py: ##

@@ -83,6 +83,15 @@ def extract_assignee(o: dict) -> tuple[str, str]:
     return (name, disp_name)
+def extract_parent(o: dict) -> tuple[str, str]:
+    parent = o["fields"].get("parent")
+    if parent:
+        key = parent["key"]
+        if key:
+            return key, f'https://issues.apache.org/jira/browse/{key}'

Review Comment: Ahh good idea, will do.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #29: Can/should we make Jira read-only on migration to GitHub issues?
mocobeta commented on issue #29: URL: https://github.com/apache/lucene-jira-archive/issues/29#issuecomment-1194132316 I would open two issues at the same time when we ask infra to start the migration; one for running the import script, and one for making Jira read-only. Anyway, it'll need some time to explain our plan and perform the migration.
[GitHub] [lucene] iverase commented on a diff in pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
iverase commented on code in PR #1017: URL: https://github.com/apache/lucene/pull/1017#discussion_r928955670

## lucene/core/src/java/org/apache/lucene/document/ShapeDocValues.java: ##

@@ -0,0 +1,907 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import org.apache.lucene.document.ShapeField.DecodedTriangle.TYPE;
+import org.apache.lucene.document.SpatialQuery.EncodedRectangle;
+import org.apache.lucene.geo.Component2D;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.store.ByteArrayDataInput;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.DataInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * A binary doc values format representation for {@link LatLonShape} and {@link XYShape}
+ *
+ * Note that this class cannot be instantiated directly due to different encodings {@link
+ * org.apache.lucene.geo.XYEncodingUtils} and {@link org.apache.lucene.geo.GeoEncodingUtils}
+ *
+ * Concrete Implementations include: {@link LatLonShapeDocValues} and {@link XYShapeDocValues}
+ *
+ * @lucene.experimental
+ */
+abstract class ShapeDocValues {
+  /** doc value format version; used to support bwc for any encoding changes */
+  protected static final byte VERSION = 0;
+  /** the binary doc value */
+  private final BytesRef data;
+  /** the geometry comparator used to check relations */
+  protected final ShapeComparator shapeComparator;
+
+  /**
+   * Creates a {@ShapeDocValues} instance from a shape tessellation
+   *
+   * @param tessellation The tessellation (must not be null)
+   */
+  ShapeDocValues(List tessellation) {
+    this.data = computeBinaryValue(tessellation);
+    try {
+      this.shapeComparator = new ShapeComparator(this.data);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** Creates a {@code ShapeDocValues} instance from a given serialized value */
+  ShapeDocValues(BytesRef binaryValue) {
+    this.data = binaryValue;
+    try {
+      this.shapeComparator = new ShapeComparator(this.data);
+    } catch (IOException e) {
+      throw new IllegalArgumentException("unable to read binary shape doc value field. ", e);
+    }
+  }
+
+  /** returns the encoded doc values field as a {@link BytesRef} */
+  protected BytesRef binaryValue() {
+    return this.data;
+  }
+
+  /** Returns the number of terms (tessellated triangles) for this shape */
+  public int numberOfTerms() {
+    return shapeComparator.numberOfTerms();
+  }
+
+  /** returns the min x value for the shape's bounding box */
+  public int getMinX() {
+    return shapeComparator.getMinX();
+  }
+
+  /** returns the min y value for the shape's bounding box */
+  public int getMinY() {
+    return shapeComparator.getMinY();
+  }
+
+  /** returns the max x value for the shape's bounding box */
+  public int getMaxX() {
+    return shapeComparator.getMaxX();
+  }
+
+  /** returns the max y value for the shape's bounding box */
+  public int getMaxY() {
+    return shapeComparator.getMaxY();
+  }
+
+  /** Retrieves the x centroid location for the geometry(s) */
+  public int getCentroidX() {
+    return shapeComparator.getCentroidX();
+  }
+
+  /** Retrieves the y centroid location for the geometry(s) */
+  public int getCentroidY() {
+    return shapeComparator.getCentroidY();
+  }
+
+  /**
+   * Retrieves the highest dimensional type (POINT, LINE, TRIANGLE) for computing the geometry(s)
+   * centroid
+   */
+  public TYPE getHighestDimension() {
+    return shapeComparator.getHighestDimension();
+  }
+
+  private BytesRef computeBinaryValue(List tessellation) {
+    try {
+      // dfs order serialization
+      List dfsSerialized = new ArrayList<>(tessellation.size());
+      buildTree(tessellation, dfsSerialized);
+      Writer w = new Writer(dfsSerialized);
+      return w.getBytesRef();
+    } catch (IOException e) {
+      throw new RuntimeException("Intern
[GitHub] [lucene] iverase commented on pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape
iverase commented on PR #1017: URL: https://github.com/apache/lucene/pull/1017#issuecomment-1194123352

Hey Nick,

Why so much hurry with this change? I appreciate everything you are trying to do to signal the feature as experimental and adding the version on the doc value, but still it will be nice to have something production ready from the beginning, and this way of developing this exciting and complex feature is making things harder.

I am very much interested in this change because Elasticsearch has its own doc value implementation. I would like to eventually move to Lucene doc values, so I want to make sure that the functionality the Elasticsearch implementation currently has can be preserved when / if it makes sense. In order to achieve that we would like to donate our code or help with our experience in this feature, as we have been running it in production for 2+ years with success.

The Elasticsearch implementation is very similar to the one you are proposing. It has three parts: first we add all the information regarding the extent of the geometry, then we add centroid information, and finally we have what I call the "triangle tree", which is exactly what you described in this issue. Here are the differences and how we would like to help adding them:

Our geometry extent contains more information than just the plain min/max values of the coordinates that you are proposing. In particular, we capture the minimum positive value and the maximum negative value for the x coordinate so we can wrap those bounding boxes around the dateline in the geo case. This would be my proposal for an Extent object that captures all that information: https://github.com/iverase/lucene/blob/TriangleTree/lucene/core/src/java/org/apache/lucene/geo/Extent.java

In the case of the centroid, the biggest difference is that we are computing the centroid using the original geometry. I really like the algorithm you are proposing, but you are working on the encoded space and I am wondering now if that would work for cartesian; remember that the encoding is not linear, so in that case the centroids might be incorrect. I would like to discuss the benefits of the way you are encoding those values with a bit more care too.

Finally, the triangle tree is the same idea: it is just the serialisation of an interval tree that is composed of tessellation elements. One thing I realised while developing this feature is that it is necessary to be able to visit the tree in different ways, and therefore adding a visitor pattern for the tree is a big win. Here is our current implementation which can be used as a good starting point: https://github.com/iverase/lucene/blob/TriangleTree/lucene/core/src/java/org/apache/lucene/geo/TriangleTreeReader.java

Finally, I would like to propose to initially focus on the data structure and, once we are happy, we can start integrating the functionality, e.g. support for queries and so on. I hope a more structured approach will make sure we get the right structure. What do you think?
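To illustrate the extent idea from this comment, here is a small sketch of a bounding structure that, besides min/max y, tracks the x range of the negative and the non-negative coordinates separately, so that a box crossing the dateline can be derived. It is a hypothetical Python illustration, not the linked `Extent.java`: the class name `Extent`, its field names, and the `wraps_dateline` heuristic (prefer the narrower of the two candidate boxes) are assumptions made for the example, and it works on plain degrees rather than Lucene's encoded integers.

```python
import math
from dataclasses import dataclass


@dataclass
class Extent:
    min_y: float = math.inf
    max_y: float = -math.inf
    # x range of the points with x < 0
    neg_left: float = math.inf
    neg_right: float = -math.inf   # the maximum negative value
    # x range of the points with x >= 0
    pos_left: float = math.inf     # the minimum positive value
    pos_right: float = -math.inf

    def add_point(self, x: float, y: float) -> None:
        self.min_y = min(self.min_y, y)
        self.max_y = max(self.max_y, y)
        if x < 0:
            self.neg_left = min(self.neg_left, x)
            self.neg_right = max(self.neg_right, x)
        else:
            self.pos_left = min(self.pos_left, x)
            self.pos_right = max(self.pos_right, x)

    def wraps_dateline(self) -> bool:
        """True if the box crossing the dateline is narrower than the plain one."""
        if math.isinf(self.neg_left) or math.isinf(self.pos_left):
            return False  # all points lie on one side of x = 0
        plain_width = self.pos_right - self.neg_left
        wrapped_width = (180.0 - self.pos_left) + (self.neg_right + 180.0)
        return wrapped_width < plain_width

    def x_bounds(self) -> tuple[float, float]:
        """Return (west, east); west > east means the box crosses the dateline."""
        if self.wraps_dateline():
            return self.pos_left, self.neg_right
        return min(self.neg_left, self.pos_left), max(self.neg_right, self.pos_right)
```

With only plain min/max x, two points at longitudes 170 and -175 would produce a box 345 degrees wide; keeping the minimum positive and maximum negative values makes the 15 degree box across the dateline recoverable.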
[GitHub] [lucene-jira-archive] mocobeta commented on a diff in pull request #80: #79: include parent issue link
mocobeta commented on code in PR #80: URL: https://github.com/apache/lucene-jira-archive/pull/80#discussion_r928947072

## migration/src/jira_util.py:

@@ -83,6 +83,15 @@ def extract_assignee(o: dict) -> tuple[str, str]:
     return (name, disp_name)

+def extract_parent(o: dict) -> tuple[str, str]:
+    parent = o["fields"].get("parent")
+    if parent:
+        key = parent["key"]
+        if key:
+            return key, f'https://issues.apache.org/jira/browse/{key}'

Review Comment: I would just extract the key and build the URL on the fly in `jira2github_import.py` (L120).
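The review suggestion above can be sketched as follows: the extractor returns only the parent key, and the caller builds the Jira URL when it renders the issue. This is a minimal sketch under assumptions, not the code that was merged; `extract_parent_key`, `jira_issue_url`, and `JIRA_BROWSE_URL` are hypothetical names, and only the `fields.parent.key` lookup and the URL prefix come from the diff.

```python
from typing import Optional

# URL prefix taken from the diff above; the constant name is hypothetical.
JIRA_BROWSE_URL = "https://issues.apache.org/jira/browse/"


def extract_parent_key(o: dict) -> Optional[str]:
    """Return the parent issue key (e.g. 'LUCENE-598'), or None if there is no parent."""
    parent = o.get("fields", {}).get("parent")
    if parent:
        return parent.get("key") or None
    return None


def jira_issue_url(key: str) -> str:
    """Build the Jira URL on the fly at the call site, as the review suggests."""
    return f"{JIRA_BROWSE_URL}{key}"
```

Keeping the URL out of the extractor means the helper returns a single value and the URL format lives in one place on the rendering side.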
[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing
[ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570929#comment-17570929 ] ASF subversion and git services commented on LUCENE-10592: -- Commit b15bcd11c333a96c043a3cc1e3498b8b09e7d6a2 in lucene's branch refs/heads/branch_9x from Mayya Sharipova [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b15bcd11c33 ] LUCENE-10592 Strengthen TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults This test occasionally fails if knn search returns only 1 document in the index, as we have an assertion that returned doc IDs from sorted and unsorted index must be different. This patch ensures that we have many documents in the index, so that knn search always returns enough results. > Should we build HNSW graph on the fly during indexing > - > > Key: LUCENE-10592 > URL: https://issues.apache.org/jira/browse/LUCENE-10592 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.4 > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, when we index vectors for KnnVectorField, we buffer those vectors > in memory and on flush during a segment construction we build an HNSW graph. > As building an HNSW graph is very expensive, this makes flush operation take > a lot of time. This also makes overall indexing performance quite > unpredictable (as the number of flushes are defined by memory used, and the > presence of concurrent searches), e.g. some indexing operations return almost > instantly while others that trigger flush take a lot of time. > Building an HNSW graph on the fly as we index vectors allows to avoid this > problem, and spread a load of HNSW graph construction evenly during indexing. 
> This will also supersede LUCENE-10194 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-jira-archive] mikemccand commented on issue #60: Invalid unicode character in conversion of comment
mikemccand commented on issue #60: URL: https://github.com/apache/lucene-jira-archive/issues/60#issuecomment-1194092727 Reminder: once the draft migrated test repo is public, find some Jira issues that have exotic Unicode escapes/characters and confirm that they migrated properly.
[GitHub] [lucene-jira-archive] mikemccand opened a new pull request, #80: #79: include parent issue link
mikemccand opened a new pull request, #80: URL: https://github.com/apache/lucene-jira-archive/pull/80 Render the parent link in the `Legacy Jira` section. I also removed ` details` from the `Legacy Jira details` header. It seemed redundant / self-explanatory already.
[GitHub] [lucene-jira-archive] mikemccand merged pull request #76: 61: map Jira priority to legacy-jira-priority, and include votes in the 'Legacy Jira Information' header when it's > 0
mikemccand merged PR #76: URL: https://github.com/apache/lucene-jira-archive/pull/76
[GitHub] [lucene-jira-archive] mikemccand commented on issue #79: Carry parent issue over
mikemccand commented on issue #79: URL: https://github.com/apache/lucene-jira-archive/issues/79#issuecomment-1194082464

OK I have a PR; it renders LUCENE-618 opening description like this:

-

The last GData - Server commit does not build due to a wrong commit. Yonik did not commit all the files in the diff file. There are several sources and packages missing. The diff - file with the date of 26.06.06 should be applied. --> http://issues.apache.org/jira/browse/LUCENE-598 26.06.06.diff (644 kb) could any of the lucene committers apply this patch. Yonik is on the way to Dublin. Thanks Simon

---

### Legacy Jira

[LUCENE-618](https://issues.apache.org/jira/browse/LUCENE-618) by Simon Willnauer (@s1monw) on Jun 27 2006, resolved Jun 28 2006

Parent: [LUCENE-598](https://issues.apache.org/jira/browse/LUCENE-598)

Attachments: [27.06.06.diff](https://raw.githubusercontent.com/apache/lucene-jira-archive/attachments/attachments/LUCENE-618/27.06.06.diff)
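The rendered `Legacy Jira` section shown above could be produced along these lines. This is only a sketch of the output format visible in the message, not the PR's code: `render_legacy_jira`, `jira_link`, and their parameters are hypothetical, and attachments are left out for brevity.

```python
from typing import Optional

JIRA_BROWSE_URL = "https://issues.apache.org/jira/browse/"


def jira_link(key: str) -> str:
    """Markdown link to a Jira issue, e.g. [LUCENE-598](https://...)."""
    return f"[{key}]({JIRA_BROWSE_URL}{key})"


def render_legacy_jira(key: str, reporter: str, created: str,
                       resolved: Optional[str] = None,
                       parent_key: Optional[str] = None) -> str:
    """Render the section; the Parent line appears only when a parent exists."""
    summary = f"{jira_link(key)} by {reporter} on {created}"
    if resolved:
        summary += f", resolved {resolved}"
    lines = ["### Legacy Jira", "", summary]
    if parent_key:
        lines += ["", f"Parent: {jira_link(parent_key)}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_legacy_jira("LUCENE-618", "Simon Willnauer (@s1monw)",
                             "Jun 27 2006", "Jun 28 2006", "LUCENE-598"))
```

Issues without a parent simply omit the `Parent:` line, so the section stays unchanged for the majority of issues.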
[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing
[ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570912#comment-17570912 ] ASF subversion and git services commented on LUCENE-10592: -- Commit 2efc204a390044b67bcfb85683d82a9ea2f852a2 in lucene's branch refs/heads/main from Mayya Sharipova [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2efc204a390 ] LUCENE-10592 Strengthen TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults This test occasionally fails if knn search returns only 1 document in the index, as we have an assertion that returned doc IDs from sorted and unsorted index must be different. This patch ensures that we have many documents in the index, so that knn search always returns enough results. > Should we build HNSW graph on the fly during indexing > - > > Key: LUCENE-10592 > URL: https://issues.apache.org/jira/browse/LUCENE-10592 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Mayya Sharipova >Assignee: Mayya Sharipova >Priority: Minor > Fix For: 9.4 > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, when we index vectors for KnnVectorField, we buffer those vectors > in memory and on flush during a segment construction we build an HNSW graph. > As building an HNSW graph is very expensive, this makes flush operation take > a lot of time. This also makes overall indexing performance quite > unpredictable (as the number of flushes are defined by memory used, and the > presence of concurrent searches), e.g. some indexing operations return almost > instantly while others that trigger flush take a lot of time. > Building an HNSW graph on the fly as we index vectors allows to avoid this > problem, and spread a load of HNSW graph construction evenly during indexing. 
> This will also supersede LUCENE-10194
[GitHub] [lucene-jira-archive] mocobeta commented on issue #79: Carry parent issue over
mocobeta commented on issue #79: URL: https://github.com/apache/lucene-jira-archive/issues/79#issuecomment-1194074037 GitHub automatically adds mutual links if we link from one issue to another issue, but I agree that it'd be good to explicitly mention parent issues.
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #79: Carry parent issue over
mikemccand opened a new issue, #79: URL: https://github.com/apache/lucene-jira-archive/issues/79 Spinoff from #61. We already carry the other direction (`sub-tasks`). It looks like 333 issues have a parent.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194063267

OK I looked at all of the fields (listed above) and I think we are done, except for this one!:

> I think we really should migrate `parent` in some way, since we already migrate the reverse direction (sub-tasks)?

I'll open a separate issue for this and work on a PR. I think it shouldn't be hard. I plan to just append to the `Legacy Jira Information` header.
[GitHub] [lucene-jira-archive] mikemccand closed issue #61: Should we carry over Jira "labels"?
mikemccand closed issue #61: Should we carry over Jira "labels"? URL: https://github.com/apache/lucene-jira-archive/issues/61
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194061412

> > Ahh OK, hmm. But we cannot carry over these watches on behalf of all users, I assume. Users will have to re-watch the issues they care about (even the legacy Jira ones) again?
>
> Yes. We will add comments to each Jira issue to let all watchers know the corresponding GitHub issue URL at the last step in the migration. The watchers should notice it.

+1.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?
mocobeta commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194043812

> Ahh OK, hmm. But we cannot carry over these watches on behalf of all users, I assume. Users will have to re-watch the issues they care about (even the legacy Jira ones) again?

Yes. We will add comments to each Jira issue to let all watchers know the corresponding GitHub issue URL at the last step in the migration. The watchers should notice it.
[GitHub] [lucene-jira-archive] apeteri commented on pull request #77: Update account-map.csv.20220722.verified
apeteri commented on PR #77: URL: https://github.com/apache/lucene-jira-archive/pull/77#issuecomment-1194036688 No worries! Thank you and everyone involved for handling the migration!
[GitHub] [lucene-jira-archive] mikemccand opened a new issue, #78: Draft a summary of how we migrated to GitHub issues
mikemccand opened a new issue, #78: URL: https://github.com/apache/lucene-jira-archive/issues/78 Spinoff from #61. We should try to briefly summarize what we did during the migration: the fields we chose to leave out, limitations (e.g. if you were not "verified" then you are not "mentioned"), etc. And maybe add pointers to the tooling so other projects can re-use and improve on this impressive start (thanks to @mocobeta!). This would be a nice artifact to leave for future developers / users / Googlers.
[GitHub] [lucene-jira-archive] mocobeta merged pull request #77: Update account-map.csv.20220722.verified
mocobeta merged PR #77: URL: https://github.com/apache/lucene-jira-archive/pull/77
[GitHub] [lucene-jira-archive] mocobeta commented on pull request #77: Update account-map.csv.20220722.verified
mocobeta commented on PR #77: URL: https://github.com/apache/lucene-jira-archive/pull/77#issuecomment-1194030016 Thanks @apeteri for verifying it. We couldn't manually check all candidate accounts.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194029781

> > You can watch at the repo level, but not on individual issues?
>
> You can watch/unwatch particular issues/PRs by Subscribe/Unsubscribe the issue/PR. (a button with a bell icon is placed in the right panel for that.)

Ahh OK, hmm. But we cannot carry over these watches on behalf of all users, I assume. Users will have to re-watch the issues they care about (even the legacy Jira ones) again?

We should note this in the "Release Notes" that we send about this migration? We should advertise the things we chose not to migrate. Hmm, do we have a draft of these release notes? I'll open a separate issue.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194027968 I think we really should migrate `parent` in some way, since we already migrate the reverse direction (sub-tasks)?
[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?
mocobeta commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194025672

> You can watch at the repo level, but not on individual issues?

You can watch/unwatch particular issues/PRs by subscribing to or unsubscribing from the issue/PR (a button with a bell icon is placed in the right panel for that).
[GitHub] [lucene-jira-archive] apeteri opened a new pull request, #77: Update account-map.csv.20220722.verified
apeteri opened a new pull request, #77: URL: https://github.com/apache/lucene-jira-archive/pull/77 I saw my name in the account map candidates file at this location: https://github.com/apache/lucene-jira-archive/blob/1309c3ff9b7815d660484b123b30be1789b672f4/migration/mappings-data/account-map.csv.20220722.232738#L1519
[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?
mocobeta commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194021106

> So maybe we leave `watches` behind (do not migrate that field to GitHub)?

I agree with it.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194019029 `watches` is a Jira field for people who have explicitly subscribed to updates on a Jira issue. I don't think GitHub has the same capability? You can watch at the repo level, but not on individual issues? So maybe we leave `watches` behind (do not migrate that field to GitHub)?
[GitHub] [lucene-jira-archive] mocobeta commented on pull request #76: 61: map Jira priority to legacy-jira-priority, and include votes in the 'Legacy Jira Information' header when it's > 0
mocobeta commented on PR #76: URL: https://github.com/apache/lucene-jira-archive/pull/76#issuecomment-1194018160 Please feel free to merge this. I cannot run/test this right now.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1194005927 Woops, I failed to link the PR properly: https://github.com/apache/lucene-jira-archive/pull/76
[GitHub] [lucene-jira-archive] mikemccand commented on issue #29: Can/should we make Jira read-only on migration to GitHub issues?
mikemccand commented on issue #29: URL: https://github.com/apache/lucene-jira-archive/issues/29#issuecomment-1194003681 Maybe we just open an INFRA issue now, but make it clear not to actually do it yet, to get it on their radar?
[GitHub] [lucene-jira-archive] mikemccand commented on issue #29: Can/should we make Jira read-only on migration to GitHub issues?
mikemccand commented on issue #29: URL: https://github.com/apache/lucene-jira-archive/issues/29#issuecomment-1194003309 Have we confirmed that INFRA is able to make the whole Jira project read-only after we are done migrating? Is it just a matter of opening a ticket and it's quick/easy? From my (limited) online digging I am not so sure it is easy ;)
[GitHub] [lucene-jira-archive] mikemccand opened a new pull request, #76: 61: map Jira priority to legacy-jira-priority, and include votes in the 'Legacy Jira Information' header when it's > 0
mikemccand opened a new pull request, #76: URL: https://github.com/apache/lucene-jira-archive/pull/76 I ran it on one issue that had votes and peeked at the output JSON, and it looks correct!
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193991508

> > Hmm how about `legacy-jira-priority`?
>
> I'm fine with it if it's needed.
>
> > If we carried over `legacy-jira-votes` how would we search it? Could we sort by it?
>
> I don't think issues can be sorted by labels (for now), or that it would be useful for filtering purposes. I would just log it in the issues' "Jira Information" section - if we want to convert it into labels in the future, we can add issue labels anytime by GitHub APIs.

+1, OK I'll make a PR for both.
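The outcome agreed above (priority becomes a `legacy-jira-priority` label, votes are only logged in the Jira information section when greater than zero) can be sketched as below. This is an illustration under assumptions, not the code from the PR: the function names and the exact label format (`legacy-jira-priority:<name>`) are hypothetical, and the input is assumed to be a Jira issue dict with `fields.priority.name` and `fields.votes.votes`.

```python
from typing import Optional


def priority_label(issue: dict) -> Optional[str]:
    """Map the Jira priority name to a hypothetical 'legacy-jira-priority:<name>' label."""
    priority = (issue.get("fields") or {}).get("priority") or {}
    name = priority.get("name")
    return f"legacy-jira-priority:{name.lower()}" if name else None


def votes_line(issue: dict) -> Optional[str]:
    """Return a 'Votes: N' line for the header only when the issue has votes."""
    votes = ((issue.get("fields") or {}).get("votes") or {}).get("votes", 0)
    return f"Votes: {votes}" if votes > 0 else None
```

Logging votes as plain text keeps the label set small, and, as noted in the thread, labels can still be added later through the GitHub APIs if sorting or filtering by votes turns out to be useful.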
[GitHub] [lucene-jira-archive] mocobeta commented on a diff in pull request #75: Update account-map.csv.20220722.verified
mocobeta commented on code in PR #75: URL: https://github.com/apache/lucene-jira-archive/pull/75#discussion_r928823975

## migration/mappings-data/account-map.csv.20220722.verified:

@@ -169,3 +169,4 @@ mharwood,markharwood,Mark Harwood
 hossman,hossman,Chris M. Hostetter
 munendrasn,munendrasn,Munendra S N
 vajda,ovalhub,Andi Vajda
+manish1982,manishbafna,Manish

Review Comment:

```suggestion
manish82,manishbafna,Manish
```

I suggest this change - @manishbafna if you are fine with it I'll commit this in.
[GitHub] [lucene-jira-archive] mocobeta commented on a diff in pull request #75: Update account-map.csv.20220722.verified
mocobeta commented on code in PR #75: URL: https://github.com/apache/lucene-jira-archive/pull/75#discussion_r928803845

## migration/mappings-data/account-map.csv.20220722.verified:

@@ -169,3 +169,4 @@ mharwood,markharwood,Mark Harwood
 hossman,hossman,Chris M. Hostetter
 munendrasn,munendrasn,Munendra S N
 vajda,ovalhub,Andi Vajda
+manish1982,manishbafna,Manish

Review Comment: We don't see the username `manish1982` in Lucene Jira, I wonder if you mean `manish82`?

```
manish82,Manish
```

https://issues.apache.org/jira/secure/ViewProfile.jspa?name=manish82
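The check described in the review comment, that every Jira username in the verified account map actually exists in Jira, can be sketched with the standard `csv` module. This is a minimal sketch, not part of the migration tooling: the function names are hypothetical, the column order (Jira username, GitHub account, display name) is inferred from the rows in the diff, and the set of known Jira users is assumed to come from elsewhere.

```python
import csv
import io


def load_account_map(text: str) -> dict[str, str]:
    """Parse 'jira_name,github_name,display_name' rows into {jira_name: github_name}."""
    mapping = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        mapping[row[0].strip()] = row[1].strip()
    return mapping


def unknown_jira_names(mapping: dict[str, str], known_jira_users: set[str]) -> list[str]:
    """Return mapped Jira usernames that are not among the known Jira users."""
    return sorted(name for name in mapping if name not in known_jira_users)


if __name__ == "__main__":
    rows = "vajda,ovalhub,Andi Vajda\nmanish1982,manishbafna,Manish\n"
    known = {"vajda", "manish82"}
    print(unknown_jira_names(load_account_map(rows), known))
```

Run against the rows from the diff, a check like this would flag `manish1982`, which matches the mismatch the reviewer spotted by hand.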
[GitHub] [lucene-jira-archive] mocobeta commented on a diff in pull request #75: Update account-map.csv.20220722.verified
mocobeta commented on code in PR #75: URL: https://github.com/apache/lucene-jira-archive/pull/75#discussion_r928803845

## migration/mappings-data/account-map.csv.20220722.verified:

@@ -169,3 +169,4 @@ mharwood,markharwood,Mark Harwood
 hossman,hossman,Chris M. Hostetter
 munendrasn,munendrasn,Munendra S N
 vajda,ovalhub,Andi Vajda
+manish1982,manishbafna,Manish

Review Comment: We don't see the username `manish1982` in Lucene Jira, I wonder if you mean `manish82`?

```
manish82,Manish
```
[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?
mocobeta commented on issue #61: URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193953267

> Hmm how about `legacy-jira-priority`?

I'm fine with it if it's needed.

> If we carried over `legacy-jira-votes` how would we search it? Could we sort by it?

I don't think issues can be sorted by labels (for now), or that it would be useful for filtering purposes. I would just log it in the issues' "Jira Information" section - if we want to convert it into labels in the future, we can add issue labels anytime by GitHub APIs.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193931342

> I once considered/looked at it and decided not to carry them over to GitHub - I would do nothing for it if you are ok with that.

Yeah +1. They drop even below my signal/noise threshold ;)
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193930893

> I didn't know that, but I noticed we (reporters? committers?) can set an arbitrary username to Reporter fields in Jira.

LOL I did not know either! Furthermore, you can change it after the fact! Maybe `creator` cannot be updated but `reporter` can? Must be for an "on behalf of" sort of situation.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193930028

Does GitHub have a way to vote (+1) on issues? Also, do GitHub labels support numeric types? If we carried over `legacy-jira-votes` how would we search it? Could we sort by it?

Vote distribution:
```
Votes:
  0: 9802
  1: 545
  2: 142
  3: 63
  4: 25
  5: 22
  6: 10
  8: 7
  7: 6
  12: 5
  11: 4
  9: 3
  14: 2
  10: 2
  13: 1
  22: 1
  19: 1
  28: 1
  16: 1
  36: 1
  15: 1
```
[GitHub] [lucene-jira-archive] manishbafna opened a new pull request, #75: Update account-map.csv.20220722.verified
manishbafna opened a new pull request, #75:
URL: https://github.com/apache/lucene-jira-archive/pull/75

My account info added
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193923036

Hmm how about `legacy-jira-priority`?
```
python print_priority.py
Major 6182
Minor 3540
Trivial 604
Blocker 202
Critical 117
```
E.g. "how many times did an issue with merging block a release"?
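The source of `print_priority.py` is not shown in the message above. A rough sketch of how such a count could be produced is given below. It assumes each issue is stored as one JSON file under a `jira-dump` directory (the directory name appears in a later message of this thread) and that the priority lives at `fields.priority.name`, as in Jira's REST issue format. `count_priorities` and `load_issues` are hypothetical names.

```python
import json
from collections import Counter
from pathlib import Path


def count_priorities(issues):
    """Count issues per priority name; a missing priority is counted as 'None'."""
    counts = Counter()
    for issue in issues:
        priority = (issue.get("fields") or {}).get("priority") or {}
        counts[priority.get("name", "None")] += 1
    return counts


def load_issues(dump_dir):
    """Yield the parsed JSON of every *.json file in dump_dir (assumed layout)."""
    for path in sorted(Path(dump_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


if __name__ == "__main__":
    # Print priorities from most to least common, one per line.
    for name, count in count_priorities(load_issues("jira-dump")).most_common():
        print(name, count)
```

The same loop would work for the vote distribution by reading a different field, although the exact field path for votes would need to be confirmed against the dump.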
[GitHub] [lucene-jira-archive] mocobeta commented on issue #1: Fix markup conversion error
mocobeta commented on issue #1:
URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1193898346

> Rather than having everyone make a PR / commit a change, I lower the barrier, maybe just allow them to reply to the email? I volunteer to go through all replies and carry them over to the mapping file.

Thanks for your suggestion, I sent an e-mail to the dev list.
[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?
mocobeta commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193881378

> > I think "Lucene Fields" has only two values - "New" and "Patch Available".
>
> OK let's maybe not carry those over?

I once considered/looked at it and decided not to carry them over to GitHub - I would do nothing for it if you are ok with that.
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Reporter: Tomoko Uchida (was: Michael McCandless) > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: Screen Shot 2022-06-29 at 11.02.35 AM.png, > image-2022-06-29-13-36-57-365.png, screenshot-1.png > > Time Spent: 1h 40m > Remaining Estimate: 0h > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * (/) Choose issues that should be moved to GitHub - We'll migrate all > issues towards an atomic switch to GitHub if no major technical obstacles > show up. > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** -Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub.- > ** Write a prototype migration script - the decision could be made on that. > Things to consider: > *** version numbers - labels or milestones? 
> *** add a comment/ prepend a link to the source Jira issue on github side, > *** add a comment/ prepend a link on the jira side to the new issue on > github side (for people who access jira from blogs, mailing list archives and > other sources that will have stale links), > *** convert cross-issue automatic links in comments/ descriptions (as > suggested by Robert), > *** strategy to deal with sub-issues (hierarchies), > *** maybe prefix (or postfix) the issue title on github side with the > original LUCENE-XYZ key so that it is easier to search for a particular issue > there? > *** how to deal with user IDs (author, reporter, commenters)? Do they have > to be github users? Will information about people not registered on github be > lost? > *** create an extra mapping file of old-issue-new-issue URLs for any > potential future uses. > *** what to do with issue numbers in git/svn commits? These could be > rewritten but it'd change the entire git history tree - I don't think this is > practical, while doable. 
> * Prepare a complete migration tool > ** See https://github.com/apache/lucene-jira-archive/issues/5 > * Build the convention for issue label/milestone management > ** See [https://github.com/apache/lucene-jira-archive/issues/6] > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * (/) Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > ** See [https://github.com/apache/lucene-jira-archive/issues/7] > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-jira-archive] mocobeta commented on issue #61: Should we carry over Jira "labels"?
mocobeta commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193872039

> I wonder what is the difference between creator and reporter? Oh I see, I (mikemccand) can create an issue but list another user as reporter. Curious ;) I wonder how often that has happened.

I didn't know that, but I noticed we (reporters? committers?) can set an arbitrary username to `Reporter` fields in Jira.

![Screenshot from 2022-07-25 19-25-11](https://user-images.githubusercontent.com/1825333/180756240-eac5bc0b-c181-437d-b976-9831950362dd.png)

In that case, `reporter` and `creator` are set to different values. For example, I changed the Reporter field in https://issues.apache.org/jira/browse/LUCENE-10557 as follows.

![Screenshot from 2022-07-25 19-29-37](https://user-images.githubusercontent.com/1825333/180756539-47db365b-ef98-434f-8018-ace90fe0e56f.png)

Now, `creator` and `reporter` are different.
```
(.venv) migration $ cat jira-dump/LUCENE-10557.json | jq '.fields.creator.name'
"tomoko"
(.venv) migration $ cat jira-dump/LUCENE-10557.json | jq '.fields.reporter.name'
"mikemccand"
```
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193871524

> I think "Lucene Fields" has only two values - "New" and "Patch Available".

OK let's maybe not carry those over?
[GitHub] [lucene-jira-archive] mikemccand commented on issue #1: Fix markup conversion error
mikemccand commented on issue #1:
URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1193870585

I suggest also sending a separate email asking for people to state their GitHub id / Jira id mapping, if they are comfortable doing so?

Rather than having everyone make a PR / commit a change, I lower the barrier, maybe just allow them to reply to the email? I volunteer to go through all replies and carry them over to the mapping file.

I still wonder/wish we could use Apache's LDAP server behind `id.apache.org` -- it sometimes knows the GitHub id of committers. Hmm but I'm not sure if it knows the Jira id? TooManyIDsException!!
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Reporter: Tomoko Uchida (was: mike Ma)
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Reporter: Michael McCandless (was: Tomoko Uchida)
[jira] [Updated] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-10557: --- Reporter: mike Ma (was: Tomoko Uchida)
[GitHub] [lucene-jira-archive] mikemccand commented on issue #1: Fix markup conversion error
mikemccand commented on issue #1:
URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1193860507

Thanks @mocobeta -- the conversions are looking great -- I spot checked around a dozen issues yesterday. Once you have the new full migration done, I suggest sending an email to the dev list to call everyone's attention to it, and set a time box (two or three days?).
[GitHub] [lucene-jira-archive] mocobeta commented on issue #1: Fix markup conversion error
mocobeta commented on issue #1:
URL: https://github.com/apache/lucene-jira-archive/issues/1#issuecomment-1193852170

> With a few further small fixes, I'll run a full migration next week once again. It will hopefully be the final iteration, I'll make it publicly available to let others check/investigate the result.

I started a (hopefully final) rehearsal to walk through all the steps described in #7 with the accumulated improvements in the migration scripts. Once it is finished, I'll share the test repository so the migration result can be checked manually. I haven't come up with a systematic methodology for that, though; we could randomly pick issues with complex markup, attachments, or various hyperlinks and then compare them to the original Jira issues.
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193843373

Heh, 13 times too:
```
issue LUCENE-288, fields["reporter"]["name"]='vajda', fields["creator"]["name"]='a...@osafoundation.org'
issue LUCENE-7726, fields["reporter"]["name"]='hossman', fields["creator"]["name"]='uschindler'
issue LUCENE-4864, fields["reporter"]["name"]='mpoindexter', fields["creator"]["name"]='uschindler'
issue LUCENE-5169, fields["reporter"]["name"]='jpountz', fields["creator"]["name"]='watuki'
issue LUCENE-5056, fields["reporter"]["name"]='hdeadman', fields["creator"]["name"]='dsmiley'
issue LUCENE-672, fields["reporter"]["name"]='ysee...@gmail.com', fields["creator"]["name"]='ningli'
issue LUCENE-1264, fields["reporter"]["name"]='bmargulies', fields["creator"]["name"]='bimargulies'
issue LUCENE-346, fields["reporter"]["name"]='cutting', fields["creator"]["name"]='cutt...@apache.org'
issue LUCENE-6673, fields["reporter"]["name"]='dancollins', fields["creator"]["name"]='andyetitmoves'
issue LUCENE-228, fields["reporter"]["name"]='vajda', fields["creator"]["name"]='a...@osafoundation.org'
issue LUCENE-289, fields["reporter"]["name"]='vajda', fields["creator"]["name"]='a...@osafoundation.org'
issue LUCENE-359, fields["reporter"]["name"]='cutting', fields["creator"]["name"]='cutt...@apache.org'
issue LUCENE-1, fields["reporter"]["name"]='cutting', fields["creator"]["name"]='cutt...@apache.org'
```
Looks like we use `reporter` not `creator`, perfect.
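The script that produced the list above is not included in the message. A minimal sketch of the comparison is shown here, assuming each issue is a dict shaped like Jira's REST output, with a top-level `key` and names at `fields.reporter.name` and `fields.creator.name` (the two field paths match the `jq` queries quoted elsewhere in this thread). `reporter_creator_mismatches` is a hypothetical name, and the second sample issue is invented for illustration.

```python
def reporter_creator_mismatches(issues):
    """Return (key, reporter, creator) for issues where the two names differ."""
    mismatches = []
    for issue in issues:
        fields = issue.get("fields") or {}
        reporter = (fields.get("reporter") or {}).get("name")
        creator = (fields.get("creator") or {}).get("name")
        if reporter != creator:
            mismatches.append((issue.get("key"), reporter, creator))
    return mismatches


# First entry mirrors a line from the list above; the second is made up.
sample_issues = [
    {"key": "LUCENE-7726",
     "fields": {"reporter": {"name": "hossman"}, "creator": {"name": "uschindler"}}},
    {"key": "EXAMPLE-1",
     "fields": {"reporter": {"name": "someone"}, "creator": {"name": "someone"}}},
]
for key, reporter, creator in reporter_creator_mismatches(sample_issues):
    print(f"issue {key}, reporter={reporter!r}, creator={creator!r}")
```

Comparing by `name` treats a renamed account as a mismatch, which may explain pairs such as `cutting` and the masked `@apache.org` address in the list.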
[GitHub] [lucene-jira-archive] mikemccand commented on issue #61: Should we carry over Jira "labels"?
mikemccand commented on issue #61:
URL: https://github.com/apache/lucene-jira-archive/issues/61#issuecomment-1193837603

I wonder what is the difference between `creator` and `reporter`? Oh I see, I (`mikemccand`) can create an issue but list another user as `reporter`. Curious ;) I wonder how often that has happened.
[GitHub] [lucene-jira-archive] mikemccand commented on pull request #74: Polish a few sharp edges that hit me when running remap_cross_issue_links.py
mikemccand commented on PR #74:
URL: https://github.com/apache/lucene-jira-archive/pull/74#issuecomment-1193835019

Thank you @mocobeta!