[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841666#comment-16841666 ]

David Smiley commented on SOLR-11127:
-------------------------------------

You've been working on some cool and useful issues [~ab] – kudos! I want to mention a couple of suggestions to improve this feature in the future:
* Making the source collection read-only might be inconvenient or infeasible for some apps. As an option, a best-effort attempt would be useful. Even if some changes don't make it, the client may already have a means of detecting data that needs to be resent, such as a strategy that looks at the highest timestamp. Or it may simply not matter, as for an experiment on the target.
* IMO {{batchSize}} would have been a more appropriate name for the {{rows}} param, which as it stands appears to be something that limits the reindexing to just this number of documents. After all, you used the same param name that we are all intimately familiar with from /select. I see this use of "rows" was in turn used by topic(), but that's the same issue there. Ah well; many users won't touch this anyway.

Also I suggest re-titling this issue to reflect your commit message – "REINDEXCOLLECTION command for re-indexing of existing collections.", not the original goal.

> Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
> ---
>
> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: 8.1, master (9.0)
> Attachments: SOLR-11127.patch, SOLR-11127.patch, SOLR-11127.patch, SOLR-11127.patch
>
> SOLR-9 will switch the Trie fieldtypes in the .system collection's schema to Points.
> Users with pre-7.0 .system collections will no longer be able to use them once Trie fields have been removed (8.0).
> Solr should provide a Collections API command MIGRATESYSTEMCOLLECTION to automatically convert a Trie-based .system collection to a Points-based one.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
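The "highest timestamp" strategy mentioned in the comment above could look roughly like the following. This is only a sketch under assumptions: the field name `timestamp` and the helper `docs_to_resend` are hypothetical, plain lists of dicts stand in for query results, and it presumes every document carries a timestamp that only grows over time.

```python
def docs_to_resend(source_docs, target_docs, ts_field="timestamp"):
    """Return the source docs that are newer than anything in the target.

    Assumption: each doc is a dict with a comparable timestamp under
    `ts_field` (a hypothetical field name, not taken from the issue).
    """
    if not target_docs:
        # Nothing was copied yet, so everything has to be sent.
        return list(source_docs)
    # High-water mark: the newest timestamp that reached the target.
    high_water = max(doc[ts_field] for doc in target_docs)
    return [doc for doc in source_docs if doc[ts_field] > high_water]
```

A client doing a best-effort reindex would run something like this after the copy finishes and resend whatever it returns.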
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796080#comment-16796080 ]

ASF subversion and git services commented on SOLR-11127:
--------------------------------------------------------

Commit b778417054e735cf323139a43e84d6262ce9dcd7 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b778417 ]

SOLR-11127: REINDEXCOLLECTION command for re-indexing of existing collections.
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796048#comment-16796048 ]

ASF subversion and git services commented on SOLR-11127:
--------------------------------------------------------

Commit 6f2b7bf5c0144f19572b54eed4fc340c13cf8c2a in lucene-solr's branch refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f2b7bf ]

SOLR-11127: REINDEXCOLLECTION command for re-indexing of existing collections.
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795401#comment-16795401 ]

Andrzej Bialecki commented on SOLR-11127:
-----------------------------------------

The latest patch. This includes a {{.system}} compatibility check that is performed on {{Overseer}} leader startup. This verification only logs a warning about the potentially incompatible index data, providing details of schema fields that are likely incompatible. This should provide sufficient information for users to decide whether to re-index the collection.

If there are no objections I'd like to commit this shortly.
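To illustrate the kind of compatibility check described above, here is a minimal sketch. It is not the code from the patch (which runs inside the Overseer, in Java); it assumes a schema already loaded as a dict shaped like Schema API JSON output, with `fieldTypes` and `fields` lists. The helper names `find_trie_fields` and `warn_if_incompatible` are hypothetical.

```python
import logging

log = logging.getLogger("system_collection_check")

# Trie-based numeric/date field classes that were replaced by Points-based ones.
TRIE_CLASSES = {
    "solr.TrieIntField",
    "solr.TrieLongField",
    "solr.TrieFloatField",
    "solr.TrieDoubleField",
    "solr.TrieDateField",
}


def find_trie_fields(schema):
    """Return the sorted names of fields whose field type is Trie-based."""
    trie_types = {
        ft["name"]
        for ft in schema.get("fieldTypes", [])
        if ft.get("class") in TRIE_CLASSES
    }
    return sorted(
        f["name"] for f in schema.get("fields", []) if f.get("type") in trie_types
    )


def warn_if_incompatible(schema):
    """Log a warning (and nothing more) listing likely incompatible fields."""
    fields = find_trie_fields(schema)
    if fields:
        log.warning("Potentially incompatible Trie-based fields: %s", ", ".join(fields))
    return fields
```

As in the comment, the sketch only reports; deciding whether to re-index is left to the user.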
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792125#comment-16792125 ]

Andrzej Bialecki commented on SOLR-11127:
-----------------------------------------

Another update - support for checking status and progress of reindexing, RefGuide documentation.

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: 8.0
> Attachments: SOLR-11127.patch, SOLR-11127.patch, SOLR-11127.patch
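For readers who want to poll the status of a reindexing run, a request URL could be assembled roughly as below. This is a sketch, not taken from the patch: the `cmd` parameter and its `status` value follow my reading of the Reference Guide for the command as eventually committed (REINDEXCOLLECTION) and should be checked against your Solr version; the base URL and the helper `reindex_url` are hypothetical.

```python
from urllib.parse import urlencode


def reindex_url(base_url, collection, cmd="start", **params):
    """Build a Collections API URL for the REINDEXCOLLECTION command.

    Assumption: `cmd` accepts values such as "start" and "status";
    verify against the Reference Guide for your release.
    """
    query = {"action": "REINDEXCOLLECTION", "name": collection, "cmd": cmd}
    query.update(params)
    return f"{base_url}/admin/collections?{urlencode(query)}"


# Example: URL for checking progress on the .system collection.
status_url = reindex_url("http://localhost:8983/solr", ".system", cmd="status")
```

The function only builds the URL; sending the request is left to whatever HTTP client is at hand.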
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790861#comment-16790861 ]

Andrzej Bialecki commented on SOLR-11127:
-----------------------------------------

Updated patch, with a lot more internal error checking and additional unit tests. I think this is fairly complete in functionality; more documentation to follow soon.

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: 8.0
> Attachments: SOLR-11127.patch, SOLR-11127.patch
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780903#comment-16780903 ]

Andrzej Bialecki commented on SOLR-11127:
-----------------------------------------

This patch implements a REINDEX_COLLECTION command (NOTE: it depends on the changes in SOLR-13271). It uses the procedure described above. A daemon streaming expression is used for copying documents between collections.

The new command supports reindexing any collection, with the usual caveats about potential data loss, and it supports the following:
* different or the same source and target collection name (by using aliases, as described above)
* most collection CREATE parameters, which allows re-shaping the collection (e.g. changing the number of shards, the router, etc.)

Comments and review are very much appreciated!

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: 8.0
> Attachments: SOLR-11127.patch
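A daemon streaming expression for copying documents between collections might be composed along these lines. This is a sketch and not the expression from the patch: it nests the documented `daemon`, `commit`, `update` and `topic` stream functions, but the daemon and topic ids, the checkpoint collection name, and the exact set of parameters are assumptions to verify against the streaming expressions documentation.

```python
def build_reindex_daemon(source, target, checkpoint, query="*:*", fl="*", batch_size=100):
    """Return a daemon expression that copies docs from `source` to `target`.

    Assumptions: `checkpoint` names a collection used by topic() to store
    its checkpoints; the ids below are hypothetical naming choices.
    """
    daemon_id = f"reindex_{target}"  # hypothetical daemon id
    topic_id = f"topic_{target}"     # hypothetical topic id
    # topic() reads matching docs from the source, batch by batch.
    topic = (
        f'topic({checkpoint},{source},q="{query}",fl="{fl}",'
        f'id="{topic_id}",rows="{batch_size}",initialCheckpoint="0")'
    )
    # update() indexes each batch into the target; commit() makes it visible.
    update = f"update({target},batchSize={batch_size},{topic})"
    commit = f"commit({target},{update})"
    # daemon() keeps re-running the wrapped stream; terminate lets it stop itself.
    return f'daemon(id="{daemon_id}",terminate="true",{commit})'


expr = build_reindex_daemon("source_coll", "target_coll", "checkpoint_coll")
```

Note that `rows` here is the per-iteration batch size of topic(), not a cap on the total number of documents copied.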
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777262#comment-16777262 ]

Andrzej Bialecki commented on SOLR-11127:
-----------------------------------------

Implementing a read-only mode for a collection allows us to use a better solution to this problem:
* create a new unique collection using the new schema, e.g. {{.reindex__}}
* put the source collection in read-only mode. This entails:
** blocking new updates,
** issuing a hard commit,
** closing the IndexWriter to make sure there aren't any ongoing background merges.
* copy all documents from the source to the new collection
* create an alias pointing from the source name to the new collection. The new collection is already in read-write mode by default, and this operation is atomic.
* optionally delete the original source

In this scenario we never lose the ability to search the source collection, at the cost of losing the ability to process updates during the reindexing.

BTW, this scenario is applicable to basically any collection, not just {{.system}}, with the usual caveats about potentially losing the data from document fields that can't be retrieved from the source collection.

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: 8.0
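The read-only procedure above can be modelled with a toy in-memory cluster to show why searches keep working while updates are blocked. Everything here is hypothetical (`ToyCluster`, `reindex`, the temporary collection name); it simulates the steps and does not call any Solr API.

```python
class ReadOnlyError(Exception):
    """Raised when an update hits a read-only collection."""


class ToyCluster:
    """In-memory stand-in for collections, aliases and read-only flags."""

    def __init__(self):
        self.collections = {}  # collection name -> {doc id -> doc}
        self.aliases = {}      # alias name -> collection name
        self.read_only = set()

    def resolve(self, name):
        # In this toy model an alias takes precedence over a same-named collection.
        return self.aliases.get(name, name)

    def add(self, name, doc):
        target = self.resolve(name)
        if target in self.read_only:
            raise ReadOnlyError(target)
        self.collections[target][doc["id"]] = doc

    def search(self, name):
        return list(self.collections[self.resolve(name)].values())


def reindex(cluster, source, remove_source=False):
    target = f"reindex_tmp_{source}"       # hypothetical unique name
    cluster.collections[target] = {}       # 1. create the new collection
    cluster.read_only.add(source)          # 2. block updates; searches still work
    for doc in cluster.search(source):     # 3. copy all documents
        cluster.add(target, dict(doc))
    cluster.aliases[source] = target       # 4. switch over in a single step
    if remove_source:                      # 5. optionally drop the original
        del cluster.collections[source]
        cluster.read_only.discard(source)
    return target
```

After `reindex` returns, reads and writes addressed to the source name land on the new collection, which was never read-only.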
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755022#comment-16755022 ]

Jan Høydahl commented on SOLR-11127:
------------------------------------

How do we handle the two time gaps when .system will return 0 hits during copying?

Let's say we add a config option to put {{BlobHandler}} and {{UpdateRequestHandler}} into R/O mode (readOnly=true), where update requests return HTTP 503 Service Unavailable. Then we could start by setting .system to R/O, safely copy back and forth, and move the alias only when the copy is complete. At the end we would set .system back to readOnly=false and RELOAD the .system collection to get back to normal operation. I don't know how much work that would be, but it sounds doable.

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: 8.0
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755007#comment-16755007 ]

Andrzej Bialecki commented on SOLR-11127:
-----------------------------------------

My plan of attack is to implement a collection command that orchestrates the following steps:
* create a temporary collection with a unique name, e.g. {{tmpCollection_123}}, using the updated {{.system}} schema
* define an alias that points {{.system -> tmpCollection_123}}. This should redirect all updates and queries to the temp collection.
* copy the documents from {{.system}} to the temp collection, avoiding overwriting updated docs (incremental updates won't work during this process, but AFAIK no Solr component uses incremental updates when indexing to {{.system}})
* delete the original {{.system}} and create it again using the updated schema.
* remove the alias
* copy over the documents from the temporary collection to {{.system}}, again avoiding overwrites.

The collection command will take care of async processing, resuming the operation on Overseer restarts, etc.

Comments and feedback are welcome. (Also, given that the 8.0 release is imminent, I'm not sure I can fix this in time for it.)

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Assignee: Andrzej Bialecki
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: 8.0
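The "avoiding overwriting updated docs" part of the plan above can be sketched as a copy that skips any id already present in the destination. The helper `copy_without_overwrite` is hypothetical and works on an in-memory dict; against a real collection, one way to get the same effect would be Solr's optimistic concurrency, where a negative `_version_` on an update means the document must not already exist.

```python
def copy_without_overwrite(source_docs, target):
    """Copy docs into `target` (a dict keyed by id) unless the id already exists.

    Returns the number of documents actually copied.
    """
    copied = 0
    for doc in source_docs:
        if doc["id"] in target:
            # A newer version arrived via the alias during the copy; keep it.
            continue
        target[doc["id"]] = doc
        copied += 1
    return copied


# Example: "b" was updated in the temp collection while the copy was running.
temp = {"b": {"id": "b", "v": "new"}}
old_system = [{"id": "a", "v": "old"}, {"id": "b", "v": "old"}]
n = copy_without_overwrite(old_system, temp)
```

Here only "a" is copied, and the fresher "b" in the temporary collection is left untouched.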
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733325#comment-16733325 ]

Jan Høydahl commented on SOLR-11127:
------------------------------------

Anyone planning to look into this for 8.0?

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: master (8.0)
[jira] [Commented] (SOLR-11127) Add a Collections API command to migrate the .system collection schema from Trie-based (pre-7.0) to Points-based (7.0+)
[ https://issues.apache.org/jira/browse/SOLR-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510799#comment-16510799 ]

Jan Høydahl commented on SOLR-11127:
------------------------------------

Perhaps there should also be a check on system startup in version 7.x that logs an ERROR line if the system collection has not been converted, so people are alerted to the need before it's too late? That could be a new Jira issue for 7.5?

> Key: SOLR-11127
> URL: https://issues.apache.org/jira/browse/SOLR-11127
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: master (8.0)