Re: Delete data from stored documents
Since the data already exists and the goal is to remove unwanted fields, a custom update processor looks less useful here. Erick's recommendation to re-index into a new collection, if at all possible, looks simple and safe.
On Sat, Nov 8, 2014 at 12:44 AM, Erick Erickson erickerick...@gmail.com wrote:
But really, I'd recommend re-indexing into a new collection if at all possible.
Re: Delete data from stored documents
Agreed, but I think it would be great if Lucene and Solr provided an API to delete a single field across the entire index. We could file a Jira, but can Lucene accommodate it? Maybe we'll just have to wait for Elasticsearch to implement this feature!
-- Jack Krupansky
-----Original Message-----
From: Anurag Sharma
Sent: Saturday, November 8, 2014 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete data from stored documents
Since the data already exists and the goal is to remove unwanted fields, a custom update processor looks less useful here. Erick's recommendation to re-index into a new collection, if at all possible, looks simple and safe.
Re: Delete data from stored documents
With out-of-the-box functionality, no. You have to develop a custom UpdateProcessor and add it to the update processor chain.
On Thu, Nov 6, 2014 at 3:19 PM, yriveiro yago.rive...@gmail.com wrote:
Is it possible to remove stored data from an index by deleting the unwanted fields from schema.xml and then running an optimize over the index?
Re: Delete data from stored documents
Andrey, can you point me to any tutorial or how-to that shows how to develop a custom UpdateProcessor class?
-- /Yago Riveiro
On Fri, Nov 7, 2014 at 10:39 AM, andrey prokopenko andrey4...@gmail.com wrote:
With out-of-the-box functionality, no. You have to develop a custom UpdateProcessor and add it to the update processor chain.
Re: Delete data from stored documents
Take a look here: https://wiki.apache.org/solr/UpdateRequestProcessor
The full list of update processors for version 4.10 can be found here: http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html
You can pick the most suitable one as a template and make a custom version tailored to your needs.
On Fri, Nov 7, 2014 at 12:21 PM, Yago Riveiro yago.rive...@gmail.com wrote:
Andrey, can you point me to any tutorial or how-to that shows how to develop a custom UpdateProcessor class?
Re: Delete data from stored documents
Could you clarify exactly what you are trying to do, ideally with an example? I mean, how exactly are you determining which fields are unwanted? Are you simply asking whether fields can be deleted from the index (and schema)?
-- Jack Krupansky
-----Original Message-----
From: yriveiro
Sent: Thursday, November 6, 2014 9:19 AM
To: solr-user@lucene.apache.org
Subject: Delete data from stored documents
Is it possible to remove stored data from an index by deleting the unwanted fields from schema.xml and then running an optimize over the index?
Re: Delete data from stored documents
Jack, I have some data indexed that I don’t need any more. My question is whether I can delete the field definition from schema.xml and run an optimize so that the fields “magically” disappear (and free up disk space).
Re-indexing data to delete fields is too expensive in collections with hundreds of millions of documents. The optimize operation seems like a good place to shrink the documents ...
-- /Yago Riveiro
On Fri, Nov 7, 2014 at 12:19 PM, Jack Krupansky j...@basetechnology.com wrote:
Could you clarify exactly what you are trying to do, like with an example? Are you simply asking whether fields can be deleted from the index (and schema)?
Re: Delete data from stored documents
On 7 November 2014 06:57, andrey prokopenko andrey4...@gmail.com wrote:
Full list of updateprocessors for 4.10 version can be found here: http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html
Actually, that's just the top level of the inheritance hierarchy, and you need to realize that lots of interesting URPs (update request processors) are hiding lower down. Hence: http://www.solr-start.com/info/update-request-processors/
Regards, Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Re: Delete data from stored documents
bq: My question is if I can delete the field definition from the schema.xml and do an optimize and the fields “magically” disappears
No. schema.xml is really just about regularizing how Lucene indexes things. Lucene (where this would have to take place) doesn't have any understanding of schema.xml, so changing it and then optimizing (optimizing is also a Lucene function) won't have any effect.
If you (1) change the schema and (2) update the documents, the data will be purged as background merges happen. But really, I'd recommend re-indexing into a new collection if at all possible.
Best, Erick
On Fri, Nov 7, 2014 at 4:26 AM, Yago Riveiro yago.rive...@gmail.com wrote:
Re-indexing data to delete fields is too expensive in collections with hundreds of millions of documents.
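A minimal sketch of the re-indexing route recommended above, assuming the documents can be obtained again either from the original source or from the old collection (which only works if every field you still need is stored). The field names `old_payload` and `debug_info` and the helper names are hypothetical and do not come from the thread. The only step shown is dropping the unwanted fields from each document before it is sent to the new collection; fetching and posting are left out.

```python
import json

# Hypothetical field names, for illustration only.
UNWANTED_FIELDS = {"old_payload", "debug_info"}


def strip_fields(doc, unwanted=UNWANTED_FIELDS):
    """Return a copy of doc without the unwanted fields.

    _version_ is dropped as well: a version value copied from the old
    collection would trip Solr's optimistic-concurrency check when the
    document is added to a collection that does not contain it yet.
    """
    return {name: value for name, value in doc.items()
            if name not in unwanted and name != "_version_"}


def update_body(docs, unwanted=UNWANTED_FIELDS):
    """Build a JSON array of cleaned documents, suitable as the body of
    a POST to the new collection's JSON update handler."""
    return json.dumps([strip_fields(d, unwanted) for d in docs])


if __name__ == "__main__":
    old_doc = {"id": "1", "title": "hello",
               "old_payload": "x" * 10, "_version_": 1484}
    print(update_body([old_doc]))  # prints [{"id": "1", "title": "hello"}]
```

The same filtering would also serve for step (2) of the in-place route (change the schema, then update the documents), since the unwanted values only go away once each document has been rewritten and the old segments have been merged.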
Delete data from stored documents
Hi,
Is it possible to remove stored data from an index by deleting the unwanted fields from schema.xml and then running an optimize over the index?
Thanks, /yago
Best regards
--
View this message in context: http://lucene.472066.n3.nabble.com/Delete-data-from-stored-documents-tp4167990.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Delete data from stored documents
Nope.
On Thu, Nov 6, 2014 at 5:19 PM, yriveiro yago.rive...@gmail.com wrote:
Is it possible to remove stored data from an index by deleting the unwanted fields from schema.xml and then running an optimize over the index?
--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com