Re: Delete data from stored documents

2014-11-08 Thread Anurag Sharma
Since the data already exists and the need is to remove unwanted fields,
a custom update processor looks less useful here. Erick's
recommendation to re-index into a new collection, if at all possible,
looks simple and safe.
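
A re-index of this kind can be sketched roughly as follows, in Python. This is
only an illustration under assumptions that are not in the thread: every field
worth keeping is stored (so documents can be read back from the old
collection), and the callables `fetch_page` and `send_batch` are hypothetical
placeholders for whatever HTTP client is used. The paging follows Solr's
cursorMark convention (start with `cursorMark=*`, read `nextCursorMark` from
each response, stop when it no longer changes).

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]


def strip_fields(doc: Doc, drop: Iterable[str]) -> Doc:
    """Return a copy of doc without the unwanted fields.

    _version_ is dropped as well, so the new collection assigns its own
    version instead of treating the value as an optimistic-concurrency check.
    """
    unwanted = set(drop) | {"_version_"}
    return {k: v for k, v in doc.items() if k not in unwanted}


def reindex(fetch_page: Callable[[str], dict],
            send_batch: Callable[[List[Doc]], None],
            drop: Iterable[str]) -> int:
    """Copy every document from the old collection to the new one.

    fetch_page(cursor) is assumed to return a parsed Solr JSON response for
    a query sorted on the unique key with the given cursorMark.
    send_batch(docs) is assumed to post the documents to the new collection.
    Returns the number of documents copied.
    """
    drop = list(drop)
    cursor = "*"
    copied = 0
    while True:
        page = fetch_page(cursor)
        docs = page["response"]["docs"]
        if docs:
            send_batch([strip_fields(d, drop) for d in docs])
            copied += len(docs)
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # cursor stopped moving: no more results
            break
        cursor = next_cursor
    return copied
```

Keeping the I/O behind two callables means the loop can be tried against fake
pages before it is pointed at a real cluster.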






Re: Delete data from stored documents

2014-11-08 Thread Jack Krupansky
Agreed, but I think it would be great if Lucene and Solr provided an API to 
delete a single field for the entire index. We could file a Jira, but can 
Lucene accommodate it? Maybe we'll just have to wait for Elasticsearch to 
implement this feature!


-- Jack Krupansky






Re: Delete data from stored documents

2014-11-07 Thread andrey prokopenko
With out-of-the-box functionality, no. You have to develop a custom
UpdateProcessor and add it to the update processor chain.




Re: Delete data from stored documents

2014-11-07 Thread Yago Riveiro
Andrey,

Can you point me to any tutorial or how-to that shows how to develop a custom
UpdateProcessor class?


—
/Yago Riveiro



Re: Delete data from stored documents

2014-11-07 Thread andrey prokopenko
Take a look here: https://wiki.apache.org/solr/UpdateRequestProcessor
The full list of update processors for version 4.10 can be found here:
http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html
You can pick the most suitable one as a template and make a custom
version tailored to your needs.
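
For readers who have not seen an update processor chain, the idea can be
modelled in a few lines of Python. This is a conceptual sketch only, not Solr
code: the real classes are Java, and the names `Processor`,
`DropFieldsProcessor`, `CollectingProcessor` and the field `legacy_blob` are
hypothetical. Each link receives a document being added, may modify it, and
passes it to the next link. Before writing a custom processor, it may be worth
checking whether a stock factory such as IgnoreFieldUpdateProcessorFactory
already covers the case.

```python
from typing import Dict, Iterable, List, Optional

Doc = Dict[str, object]


class Processor:
    """One link in the chain; forwards each added document to the next link."""

    def __init__(self, next_processor: Optional["Processor"] = None):
        self.next = next_processor

    def process_add(self, doc: Doc) -> None:
        if self.next is not None:
            self.next.process_add(doc)


class DropFieldsProcessor(Processor):
    """Removes the configured fields from a document before it moves on."""

    def __init__(self, drop: Iterable[str],
                 next_processor: Optional[Processor] = None):
        super().__init__(next_processor)
        self.drop = set(drop)

    def process_add(self, doc: Doc) -> None:
        for name in self.drop:
            doc.pop(name, None)  # silently skip fields the doc does not have
        super().process_add(doc)


class CollectingProcessor(Processor):
    """Stands in for the last step, which would hand the document to the index."""

    def __init__(self) -> None:
        super().__init__(None)
        self.indexed: List[Doc] = []

    def process_add(self, doc: Doc) -> None:
        self.indexed.append(doc)


if __name__ == "__main__":
    sink = CollectingProcessor()
    chain = DropFieldsProcessor(["legacy_blob"], sink)
    chain.process_add({"id": "1", "title": "t", "legacy_blob": "..."})
    print(sink.indexed)
```

Note that a chain like this only sees documents that pass through an update
request; it does nothing to documents that are already in the index.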

 



Re: Delete data from stored documents

2014-11-07 Thread Jack Krupansky
Could you clarify exactly what you are trying to do, like with an example? I 
mean, how exactly are you determining what fields are unwanted? Are you 
simply asking whether fields can be deleted from the index (and schema)?


-- Jack Krupansky




Re: Delete data from stored documents

2014-11-07 Thread Yago Riveiro
Jack,

I have some data indexed that I don't need any more. My question is whether I
can delete the field definition from schema.xml, run an optimize, and have the
fields "magically" disappear (freeing disk space).

Re-indexing the data to delete fields is too expensive in collections with
hundreds of millions of documents.

The optimize operation seems like a good place to shrink the documents ...



—
/Yago Riveiro


Re: Delete data from stored documents

2014-11-07 Thread Alexandre Rafalovitch
On 7 November 2014 06:57, andrey prokopenko andrey4...@gmail.com wrote:
 Full list of updateprocessors for 4.10 version can  be found here:
 http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html

Actually, that's just the top level of the inheritance hierarchy and
you need to realize that lots of interesting URPs are hiding lower
down. Hence: http://www.solr-start.com/info/update-request-processors/

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: Delete data from stored documents

2014-11-07 Thread Erick Erickson
bq: My question is if I can delete the field definition from the
schema.xml and do an optimize and the fields “magically” disappears

No. schema.xml is really just about regularizing how Lucene indexes
things. Lucene (where this would have to take place) doesn't have any
understanding of schema.xml, so changing it then optimizing (and
optimizing is also a Lucene function) won't have any effect.

If you
1. change the schema, and
2. update the documents,
the data will be purged as background merges happen.
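
The distinction above can be illustrated with a toy model in Python. It is not
Lucene code: `Segment`, `merge` and `update` are hypothetical names, and the
model assumes only what this message states, namely that a merge copies live
documents as they were written, and that an update writes a fresh copy of the
document while marking the old copy deleted. In a real system the client
re-sends the document; here the fresh copy is derived from the old one to keep
the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

Doc = Dict[str, str]


@dataclass
class Segment:
    docs: List[Doc]
    deleted: Set[int] = field(default_factory=set)  # positions of deleted docs

    def live_docs(self) -> List[Doc]:
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]


def merge(segments: List[Segment]) -> Segment:
    """Optimize/merge: copy live docs verbatim; the schema is never consulted."""
    return Segment([dict(d) for s in segments for d in s.live_docs()])


def update(segments: List[Segment], doc_id: str, schema_fields: Set[str]) -> None:
    """Re-add one document under the current schema; mark the old copy deleted."""
    for seg in segments:
        for i, d in enumerate(seg.docs):
            if d["id"] == doc_id and i not in seg.deleted:
                seg.deleted.add(i)
                fresh = {k: v for k, v in d.items() if k in schema_fields}
                segments.append(Segment([fresh]))  # new docs go to a new segment
                return


if __name__ == "__main__":
    segments = [Segment([{"id": "1", "title": "a", "old": "x"}])]
    schema = {"id", "title"}           # "old" has been removed from the schema
    print(merge(segments).docs)        # schema change + merge only
    update(segments, "1", schema)
    print(merge(segments).docs)        # after the document was updated
```

In this model, changing the schema and merging leaves the `old` field in
place; only the re-added copy loses it, and the space held by the deleted copy
is reclaimed when its segment is merged away.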

But really, I'd recommend re-indexing into a new collection if at all possible.


Best,
Erick



Delete data from stored documents

2014-11-06 Thread yriveiro
Hi,

Is it possible to remove stored data from an index by deleting the unwanted
fields from schema.xml and then running an optimize over the index?

Thanks,

/yago



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-data-from-stored-documents-tp4167990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delete data from stored documents

2014-11-06 Thread Mikhail Khludnev
nope.





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com