Re: Change column family

David Medinets Thu, 28 May 2015 04:29:32 -0700

Is there enough data volume to justify writing an iterator rather than a
MapReduce job? Could you write an Apache Pig script to accomplish the same
task? That would be an easier solution to understand. Iterators are great,
but they have a steep learning curve, which to me implies they should be used
sparingly to reduce the long tail of operations and maintenance (O&M).


On Thu, May 28, 2015 at 1:34 AM, Josh Elser wrote:

> I believe the typical case would be to set it at the scan and major
> compaction scopes for the table. This would ensure that queries for data
> would see the transformed result and, eventually, all of the data would be
> rewritten to the new schema (or you could force a major compaction and know
> definitively).
>
> Also, since it hasn't been stated otherwise: using the TransformingIterator
> is on the fringes of "normal". Your life may be much simpler if you write a
> MapReduce job to rewrite your data. Implementing the iterator correctly is
> a little arcane (as you're noticing) and not at all straightforward to
> debug. If it's reasonable to rewrite your data, that may be the easier
> solution IMO.
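To make the rewrite option concrete, here is a minimal plain-Java model of what such a job does, assuming a simplified key of row, column family, and qualifier (no visibility or timestamp). It is not Accumulo or MapReduce code: the `TreeMap` stands in for the table, and the hypothetical `renameFamily` method stands in for a job that would, in a real deployment, write each new entry and a delete of the old one as `Mutation`s (`put` and `putDelete`) through a `BatchWriter`. What it shows is that the table re-sorts written entries itself, so a rewrite never has to deal with the sort-order problem that makes the iterator approach hard.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified key (hypothetical): row, column family, column qualifier.
// Visibility and timestamp are left out of this model.
record Cell(String row, String cf, String cq) {
    static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
            .thenComparing(Cell::cf)
            .thenComparing(Cell::cq);
}

class RewriteModel {
    // Move every entry in family `from` to family `to`. The TreeMap stands in for
    // the table: it keeps keys sorted on insert, so the rewrite never orders output.
    static void renameFamily(TreeMap<Cell, String> table, String from, String to) {
        List<Cell> toMove = new ArrayList<>();
        for (Cell key : table.keySet()) {
            if (key.cf().equals(from)) {
                toMove.add(key);
            }
        }
        // Modify after iterating to avoid a ConcurrentModificationException.
        for (Cell key : toMove) {
            String value = table.remove(key);                    // like putDelete
            table.put(new Cell(key.row(), to, key.cq()), value); // like put
        }
    }

    public static void main(String[] args) {
        TreeMap<Cell, String> table = new TreeMap<>(Cell.ORDER);
        table.put(new Cell("row", "graph1", "cq"), "v1");
        table.put(new Cell("row", "graph10", "cq"), "v2");
        renameFamily(table, "graph1", "graph2");
        for (Map.Entry<Cell, String> e : table.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running `main` lists the `graph10` entry before the renamed `graph2` entry: the sorted store, not the rewrite logic, decides the final order.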
>
> madhvi wrote:
>
>> Hi All,
>>
>> If anyone has worked with the TransformingIterator, can you tell me whether
>> the iterator writes the transformed changes to the Accumulo table, or only
>> returns the transformed result at scan time? Can you also give me details on
>> how to implement its abstract methods, what they are used for, and the
>> overall workflow of the iterator?
>>
>> Thanks
>> Madhvi
>> On Wednesday 27 May 2015 05:38 PM, Andrew Wells wrote:
>>
>>> To implement that iterator, it looks like you will only need to override
>>> replaceColumnFamily.
>>>
>>> It appears to build the new key from the column family passed in as its
>>> argument, so manipulate the Text object provided.
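As a point of reference (worth checking against the javadoc for your Accumulo version): in the 1.6 API the abstract methods of `TransformingIterator` are `getKeyPrefix()`, which returns the `PartialKey` prefix that the transform leaves unchanged, and `transformRange(SortedKeyValueIterator<Key,Value> input, TransformingIterator.KVBuffer output)`. `replaceColumnFamily` is a protected helper you would call from inside `transformRange`, not a method to override. A family rename changes everything after the row, so the prefix would be `PartialKey.ROW`. The sketch below does not use Accumulo. It is a plain-Java model under that assumption, with hypothetical names (`Cell`, `transformRow`, `apply`), that hands one row group to the transform at a time and re-sorts the group before emitting it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified key (hypothetical): row, column family, column qualifier.
record Cell(String row, String cf, String cq) {
    static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
            .thenComparing(Cell::cf)
            .thenComparing(Cell::cq);
}

class RowPrefixTransform {
    // Hypothetical stand-in for transformRange: rewrite every cell of one row
    // group and append the results to the output buffer.
    static void transformRow(List<Cell> rowGroup, List<Cell> output, String from, String to) {
        for (Cell c : rowGroup) {
            output.add(c.cf().equals(from) ? new Cell(c.row(), to, c.cq()) : c);
        }
    }

    // Walk the sorted input one row at a time (prefix = row), transform the
    // row, re-sort it, and only then emit it.
    static List<Cell> apply(List<Cell> sortedInput, String from, String to) {
        List<Cell> result = new ArrayList<>();
        int i = 0;
        while (i < sortedInput.size()) {
            String row = sortedInput.get(i).row();
            List<Cell> group = new ArrayList<>();
            while (i < sortedInput.size() && sortedInput.get(i).row().equals(row)) {
                group.add(sortedInput.get(i));
                i++;
            }
            List<Cell> buffer = new ArrayList<>();
            transformRow(group, buffer, from, to);
            buffer.sort(Cell.ORDER); // the whole row is held in memory at this point
            result.addAll(buffer);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> input = List.of(
                new Cell("r1", "graph1", "q"),
                new Cell("r1", "graph10", "q"),
                new Cell("r2", "graph1", "q"));
        apply(input, "graph1", "graph2").forEach(System.out::println);
    }
}
```

Because a rename only touches the column family, row order is never disturbed, so the buffer only ever needs to hold one row. That is also the limit of this approach: a very wide row must fit in tablet server memory.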
>>>
>>> On Wed, May 27, 2015 at 8:06 AM, Andrew Wells wrote:
>>>
>>>     Looks like you want to override these methods:
>>>
>>>     (Accumulo 1.6 javadoc:
>>>     http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/user/TransformingIterator.html
>>>     Key is
>>>     http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/data/Key.html)
>>>
>>>     protected Key replaceColumnFamily(Key originalKey,
>>>             org.apache.hadoop.io.Text newColFam)
>>>         Make a new key with all parts (including delete flag) coming
>>>         from originalKey but use newColFam as the column family.
>>>
>>>     protected Key replaceColumnQualifier(Key originalKey,
>>>             org.apache.hadoop.io.Text newColQual)
>>>         Make a new key with all parts (including delete flag) coming
>>>         from originalKey but use newColQual as the column qualifier.
>>>
>>>     protected Key replaceColumnVisibility(Key originalKey,
>>>             org.apache.hadoop.io.Text newColVis)
>>>         Make a new key with all parts (including delete flag) coming
>>>         from originalKey but use newColVis as the column visibility.
>>>
>>>     protected Key replaceKeyParts(Key originalKey,
>>>             org.apache.hadoop.io.Text newColQual,
>>>             org.apache.hadoop.io.Text newColVis)
>>>         Make a new key with a column qualifier, and column visibility.
>>>
>>>     protected Key replaceKeyParts(Key originalKey,
>>>             org.apache.hadoop.io.Text newColFam,
>>>             org.apache.hadoop.io.Text newColQual,
>>>             org.apache.hadoop.io.Text newColVis)
>>>         Make a new key with a column family, column qualifier,
>>>         and column visibility.
>>>
>>>     On Wed, May 27, 2015 at 7:40 AM, shweta.agrawal wrote:
>>>
>>>         Thanks for all the suggestions.
>>>
>>>         I read about the TransformingIterator and started implementing
>>>         it: I extended the class and tried to override its abstract
>>>         methods. But I can't work out where, or what, to write to
>>>         change the column family.
>>>
>>>         Please share your suggestions.
>>>
>>>         Thanks
>>>         Shweta
>>>
>>>
>>>
>>>         On Tuesday 26 May 2015 08:33 PM, Adam Fuchs wrote:
>>>
>>>>         This can also be done under a row-doesn't-fit-into-memory
>>>>         constraint. You won't need to hold the second column
>>>>         in memory if your iterator tree deep-copies, filters,
>>>>         transforms, and merges. Exhibit A:
>>>>
>>>>         [HeapIterator-derivative]
>>>>            |_________________________
>>>>            |                         \
>>>>         [transform-graph1-to-graph2]  \
>>>>            |                           \
>>>>         [column-family-graph1][all-but-column-family-graph1]
>>>>
>>>>         With this design, you can subclass the HeapIterator, deep-copy
>>>>         the source in the init method, wrap one copy in a custom
>>>>         transform iterator, and create an appropriate seek method.
>>>>         This is probably on the more advanced side of Accumulo
>>>>         programming, but it can be done.
>>>>
>>>>         Adam
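Adam's design can be modeled without Accumulo. The sketch below is a plain-Java stand-in, not the `HeapIterator` API: two lazy streams over the same sorted input play the role of the deep copies (one filtered to the old family and renamed, the other holding everything else), and a `PriorityQueue` performs the merge. Renaming a single family keeps that stream sorted, because every key in it shares the same family, so the merge only ever holds one entry per stream. `Cell`, `merge`, and `renameByMerge` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified key (hypothetical): row, column family, column qualifier.
record Cell(String row, String cf, String cq) {
    static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
            .thenComparing(Cell::cf)
            .thenComparing(Cell::cq);
}

class HeapMergeModel {
    // The current head cell of one sorted stream, plus the rest of that stream.
    private record Head(Cell cell, Iterator<Cell> rest) {}

    // k-way merge of individually sorted streams; holds one cell per stream.
    static List<Cell> merge(List<Iterator<Cell>> sources) {
        Comparator<Head> byCell = Comparator.comparing(Head::cell, Cell.ORDER);
        PriorityQueue<Head> heap = new PriorityQueue<>(byCell);
        for (Iterator<Cell> it : sources) {
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Cell> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            out.add(head.cell());
            if (head.rest().hasNext()) {
                heap.add(new Head(head.rest().next(), head.rest()));
            }
        }
        return out;
    }

    // Two lazy views of the same sorted input play the "deep copies" in the diagram:
    // one keeps only family `from` and renames it, the other keeps everything else.
    static List<Cell> renameByMerge(List<Cell> sorted, String from, String to) {
        Iterator<Cell> renamed = sorted.stream()
                .filter(c -> c.cf().equals(from))
                .map(c -> new Cell(c.row(), to, c.cq()))
                .iterator();
        Iterator<Cell> others = sorted.stream()
                .filter(c -> !c.cf().equals(from))
                .iterator();
        return merge(List.of(renamed, others));
    }

    public static void main(String[] args) {
        List<Cell> input = List.of(
                new Cell("r1", "a", "q"), new Cell("r1", "aa", "q"),
                new Cell("r2", "a", "q"), new Cell("r2", "c", "q"));
        renameByMerge(input, "a", "b").forEach(System.out::println);
    }
}
```

With families `a` and `aa` in the same row, the output places `aa` before the renamed `b` without buffering anything beyond the two stream heads. The model does not handle the case where the target family already exists, which would produce two entries with the same key.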
>>>>
>>>>
>>>>         On Tue, May 26, 2015 at 8:59 AM, Eric Newton wrote:
>>>>
>>>>             Short answer: no.
>>>>
>>>>             Long answer: maybe.
>>>>
>>>>             You can write an iterator which will transform:
>>>>
>>>>             row, cf1, cq, vis -> value
>>>>
>>>>             into:
>>>>
>>>>             row, cf2, cq, vis -> value
>>>>
>>>>             And if you can do this while maintaining sort order, you
>>>>             can get your new ColumnFamily transformed during scans
>>>>             and compactions.
>>>>
>>>>             But this bit about maintaining the sort order is more
>>>>             complex than it sounds.
>>>>
>>>>             If you have the following:
>>>>
>>>>             row, a, cq, vis -> value
>>>>             row, aa, cq, vis -> value
>>>>
>>>>
>>>>             And you want to transform cf "a" into cf "b":
>>>>
>>>>             row, aa, cq, vis -> value
>>>>             row, b, cq, vis -> value
>>>>
>>>>
>>>>             Your iterator needs to hold the second column in memory,
>>>>             after transforming the first column.  Tablet server
>>>>             memory for holding Key/Values is not infinite.
>>>>
>>>>             -Eric
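Eric's ordering problem is easy to reproduce. This is a minimal plain-Java sketch, assuming a simplified key of row, family, and qualifier compared in that order (for ASCII strings, `String.compareTo` agrees with Accumulo's byte-wise ordering). `Cell`, `renameStreaming`, and `isSorted` are hypothetical names. A transform that renames each entry as it arrives, with no buffering, turns sorted input into unsorted output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified key (hypothetical): row, column family, column qualifier.
record Cell(String row, String cf, String cq) {
    static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
            .thenComparing(Cell::cf)
            .thenComparing(Cell::cq);
}

class NaiveRenameDemo {
    // Rename one cell at a time, in arrival order, without any buffering.
    static List<Cell> renameStreaming(List<Cell> sorted, String from, String to) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : sorted) {
            out.add(c.cf().equals(from) ? new Cell(c.row(), to, c.cq()) : c);
        }
        return out;
    }

    // True if every adjacent pair is in key order.
    static boolean isSorted(List<Cell> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (Cell.ORDER.compare(cells.get(i - 1), cells.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Cell> input = List.of(new Cell("row", "a", "cq"), new Cell("row", "aa", "cq"));
        List<Cell> output = renameStreaming(input, "a", "b");
        System.out.println(output + " sorted: " + isSorted(output));
    }
}
```

The renamed `b` entry comes out ahead of `aa`, so `isSorted` reports false; emitting `aa` first would require holding the `b` entry until everything that sorts before it has been seen.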
>>>>
>>>>             On Tue, May 26, 2015 at 8:44 AM, shweta.agrawal wrote:
>>>>
>>>>                 Hi,
>>>>
>>>>                 I want to ask: is it possible in Accumulo to change
>>>>                 a column family without rewriting all of the data?
>>>>
>>>>                 Suppose my column family is graph1, and I want to
>>>>                 rename it to graph2.
>>>>                 Is that possible?
>>>>
>>>>                 Thanks
>>>>                 Shweta
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>     --
>>>     *Andrew George Wells*
>>>     *Software Engineer*
>>>
>>
