Is there enough data volume to justify writing an iterator instead of a MapReduce job? Could you write an Apache Pig script to accomplish the same task? That would be a more easily understood solution. Iterators are great, but they have a steep learning curve, which, to me, implies they should be used sparingly to reduce the long tail of operations and maintenance (O&M).
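The MapReduce alternative raised here (and recommended by Josh further down) mostly comes down to per-entry logic. For every cell in the old column family, write a copy under the new family and delete the original. The Java sketch below models only that logic. It uses hypothetical `RewriteCell`, `RewriteOp` and `FamilyRewriteMapper` classes rather than Accumulo types.

In a real job, each PUT/DELETE pair would presumably become `Mutation.put` and `Mutation.putDelete` calls on one `Mutation` for the row, written through `AccumuloOutputFormat`. That wiring is an assumption and is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified cell standing in for an Accumulo Key/Value pair.
final class RewriteCell {
    final String row, family, qualifier, visibility, value;

    RewriteCell(String row, String family, String qualifier, String visibility, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.value = value;
    }
}

// Hypothetical output record. In a real job each PUT/DELETE pair would presumably
// map to Mutation.put / Mutation.putDelete on one Mutation for the row (not shown).
final class RewriteOp {
    enum Kind { PUT, DELETE }

    final Kind kind;
    final RewriteCell cell;

    RewriteOp(Kind kind, RewriteCell cell) {
        this.kind = kind;
        this.cell = cell;
    }
}

final class FamilyRewriteMapper {
    // Per-entry logic of a one-off rewrite: copy a cell from the old family into
    // the new one and delete the original; cells in other families are left alone.
    static List<RewriteOp> map(RewriteCell in, String oldFamily, String newFamily) {
        List<RewriteOp> ops = new ArrayList<>();
        if (in.family.equals(oldFamily)) {
            RewriteCell copy = new RewriteCell(in.row, newFamily, in.qualifier, in.visibility, in.value);
            ops.add(new RewriteOp(RewriteOp.Kind.PUT, copy));
            ops.add(new RewriteOp(RewriteOp.Kind.DELETE, in));
        }
        return ops;
    }
}
```

After a job like this has run, the data itself is in its new shape. No iterator has to stay configured on the table afterward.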
On Thu, May 28, 2015 at 1:34 AM, Josh Elser <[email protected]> wrote:
> I believe the typical case would be to set it at the scan and major
> compaction scopes for the table. This would ensure that queries for data
> would see the transformed result and, eventually, all of the data would be
> rewritten to the new schema (or you could force a major compaction and know
> definitively).
>
> Also, since it hasn't been otherwise stated, using the
> TransformingIterator is on the fringes of "normal". Your life may be much
> simpler if you write a MapReduce job to rewrite your data. Implementing the
> iterator correctly is a little obtuse (as you're noticing) and is not at
> all straightforward to debug. If it's reasonable to rewrite your data, it
> may be the easier solution IMO.
>
> madhvi wrote:
>
>> Hi All,
>>
>> If anyone has worked on the TransformingIterator, can you tell me whether
>> the iterator writes the transformed changes to the Accumulo table as well,
>> or only returns the transformed result at scan time? Can you provide
>> details on how to implement its abstract methods, what they are used for,
>> and the workflow of the iterator?
>>
>> Thanks
>> Madhvi
>> On Wednesday 27 May 2015 05:38 PM, Andrew Wells wrote:
>>
>>> To implement that iterator, it looks like you will only need to
>>> override replaceColumnFamily,
>>>
>>> and this looks to return the new ColumnFamily via the argument. So
>>> manipulate the Text object provided.
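Andrew's suggestion hinges on what `replaceColumnFamily` does. The Javadoc text quoted in this thread says it makes a *new* key with every part (including the delete flag) copied from `originalKey` except the column family. The result therefore comes back as the method's return value; the `Text` argument is only an input.

The Java sketch below models that contract. It uses a hypothetical `SimpleKey` class in place of Accumulo's real `Key`, and a hypothetical `RenameTransform` helper for the graph1-to-graph2 rename.

My understanding of the 1.6 API, worth checking against its Javadoc, is as follows:
- The abstract methods a `TransformingIterator` subclass implements are `getKeyPrefix()` and `transformRange(...)`.
- The `replace*` methods are protected helpers you call from inside `transformRange`.

The sketch does not model how keys are grouped, buffered, or re-sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified key for illustration only. The real
// org.apache.accumulo.core.data.Key stores byte arrays and is not used here.
final class SimpleKey {
    final String row, family, qualifier, visibility;
    final long timestamp;
    final boolean deleted;

    SimpleKey(String row, String family, String qualifier, String visibility,
              long timestamp, boolean deleted) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
        this.deleted = deleted;
    }

    // Mirrors the Javadoc contract quoted in the thread: build a NEW key whose
    // parts (including the delete flag) come from this key, except the column
    // family. The original key is left untouched.
    SimpleKey replaceColumnFamily(String newFamily) {
        return new SimpleKey(row, newFamily, qualifier, visibility, timestamp, deleted);
    }
}

final class RenameTransform {
    // Hypothetical per-key decision a transformRange implementation might make:
    // a renamed copy for keys in the old family, pass-through for everything else.
    static List<SimpleKey> transform(List<SimpleKey> keys, String from, String to) {
        List<SimpleKey> out = new ArrayList<>();
        for (SimpleKey k : keys) {
            out.add(k.family.equals(from) ? k.replaceColumnFamily(to) : k);
        }
        return out;
    }
}
```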
>>>
>>> On Wed, May 27, 2015 at 8:06 AM, Andrew Wells <[email protected]> wrote:
>>>
>>> Looks like you want to override these methods (from the
>>> TransformingIterator Javadoc,
>>> http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/user/TransformingIterator.html):
>>>
>>> protected Key replaceColumnFamily(Key originalKey,
>>>         org.apache.hadoop.io.Text newColFam)
>>>     Make a new key with all parts (including delete flag) coming
>>>     from originalKey but use newColFam as the column family.
>>>
>>> protected Key replaceColumnQualifier(Key originalKey,
>>>         org.apache.hadoop.io.Text newColQual)
>>>     Make a new key with all parts (including delete flag) coming
>>>     from originalKey but use newColQual as the column qualifier.
>>>
>>> protected Key replaceColumnVisibility(Key originalKey,
>>>         org.apache.hadoop.io.Text newColVis)
>>>     Make a new key with all parts (including delete flag) coming
>>>     from originalKey but use newColVis as the column visibility.
>>>
>>> protected Key replaceKeyParts(Key originalKey,
>>>         org.apache.hadoop.io.Text newColQual,
>>>         org.apache.hadoop.io.Text newColVis)
>>>     Make a new key with a column qualifier, and column visibility.
>>>
>>> protected Key replaceKeyParts(Key originalKey,
>>>         org.apache.hadoop.io.Text newColFam,
>>>         org.apache.hadoop.io.Text newColQual,
>>>         org.apache.hadoop.io.Text newColVis)
>>>     Make a new key with a column family, column qualifier,
>>>     and column visibility.
>>>
>>> On Wed, May 27, 2015 at 7:40 AM, shweta.agrawal
>>> <[email protected]> wrote:
>>>
>>> Thanks for all the suggestions.
>>>
>>> I read about TransformingIterator and started implementing
>>> it. I extended this class and tried to override its abstract
>>> methods, but I can't figure out where and what to write to
>>> change the column family.
>>>
>>> So please provide your suggestions.
>>>
>>> Thanks
>>> Shweta
>>>
>>> On Tuesday 26 May 2015 08:33 PM, Adam Fuchs wrote:
>>>
>>>> This can also be done with a row-doesn't-fit-into-memory
>>>> constraint. You won't need to hold the second column
>>>> in memory if your iterator tree deep copies, filters,
>>>> transforms, and merges. Exhibit A:
>>>>
>>>> [HeapIterator-derivative]
>>>>     |_________________________
>>>>     |                         \
>>>> [transform-graph1-to-graph2]   \
>>>>     |                           \
>>>> [column-family-graph1]     [all-but-column-family-graph1]
>>>>
>>>> With this design, you can subclass the HeapIterator, deep
>>>> copy the source in the init method, wrap one copy in a custom
>>>> transform iterator, and create an appropriate seek method.
>>>> This is probably more on the advanced side of Accumulo
>>>> programming, but it can be done.
>>>>
>>>> Adam
>>>>
>>>>
>>>> On Tue, May 26, 2015 at 8:59 AM, Eric Newton
>>>> <[email protected]> wrote:
>>>>
>>>> Short answer: no.
>>>>
>>>> Long answer: maybe.
>>>>
>>>> You can write an iterator which will transform:
>>>>
>>>>     row, cf1, cq, vis -> value
>>>>
>>>> into:
>>>>
>>>>     row, cf2, cq, vis -> value
>>>>
>>>> And if you can do this while maintaining sort order, you
>>>> can get your new column family transformed during scans
>>>> and compactions.
>>>>
>>>> But this bit about maintaining the sort order is more
>>>> complex than it sounds.
>>>>
>>>> If you have the following:
>>>>
>>>>     row, a, cq, vis -> value
>>>>     row, aa, cq, vis -> value
>>>>
>>>> And you want to transform cf "a" into cf "b":
>>>>
>>>>     row, aa, cq, vis -> value
>>>>     row, b, cq, vis -> value
>>>>
>>>> Your iterator needs to hold the second column in memory
>>>> after transforming the first column. Tablet server
>>>> memory for holding Key/Values is not infinite.
>>>>
>>>> -Eric
>>>>
>>>> On Tue, May 26, 2015 at 8:44 AM, shweta.agrawal
>>>> <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I want to ask: is it possible in Accumulo to change
>>>> the column family without changing all of the data?
>>>>
>>>> Suppose my column family is graph1, and now I want to
>>>> rename this column family to graph2.
>>>> Is it possible?
>>>>
>>>> Thanks
>>>> Shweta
>>>
>>> --
>>> *Andrew George Wells*
>>> *Software Engineer*
>>> *[email protected]*
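Eric's ordering problem and Adam's deep-copy/transform/merge answer can both be seen in a small, stdlib-only Java model. It makes several simplifying assumptions:
- Keys are reduced to (row, family, qualifier), as a hypothetical `SortedCell` class.
- A sorted `List` stands in for the tablet's already-sorted data.
- A lazily filtered stream iterator stands in for a deep copy of the source.
- A `PriorityQueue` merge stands in for a `HeapIterator` derivative. The real Accumulo classes (`HeapIterator`, `deepCopy`, `seek`) are not used.

The naive rename rewrites each cell where it stands and falls out of order, exactly as in Eric's "a"/"aa" example. The split version keeps the renamed family in its own branch. That branch is still sorted, because every cell in it shares one family. The heap merge then holds only one pending cell per branch. One case is not handled: if graph2 already contains a cell with the same row and qualifier, the merge order between those equal keys is arbitrary.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical, simplified cell: (row, family, qualifier) only; visibility,
// timestamp and value are omitted to keep the ordering easy to follow.
final class SortedCell implements Comparable<SortedCell> {
    final String row, family, qualifier;

    SortedCell(String row, String family, String qualifier) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
    }

    SortedCell withFamily(String newFamily) {
        return new SortedCell(row, newFamily, qualifier);
    }

    // Lexicographic order by row, then family, then qualifier.
    @Override
    public int compareTo(SortedCell o) {
        int c = row.compareTo(o.row);
        if (c != 0) return c;
        c = family.compareTo(o.family);
        return c != 0 ? c : qualifier.compareTo(o.qualifier);
    }

    @Override
    public String toString() {
        return row + ":" + family + ":" + qualifier;
    }
}

final class SplitMergeDemo {
    // Naive rename: rewrite each cell in place; the output can fall out of order.
    static List<SortedCell> naiveRename(List<SortedCell> source, String from, String to) {
        List<SortedCell> out = new ArrayList<>();
        for (SortedCell c : source) out.add(c.family.equals(from) ? c.withFamily(to) : c);
        return out;
    }

    static boolean isSorted(List<SortedCell> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (cells.get(i - 1).compareTo(cells.get(i)) > 0) return false;
        }
        return true;
    }

    // Lazy filtered (and optionally transformed) view: stands in for a deep copy.
    static Iterator<SortedCell> branch(List<SortedCell> source, Predicate<SortedCell> keep,
                                       UnaryOperator<SortedCell> transform) {
        return source.stream().filter(keep).map(transform).iterator();
    }

    // Heap merge of sorted branches, holding one pending cell per branch.
    static List<SortedCell> merge(List<Iterator<SortedCell>> branches) {
        PriorityQueue<Map.Entry<SortedCell, Iterator<SortedCell>>> heap =
            new PriorityQueue<>(Map.Entry.<SortedCell, Iterator<SortedCell>>comparingByKey());
        for (Iterator<SortedCell> it : branches) {
            if (it.hasNext()) heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
        }
        List<SortedCell> out = new ArrayList<>(); // collected only so the demo can inspect it
        while (!heap.isEmpty()) {
            Map.Entry<SortedCell, Iterator<SortedCell>> head = heap.poll();
            out.add(head.getKey());
            Iterator<SortedCell> it = head.getValue();
            if (it.hasNext()) heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));
        }
        return out;
    }

    // Split / transform / merge: renamed family in one branch, everything else in the other.
    static List<SortedCell> splitTransformMerge(List<SortedCell> source, String from, String to) {
        Iterator<SortedCell> renamed = branch(source, c -> c.family.equals(from), c -> c.withFamily(to));
        Iterator<SortedCell> rest = branch(source, c -> !c.family.equals(from), UnaryOperator.identity());
        return merge(List.of(renamed, rest));
    }
}
```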
