Flume's mailing list would probably be a better place to get help with this.
SimpleHbaseEventSerializer doesn't do regex, so you can't extract your own row key with it; it generates the key itself:
https://github.com/slmnhq/flume/blob/master/flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/SimpleHbaseEventSerializer.java#L40

I'd say you should go for RegexHbaseEventRowKeySerializer.

On Fri, 14 Feb 2020 at 13:27, Mich Talebzadeh wrote:

> Thanks Pedro,
>
> As I understand it, it generates a default rowkey as follows:
>
> Row keys are default + UUID_like_string:
> defaultfb7cb953-8598-466e-a1c0-277e2863b249
>
> But I send the rowkey value as well:
>
> *f2d7174e-6299-49a7-9e87-0d66c248e66b*
> {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
> "timeissued":"2020-02-14T08:54:13", "price":573.25}
>
> But it still generates its own rowkey:
> defaultfb7cb953-8598-466e-a1c0-277e2863b249
>
> How can I make HBase use the rowkey that Flume sends WITHOUT generating its
> own rowkey?
>
> Regards,
>
> Mich
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
> On Fri, 14 Feb 2020 at 12:27, Pedro Boado wrote:
>
> > If what you're looking for is not achievable by extracting fields through
> > a regex (it looks like it should be) and you want full control over what's
> > written to HBase, you're probably looking at writing your own serializer.
> >
> > On Fri, 14 Feb 2020 at 11:05, Mich Talebzadeh wrote:
> >
> > > Hi,
> > >
> > > I have an HBase table 'trading:MARKETDATAHBASEBATCH'.
> > >
> > > Kafka delivers topic rows into Flume.
> > >
> > > This is a typical JSON row:
> > >
> > > f2d7174e-6299-49a7-9e87-0d66c248e66b
> > > {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
> > > "timeissued":"2020-02-14T08:54:13", "price":573.25}
> > >
> > > The rowkey is a UUID.
> > >
> > > The json.conf for Flume is as follows:
> > >
> > > # Describing/Configuring the sink
> > > JsonAgent.channels.hdfs-channel-1.type = memory
> > > JsonAgent.channels.hdfs-channel-1.capacity = 300
> > > JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
> > > JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
> > > JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
> > > JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
> > > JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
> > > JsonAgent.sinks.Hbase-sink.serializer =org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
> > > ##JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
> > > agent1.sinks.sink1.serializer.regex =[a-zA-Z0-9]*^C[a-zA-Z0-9]*^C[a-zA-Z0-9]*
> > > JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = ROW_KEY
> > > JsonAgent.sinks.Hbase-sink.serializer.colNames =ROW_KEY,ticker,timeissued,price
> > > JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
> > > JsonAgent.sinks.Hbase-sink.batchSize =100
> > >
> > > The problem is that the rows are inserted as follows:
> > >
> > > defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1
> > > column=PRICE_INFO:pCol, timestamp=1581670394182,
> > > value={"rowkey":"a7464cf4-42a1-40b8-a597-a41fbc3b847f","ticker":"MRW",
> > > "timeissued":"2020-02-14T09:03:46", "price":317.13}
> > >
> > > So it creates a default rowkey value,
> > > "defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1", and puts all the JSON
> > > values in a single value column.
> > >
> > > Ideally I would like something similar to the following:
> > >
> > > hbase(main):085:0> put 'trading:MARKETDATAHBASEBATCH',
> > > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:rowkey',
> > > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > > hbase(main):086:0> put 'trading:MARKETDATAHBASEBATCH',
> > > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:ticker', "ORCL"
> > > hbase(main):087:0> put 'trading:MARKETDATAHBASEBATCH',
> > > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:timeissued',
> > > "2020-02-14T09:57:32"
> > > hbase(main):001:0> put 'trading:MARKETDATAHBASEBATCH',
> > > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:price' ,22.12
> > > hbase(main):002:0> get 'trading:MARKETDATAHBASEBATCH',
> > > "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > > COLUMN                  CELL
> > >  PRICE_INFO:price       timestamp=1581676221846, value=22.12
> > >  PRICE_INFO:rowkey      timestamp=1581675986932, value=8b97d3b9-e87b-4f21-9879-b43c4dcccb37
> > >  PRICE_INFO:ticker      timestamp=1581676103443, value=ORCL
> > >  PRICE_INFO:timeissued  timestamp=1581676168656, value=2020-02-14T09:57:32
> > >
> > > Any advice would be appreciated.
> > >
> > > Thanks,
> > >
> > > Mich
> >
> >
> > --
> > Regards,
> > Pedro Boado.
>

--
Regards,
Pedro Boado.
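To illustrate the regex route suggested above, here is a minimal, self-contained Java sketch of the parsing step only; it is not a Flume serializer. It applies a regex with one capture group per field to the sample JSON body and reuses the first group as the HBase row key, which reproduces the layout Mich built by hand in the HBase shell (row key plus PRICE_INFO:rowkey, ticker, timeissued and price). The class name `RowKeyRegexSketch`, the `parse` helper and the pattern are hypothetical, and the pattern assumes the producer always emits these four flat fields in this order with no escaped quotes; if that is not guaranteed, a real JSON parser is safer. It also assumes that, in the Flume config, `rowKeyIndex` is a zero-based integer position in `colNames` (for example `0`) rather than a column name, which is worth checking against the serializer source for your Flume version. Separately, note that the active regex line in the quoted config is keyed under `agent1.sinks.sink1` rather than `JsonAgent.sinks.Hbase-sink`, so the `JsonAgent` agent will not pick it up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extract a row key and column values from one event body. */
public class RowKeyRegexSketch {

    // Hypothetical pattern with one capture group per field. Assumes flat JSON,
    // fields always in this order, and no escaped quotes inside values.
    static final Pattern ROW_PATTERN = Pattern.compile(
            "\\{\"rowkey\":\"([^\"]+)\",\\s*\"ticker\":\"([^\"]+)\",\\s*"
            + "\"timeissued\":\"([^\"]+)\",\\s*\"price\":([0-9.]+)\\}");

    // Column qualifiers in capture-group order (to be written under PRICE_INFO).
    static final List<String> COL_NAMES =
            Arrays.asList("rowkey", "ticker", "timeissued", "price");

    // Zero-based position in COL_NAMES of the field reused as the HBase row key.
    static final int ROW_KEY_INDEX = 0;

    /** Extracted row key plus column qualifier -> value, in column order. */
    static final class ParsedRow {
        final String rowKey;
        final Map<String, String> columns;

        ParsedRow(String rowKey, Map<String, String> columns) {
            this.rowKey = rowKey;
            this.columns = Collections.unmodifiableMap(columns);
        }
    }

    /** Returns the parsed row, or null if the body does not match ROW_PATTERN. */
    static ParsedRow parse(String body) {
        Matcher m = ROW_PATTERN.matcher(body.trim());
        if (!m.matches()) {
            return null;
        }
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < COL_NAMES.size(); i++) {
            // Group 0 is the whole match, so field i lives in group i + 1.
            columns.put(COL_NAMES.get(i), m.group(i + 1));
        }
        return new ParsedRow(m.group(ROW_KEY_INDEX + 1), columns);
    }

    public static void main(String[] args) {
        String body = "{\"rowkey\":\"f2d7174e-6299-49a7-9e87-0d66c248e66b\",\"ticker\":\"BP\", "
                + "\"timeissued\":\"2020-02-14T08:54:13\", \"price\":573.25}";
        ParsedRow row = parse(body);
        if (row == null) {
            System.out.println("body did not match the expected layout");
            return;
        }
        System.out.println("row key: " + row.rowKey);
        for (Map.Entry<String, String> e : row.columns.entrySet()) {
            System.out.println("PRICE_INFO:" + e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running `main` prints the row key followed by one `PRICE_INFO:<column> = <value>` line per field. Keeping the pattern and the column list as data mirrors how the serializer settings are expressed in the Flume config, so a change in the message layout only means changing `ROW_PATTERN` and `COL_NAMES`.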
