Flume's mailing list would probably be a better place to get help with
this.

SimpleHbaseEventSerializer doesn't do regex, so you can't extract your own
row key from the event body:
https://github.com/slmnhq/flume/blob/master/flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/SimpleHbaseEventSerializer.java#L40
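For reference, here is a rough, self-contained Java sketch (plain JDK, no Flume classes) of what SimpleHbaseEventSerializer does with its defaults, as I read the linked source: the row key is `rowPrefix` ("default") followed by a random UUID (`suffix` = "uuid"), and the whole event body goes unparsed into `payloadColumn` ("pCol"). The class and method names are made up for illustration. It shows why your rows come out as default<uuid> with everything in PRICE_INFO:pCol.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration only: mimics SimpleHbaseEventSerializer's assumed
// defaults (rowPrefix="default", suffix="uuid", payloadColumn="pCol").
public class SimpleSerializerSketch {
    static final String ROW_PREFIX = "default";
    static final String PAYLOAD_COLUMN = "pCol";

    // Row key = prefix + random UUID. The event body is never inspected,
    // so a "rowkey" field inside the JSON has no effect on the key.
    static String rowKey() {
        return ROW_PREFIX + UUID.randomUUID();
    }

    // The entire body is stored, unparsed, in a single column.
    static Map<String, String> columns(byte[] body) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put(PAYLOAD_COLUMN, new String(body, StandardCharsets.UTF_8));
        return cols;
    }

    public static void main(String[] args) {
        String json = "{\"rowkey\":\"f2d7174e-6299-49a7-9e87-0d66c248e66b\",\"ticker\":\"BP\","
                + " \"timeissued\":\"2020-02-14T08:54:13\", \"price\":573.25}";
        System.out.println("row key: " + rowKey());
        System.out.println("columns: " + columns(json.getBytes(StandardCharsets.UTF_8)));
    }
}
```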

I'd say you should go for RegexHbaseEventRowKeySerializer.
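Assuming the stock `org.apache.flume.sink.hbase.RegexHbaseEventSerializer` that ships with Flume, my reading of its source is: the regex must match the whole body, the number of groups must equal the number of `colNames`, and `rowKeyIndex` is an integer index into `colNames` whose entry must be the literal `ROW_KEY`. So `serializer.rowKeyIndex = ROW_KEY` wouldn't parse; `serializer.colNames = ROW_KEY,ticker,timeissued,price` with `serializer.rowKeyIndex = 0` should be closer. Also, your regex line is set on `agent1.sinks.sink1` rather than `JsonAgent.sinks.Hbase-sink`, so that sink never sees it. The plain-JDK Java below (class and names hypothetical) mimics that logic so you can test a pattern against a sample event before putting it in `serializer.regex`. I wrote the pattern with character classes instead of backslashes because, as far as I know, Flume loads its config through java.util.Properties, where a backslash is an escape character.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: applies a regex to the event body the way
// RegexHbaseEventSerializer is assumed to use regex / colNames / rowKeyIndex.
public class RegexRowKeySketch {
    // Assumed layout of the JSON body (no backslashes, so it can be pasted
    // into a .properties file unchanged).
    static final String REGEX =
            "[{]\"rowkey\":\"([^\"]+)\",[ ]*\"ticker\":\"([^\"]+)\",[ ]*"
            + "\"timeissued\":\"([^\"]+)\",[ ]*\"price\":([^}]+)[}]";
    // Equivalent of serializer.colNames; ROW_KEY marks the row-key group.
    static final List<String> COL_NAMES =
            Arrays.asList("ROW_KEY", "ticker", "timeissued", "price");
    // Equivalent of serializer.rowKeyIndex: position of ROW_KEY in COL_NAMES.
    static final int ROW_KEY_INDEX = 0;
    // regexIgnoreCase = true
    static final Pattern PATTERN = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    static final class Row {
        final String rowKey;
        final Map<String, String> columns;

        Row(String rowKey, Map<String, String> columns) {
            this.rowKey = rowKey;
            this.columns = columns;
        }
    }

    // Returns null when the body doesn't fully match (nothing would be written).
    static Row toRow(String body) {
        Matcher m = PATTERN.matcher(body);
        if (!m.matches() || m.groupCount() != COL_NAMES.size()) {
            return null;
        }
        String rowKey = m.group(ROW_KEY_INDEX + 1);
        Map<String, String> cols = new LinkedHashMap<>();
        for (int i = 0; i < COL_NAMES.size(); i++) {
            if (i != ROW_KEY_INDEX) { // the row-key group is not stored as a column
                cols.put(COL_NAMES.get(i), m.group(i + 1));
            }
        }
        return new Row(rowKey, cols);
    }

    public static void main(String[] args) {
        String json = "{\"rowkey\":\"f2d7174e-6299-49a7-9e87-0d66c248e66b\",\"ticker\":\"BP\","
                + " \"timeissued\":\"2020-02-14T08:54:13\", \"price\":573.25}";
        Row row = toRow(json);
        System.out.println("row key: " + row.rowKey);
        System.out.println("columns: " + row.columns);
    }
}
```

Run against the sample event, it prints the UUID as the row key and {ticker=BP, timeissued=2020-02-14T08:54:13, price=573.25} as the columns. As far as I can tell, the ROW_KEY group is used only as the row key and not written as a column, so if you also want PRICE_INFO:rowkey as in your shell example, that's where a custom serializer comes in.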



On Fri, 14 Feb 2020 at 13:27, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks Pedro,
>
> As I understand it, it generates a default rowkey as follows:
>
> Row keys are "default" + a UUID-like string:
>  defaultfb7cb953-8598-466e-a1c0-277e2863b249
>
> But I send the rowkey value as well:
>
> *f2d7174e-6299-49a7-9e87-0d66c248e66b*
> {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
> "timeissued":"2020-02-14T08:54:13", "price":573.25}
>
> But it still generates its own rowkey:
> defaultfb7cb953-8598-466e-a1c0-277e2863b249
>
> How can I make HBase use the rowkey that Flume sends WITHOUT generating its
> own rowkey?
>
> Regards,
>
> Mich
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 14 Feb 2020 at 12:27, Pedro Boado <pedro.bo...@gmail.com> wrote:
>
> > If what you're looking for is not achievable by extracting fields through
> > regex (it looks like it should be) and you want full control over what's
> > written to HBase, you're probably looking at writing your own serializer.
> >
> > On Fri, 14 Feb 2020 at 11:05, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have an HBase table 'trading:MARKETDATAHBASEBATCH'.
> > >
> > > Kafka delivers topic rows into Flume.
> > >
> > > This is a typical JSON row:
> > >
> > > f2d7174e-6299-49a7-9e87-0d66c248e66b
> > > {"rowkey":"f2d7174e-6299-49a7-9e87-0d66c248e66b","ticker":"BP",
> > > "timeissued":"2020-02-14T08:54:13", "price":573.25}
> > >
> > > The rowkey is a UUID.
> > >
> > > The json.conf for Flume is as follows:
> > >
> > > # Describing/Configuring the sink
> > > JsonAgent.channels.hdfs-channel-1.type = memory
> > > JsonAgent.channels.hdfs-channel-1.capacity = 300
> > > JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
> > > JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
> > > JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
> > > JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
> > > JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
> > > JsonAgent.sinks.Hbase-sink.serializer =org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
> > > ##JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
> > > agent1.sinks.sink1.serializer.regex =[a-zA-Z0-9]*^C[a-zA-Z0-9]*^C[a-zA-Z0-9]*
> > > JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex  = ROW_KEY
> > > JsonAgent.sinks.Hbase-sink.serializer.colNames =ROW_KEY,ticker,timeissued,price
> > > JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
> > > JsonAgent.sinks.Hbase-sink.batchSize =100
> > >
> > > The problem is that the rows are inserted as follows:
> > >
> > > defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1
> > > column=PRICE_INFO:pCol, timestamp=1581670394182,
> > > value={"rowkey":"a7464cf4-42a1-40b8-a597-a41fbc3b847f","ticker":"MRW",
> > > "timeissued":"2020-02-14T09:03:46", "price":317.13}
> > >
> > > So it creates a default rowkey value,
> > > "defaultff90d8d3-d8c5-40ff-bc37-6ee1544988c1", and puts all the JSON
> > > values in a single value column (PRICE_INFO:pCol).
> > >
> > > Ideally I would like something similar to the following:
> > >
> > > hbase(main):085:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:rowkey', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > > hbase(main):086:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:ticker', "ORCL"
> > > hbase(main):087:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:timeissued', "2020-02-14T09:57:32"
> > > hbase(main):001:0> put 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37", 'PRICE_INFO:price' ,22.12
> > > hbase(main):002:0> get 'trading:MARKETDATAHBASEBATCH', "8b97d3b9-e87b-4f21-9879-b43c4dcccb37"
> > > COLUMN                    CELL
> > >  PRICE_INFO:price         timestamp=1581676221846, value=22.12
> > >  PRICE_INFO:rowkey        timestamp=1581675986932, value=8b97d3b9-e87b-4f21-9879-b43c4dcccb37
> > >  PRICE_INFO:ticker        timestamp=1581676103443, value=ORCL
> > >  PRICE_INFO:timeissued    timestamp=1581676168656, value=2020-02-14T09:57:32
> > >
> > > Any advice would be appreciated.
> > >
> > > Thanks,
> > >
> > > Mich
> > >
> >
> >
> > --
> > Regards.
> > Pedro Boado.
> >
>


-- 
Regards.
Pedro Boado.
