Re: Flume to Phoenix as Sink Issue
Hi Ravi Kiran,

Thank you very much for your reply; it worked well. I can now see the Apache data in Phoenix.

Do you have any idea when Apache Phoenix will support the UNION statement? We are eagerly waiting for it. Apache Phoenix is a really good and very useful tool.

Thanks a lot!!
Divya N

On Mon, Dec 22, 2014 at 11:33 AM, Ravi Kiran wrote:
> Hi Divya,
>
> Based on the logs you have shared, can you please change the following
> entries:
>
> agent.sinks.phoenix-sink.serializer.regex=^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"
>
> agent.sinks.phoenix-sink.serializer.columns=host,identity,user,time,request,status,size,referer,agent
>
> Regarding the logging level, try changing the entry in log4j.properties.
>
> Regards
> Ravi
Re: Flume to Phoenix as Sink Issue
Hi Divya,

Based on the logs you have shared, can you please change the following entries:

agent.sinks.phoenix-sink.serializer.regex=^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"

agent.sinks.phoenix-sink.serializer.columns=host,identity,user,time,request,status,size,referer,agent

Regarding the logging level, try changing the entry in log4j.properties.

Regards
Ravi

On Sat, Dec 20, 2014 at 4:34 AM, Divya Nagarajan wrote:
> Hi,
>
> This is my Flume configuration file:
>
> agent.sources = tail
> agent.channels = memoryChannel
> agent.sinks = loggerSink
> agent.sinks = phoenix-sink
>
> agent.sources.tail.type = exec
> agent.sources.tail.command = tail -f /var/log/httpd/access_log
> agent.sources.tail.channels = memoryChannel
>
> agent.sinks.loggerSink.channel = memoryChannel
> agent.sinks.loggerSink.type = logger
>
> agent.channels.memoryChannel.type = memory
> agent.channels.memoryChannel.capacity = 100
>
> agent.sinks.phoenix-sink.type=org.apache.phoenix.flume.sink.PhoenixSink
> agent.sinks.phoenix-sink.channel=memoryChannel
> agent.sinks.phoenix-sink.batchSize=5
> agent.sinks.phoenix-sink.table=S1.APACHE
>
> agent.sinks.phoenix-sink.zookeeperQuorum=nn01
> agent.sinks.phoenix-sink.serializer=REGEX
> agent.sinks.phoenix-sink.serializer.rowkeyType=uuid
> agent.sinks.phoenix-sink.ddl=CREATE TABLE IF NOT EXISTS S1.APACHE (uid varchar NOT NULL,host varchar,identity varchar,user varchar,time varchar,method varchar,request varchar,protocol varchar,status INTEGER,size INTEGER,referer varchar,agent varchar,f_host varchar CONSTRAINT pk PRIMARY KEY (uid))
>
> #agent.sinks.phoenix-sink.serializer.regex="([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) \"([^ ]+) ([^ ]+) ([^\"]+)\" (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]$
> #agent.sinks.phoenix-sink.serializer.regex="([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) \"([^ ]+) ([^ ]+) ([^\"]+)\" (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?"
>
> agent.sinks.phoenix-sink.serializer.regex=([^ ]*) ([^ ]*) ([^ ]*) ([^ ]* [^ ]*) "([^\"]+)\" (-|[0-9]*) (-|[0-9]*) "([^ ]*)" "([^\"]+)\"
>
> agent.sinks.phoenix-sink.serializer.columns=host,identity,user,time,method,request,protocol,status,size,referer,agent
> agent.sinks.phoenix-sink.serializer.headers=f_host
>
> This is my Apache log file structure:
>
> 127.0.0.1 - - [20/Dec/2014:17:11:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:16:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:21:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:26:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:31:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:36:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:41:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:46:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:51:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
> 127.0.0.1 - - [20/Dec/2014:17:56:06 +0530] "GET / HTTP/1.0" 403 4954 "-" "check_http/v2.0.3 (nagios-plugins 2.0.3)"
>
> I am using:
> Phoenix 4.2.1
> HBase 0.98.8
>
> Sorry, I enabled DEBUG mode in Flume, but it still shows only INFO messages when executing this:
> flume-ng agent -c conf -f /opt/flume/conf/apache.conf -n agent -Dflume.root.looger=DEBUG,console
>
> Thanks
> Divya N
>
> On Sat, Dec 20, 2014 at 2:14 AM, Ravi Kiran wrote:
>> Hi Divya,
>>
>> Also, can you confirm that the regex in the configuration matches the
>> access log? To check this, could you set the logging level to DEBUG?
>> A debug log entry is written when an event doesn't match the regex
>> given in the configuration.
>> We have a test case for processing Apache logs that can help you with the regex:
>> https://github.com/apache/phoenix/blob/master/phoenix-flume/src/it/java/org/apache/phoenix/flume/RegexEventSerializerIT.java#testApacheLogRegex
>> Happy to help!!
>>
>> Regards
>> Ravi
>>
>> On Fri, Dec 19, 2014 at 11:19 AM, Ravi Kiran wrote:
>>> Hi Nagarajan,
>>>
>>> Do you see any exceptions in the logs? Can you please try ingesting
>>> more than 100 records and see if that works? Also, can you please share
>>> the version of Phoenix you are using?
>>>
>>> Regards
>>> Ravi
>>>
>>> On Thu, Dec 18, 2014 at 10:36 PM, Divya Nagarajan <divya.se2...@gmail.com> wrote:
>>>> Hi,
>>>> I tried with 5 as the batch size, but the data is still not upserted into Phoenix.
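As a quick offline check of the two regexes in this thread, here is a minimal Java sketch using `java.util.regex`. It assumes that Flume loads the agent file with `java.util.Properties`, so `\\d` in the file reaches the regex engine as `\d` and `\"` as `"`; the string literals below are written with that unescaping already applied. It also assumes, without checking the Phoenix source, that the serializer needs a full match (as with `Matcher.matches()`) and exactly one capture group per configured column. The class and method names (`RegexCheck`, `parse`) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    // Suggested regex, as seen by Java after the .properties file is loaded.
    static final String SUGGESTED =
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+) \"([^\"]+)\" \"([^\"]+)\"";
    static final List<String> SUGGESTED_COLUMNS = Arrays.asList(
        "host", "identity", "user", "time", "request", "status", "size", "referer", "agent");

    // Active regex and column list from the original configuration.
    static final String ORIGINAL =
        "([^ ]*) ([^ ]*) ([^ ]*) ([^ ]* [^ ]*) \"([^\"]+)\" (-|[0-9]*) (-|[0-9]*) \"([^ ]*)\" \"([^\"]+)\"";
    static final List<String> ORIGINAL_COLUMNS = Arrays.asList(
        "host", "identity", "user", "time", "method", "request", "protocol",
        "status", "size", "referer", "agent");

    // First line of the sample access log from the thread.
    static final String SAMPLE =
        "127.0.0.1 - - [20/Dec/2014:17:11:06 +0530] \"GET / HTTP/1.0\" 403 4954 \"-\" "
        + "\"check_http/v2.0.3 (nagios-plugins 2.0.3)\"";

    /**
     * Maps each column to its capture group. Returns null when the line does not
     * match the whole regex or when the group count differs from the column count.
     */
    static Map<String, String> parse(String regex, List<String> columns, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches() || m.groupCount() != columns.size()) {
            return null;
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), m.group(i + 1)); // group 0 is the whole match
        }
        return row;
    }

    public static void main(String[] args) {
        // groupCount() depends only on the pattern, so an empty input is fine here.
        System.out.println("original: groups=" + Pattern.compile(ORIGINAL).matcher("").groupCount()
            + ", columns=" + ORIGINAL_COLUMNS.size());
        System.out.println("suggested: groups=" + Pattern.compile(SUGGESTED).matcher("").groupCount()
            + ", columns=" + SUGGESTED_COLUMNS.size());
        System.out.println(parse(SUGGESTED, SUGGESTED_COLUMNS, SAMPLE));
    }
}
```

Run as-is, it reports 9 capture groups for the original regex against 11 configured columns, while the suggested regex has 9 groups for 9 columns and splits the sample line into host, identity, user, time, request, status, size, referer and agent. The original regex does match the sample line, so if the serializer skips events whose group count differs from the column list (an assumption, not verified here), that mismatch alone would explain why nothing was upserted. On the separate DEBUG question, the command above spells the system property `flume.root.looger`, whereas Flume's documented property is `flume.root.logger`.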
Re: Re: What is the purpose of these system tables(CATALOG, STATS, and SEQUENCE)?
As I said before, no, it's not OK to drop the system tables. If for some reason you don't want the sequence table pre-split 256 ways, you can set phoenix.sequence.saltBuckets to the number of pre-split regions you'd like it to have (including 0).

On Sun, Dec 21, 2014 at 5:38 PM, chenwenhui wrote:
> Hi James,
> Thanks for your reply.
> SYSTEM.SEQUENCE has 256 regions by default, which seems like a large number.
> I tried dropping the table and found that sequences stopped working. My application will never use sequences; are there any other side effects of dropping the SYSTEM.SEQUENCE table?
> If there are, how can I reduce the number of regions?
> Thanks again.
>
> At 2014-12-20 15:13:33, "James Taylor" wrote:
>> Hi,
>> The system tables store and manage your metadata (e.g. tables, their
>> columns, views, sequences, indexes, etc.). You should leave them
>> alone. Phoenix manages (reads from and writes to) these tables when necessary.
>> Thanks,
>> James
>>
>> On Thu, Dec 18, 2014 at 6:30 PM, chenwenhui wrote:
>>> Does almost nobody care about these system tables?
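The idea of a table being "pre-split N ways" by salt buckets can be made concrete with a small Java sketch of row-key salting in general: each row key gets a one-byte prefix computed as hash(key) mod N, and a table salted with N buckets is pre-split into N regions, one per prefix value. The hash function and the names `bucketOf`/`saltedKey` are assumptions made for illustration; this is not Phoenix's own salting code, and it does not touch SYSTEM.SEQUENCE. Passing 0 leaves keys unsalted, loosely mirroring the option of setting phoenix.sequence.saltBuckets to 0.

```java
import java.nio.charset.StandardCharsets;

/**
 * Simplified illustration of row-key salting. NOT Phoenix's implementation:
 * the hash below is a hypothetical stand-in chosen for readability.
 */
public class SaltSketch {
    // Hypothetical bucket function: a 31-based polynomial hash reduced modulo saltBuckets.
    static int bucketOf(byte[] key, int saltBuckets) {
        int h = 1;
        for (byte b : key) {
            h = 31 * h + b;
        }
        return Math.floorMod(h, saltBuckets); // in [0, saltBuckets) for saltBuckets > 0
    }

    // Prepends the bucket number as one byte; with 0 buckets the key is returned unsalted.
    static byte[] saltedKey(byte[] key, int saltBuckets) {
        if (saltBuckets == 0) {
            return key.clone();
        }
        byte[] out = new byte[key.length + 1];
        // Buckets 128..255 become negative bytes; read them back with Byte.toUnsignedInt.
        out[0] = (byte) bucketOf(key, saltBuckets);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "MY_SEQUENCE".getBytes(StandardCharsets.UTF_8);
        for (int buckets : new int[] {256, 16, 0}) {
            byte[] salted = saltedKey(key, buckets);
            String bucket = buckets == 0 ? "none" : String.valueOf(Byte.toUnsignedInt(salted[0]));
            System.out.println("saltBuckets=" + buckets + " -> bucket " + bucket
                + ", key length " + salted.length);
        }
    }
}
```

Because the prefix depends on the key, keys that would otherwise be adjacent are scattered across buckets, which spreads writes over all regions; the trade-off is that a range scan has to visit every bucket. With 0 buckets there is no prefix and no pre-splitting.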
Re:Re: What is the purpose of these system tables(CATALOG, STATS, and SEQUENCE)?
Hi James,
Thanks for your reply.
SYSTEM.SEQUENCE has 256 regions by default, which seems like a large number.
I tried dropping the table and found that sequences stopped working. My application will never use sequences; are there any other side effects of dropping the SYSTEM.SEQUENCE table?
If there are, how can I reduce the number of regions?
Thanks again.

At 2014-12-20 15:13:33, "James Taylor" wrote:
> Hi,
> The system tables store and manage your metadata (e.g. tables, their
> columns, views, sequences, indexes, etc.). You should leave them
> alone. Phoenix manages (reads from and writes to) these tables when necessary.
> Thanks,
> James
>
> On Thu, Dec 18, 2014 at 6:30 PM, chenwenhui wrote:
>> Does almost nobody care about these system tables?