RE: Help tuning for bursts of high traffic?

2015-12-07 Thread Riesland, Zack
Also, and somewhat related:

I’m trying to run 8 simultaneous instances of this code (on 8 separate 
input files), since I have 8 CPUs on the machine.

When I try this, I get java.lang.RuntimeException: java.lang.OutOfMemoryError: 
unable to create new native thread.

My Phoenix connection has “phoenix.query.threadPoolSize” set to “256”, 
which should result in 8 x 256 = 2,048 threads being spawned by Phoenix.

Is that correct?

I’m running RH Linux 6, with my ulimit set to “unlimited”, so I should be able 
to handle thousands of threads.

Any ideas?
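
[Editor's note: for anyone following along, here is a minimal sketch of how that property gets set on the client side via standard JDBC Properties; the ZooKeeper quorum host is a placeholder. Note that each JVM instance builds its own pool, so 8 processes at 256 threads each, plus HBase/ZooKeeper client threads, can approach per-user OS thread limits.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class PhoenixThreadPoolSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Client-side Phoenix query thread pool size (per JVM).
        props.setProperty("phoenix.query.threadPoolSize", "256");
        // Placeholder connection URL -- substitute your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181", props)) {
            System.out.println("Connected, autoCommit=" + conn.getAutoCommit());
        }
    }
}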

From: Andrew Purtell [mailto:andrew.purt...@gmail.com]
Sent: Friday, December 04, 2015 4:24 PM
To: user@phoenix.apache.org
Cc: Haisty, Geoffrey
Subject: Re: Help tuning for bursts of high traffic?

Any chance of stack dumps from the debug servlet? Impossible to get anywhere 
with 'pegged the CPU' otherwise. Thanks.

On Dec 4, 2015, at 12:20 PM, Riesland, Zack 
> wrote:
James,

2 quick followups, for whatever they’re worth:

1 – There is nothing phoenix-related in /tmp

2 – I added a ton of logging, and played with the properties a bit, and I think 
I see a pattern:

Watching the logging and the system profiler side-by-side, I see that, 
periodically – maybe every 60 or 90 seconds – all of my CPUs (there are 8 on 
this machine) go from mildly busy to almost totally pegged.

They USUALLY stay pegged for 5-10 seconds, and then calm down.

However, occasionally, they stay pegged for around a minute. When this happens, 
I get the very slow queries. I added logic so that when I get a very slow 
response (> 1 second), I pause for 30 seconds.

This ‘fixes’ everything, in the sense that I’m usually able to get a couple 
thousand good queries before the whole pattern repeats.

For reference, there’s nothing external that should be causing those CPU 
spikes, so I’m guessing it’s maybe Java GC(?) or perhaps something that 
the Phoenix client is doing?

Can you guess at what Phoenix might do periodically that would peg the CPUs – 
and in such a way that a query has to wait as much as 2 minutes to execute? (I’m 
guessing from the pattern that it’s not actually the query that is slow, but a 
very long delay between when it gets queued and when it actually gets executed.)

Oh and the methods you mentioned aren’t in my version of PhoenixRuntime, 
evidently. I’m on 4.2.2.something.

Thanks for any further feedback you can provide on this. Hopefully the 
conversation is helpful to the whole Phoenix community.
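
[Editor's note: short of the debug servlet Andrew mentioned, a quick way to see what the client-side threads are doing during a spike is a programmatic thread dump. This is a minimal sketch using only JDK APIs; wiring it into the pause-on-slow-query logic described above is just one option.]

import java.util.Map;

public final class ThreadDumpSketch {
    // Print a stack trace for every live thread in this JVM. Calling this when a
    // query exceeds the 1-second threshold would show whether Phoenix client
    // threads are running, blocked, or waiting during the CPU spike.
    public static void dumpAllThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            System.err.println("Thread \"" + t.getName() + "\" state=" + t.getState());
            for (StackTraceElement frame : entry.getValue()) {
                System.err.println("    at " + frame);
            }
        }
    }
}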

From: Riesland, Zack
Sent: Friday, December 04, 2015 1:36 PM
To: user@phoenix.apache.org
Cc: geoff.hai...@sensus.com
Subject: RE: Help tuning for bursts of high traffic?

Thanks, James

I'll work on gathering more information.

In the meantime, here are answers to a few of your questions inline below, to 
narrow the scope a bit:


From: James Taylor [jamestay...@apache.org]
Sent: Friday, December 04, 2015 12:21 PM
To: user
Subject: Re: Help tuning for bursts of high traffic?
Zack,
Thanks for reporting this and for the detailed description. Here's a bunch of 
questions and some things you can try in addition to what Andrew suggested:
1) Is this reproducible in a test environment (perhaps through Pherf: 
https://phoenix.apache.org/pherf.html) so you can experiment more?
-Will check

2) Do you get a sense of whether the bottleneck is on the client or the server? 
CPU, IO, or network? How many clients are you running and have you tried 
increasing this? Do you think your network is saturated by the data being 
returned?
-I'm no expert on this. When I look at the HBase dashboard on Ambari, 
everything looks good. When I look at the stats on the machine running the Java 
code, it also looks good. Certainly no bottleneck related to memory or CPU. 
Network-wise, the box is on the same rack as the cluster, with 10Gb switches 
everywhere, so I'd be surprised if network latency were an issue.

3) From your description, it sounds like you're querying the data as your 
ingesting. When it gets slow, have you tried running a major compaction to see 
if that helps? Perhaps queries are getting slower because of the number of 
HFiles that need to be merged.
-Rereading my original email, I see where you get that. But actually, there is 
nothing being ingested by HBase during this process. At the end of the process, 
I generate a CSV file that is then consumed and altered by Pentaho, then 
consumed by Hive, and THEN some of the Hive data is sent to HBase/Phoenix. So 
this is part of the ingest process, but a precursor to the cluster ingesting 
any data.

4) If you bounce your cluster when it gets slow, does this have any impact?
-Can check. What should I expect to happen if I restart HBase-related services 
while trying to query Phoenix? Will the query just wait until everything is 
back up? Will I get strange 

system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread Thangamani, Arun
Hello, I noticed an issue with bulk inserts through MapReduce in Phoenix 
4.4.0.2.3.0.0-2557, using the outline of the code below.

Normally the inserts of about 25 million rows complete in about 5 mins; there 
are 5 region servers and the Phoenix table has 32 buckets. But sometimes (maybe 
after major compactions or region movement?) writes simply slow down to 90 
mins. When I truncate the SYSTEM.STATS HBase table, the inserts get a little 
faster (60 mins), but when I truncate both the SYSTEM.CATALOG & SYSTEM.STATS 
tables and recreate the Phoenix table def(s), the inserts go back to 5 mins. 
The workaround of truncating SYSTEM tables is not sustainable for long. Can 
someone help and let me know if there is a patch available for this? Thanks in 
advance.

Job job = Job.getInstance(conf, NAME);
// Set the target Phoenix table and the columns
PhoenixMapReduceUtil.setOutput(job, tableName,
        "WEB_ID,WEB_PAGE_LABEL,DEVICE_TYPE," +
        "WIDGET_INSTANCE_ID,WIDGET_TYPE,WIDGET_VERSION,WIDGET_CONTEXT," +
        "TOTAL_CLICKS,TOTAL_CLICK_VIEWS,TOTAL_HOVER_TIME_MS,TOTAL_TIME_ON_PAGE_MS,TOTAL_VIEWABLE_TIME_MS," +
        "VIEW_COUNT,USER_SEGMENT,DIM_DATE_KEY,VIEW_DATE,VIEW_DATE_TIMESTAMP,ROW_NUMBER");
FileInputFormat.setInputPaths(job, inputPath);
job.setMapperClass(WidgetPhoenixMapper.class);
job.setMapOutputKeyClass(NullWritable.class);
job.setMapOutputValueClass(WidgetPagesStatsWritable.class);
job.setOutputFormatClass(PhoenixOutputFormat.class);
TableMapReduceUtil.addDependencyJars(job);
job.setNumReduceTasks(0);
job.waitForCompletion(true);

public static class WidgetPhoenixMapper extends Mapper<LongWritable, Text, NullWritable, WidgetPagesStatsWritable> {
    @Override
    public void map(LongWritable longWritable, Text text, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        String rundateString = conf.get("rundate");
        PagesSegmentWidgetLineParser parser = new PagesSegmentWidgetLineParser();
        try {
            PagesSegmentWidget pagesSegmentWidget = parser.parse(text.toString());

            if (pagesSegmentWidget != null) {
                WidgetPagesStatsWritable widgetPagesStatsWritable = new WidgetPagesStatsWritable();
                WidgetPagesStats widgetPagesStats = new WidgetPagesStats();

                widgetPagesStats.setWebId(pagesSegmentWidget.getWebId());
                widgetPagesStats.setWebPageLabel(pagesSegmentWidget.getWebPageLabel());
                widgetPagesStats.setWidgetInstanceId(pagesSegmentWidget.getWidgetInstanceId());
                …..

                widgetPagesStatsWritable.setWidgetPagesStats(widgetPagesStats);
                context.write(NullWritable.get(), widgetPagesStatsWritable);
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

public final class WidgetPagesStats {
    private String webId;
    private String webPageLabel;
    private long widgetInstanceId;
    private String widgetType;

    …
    @Override
    public boolean equals(Object o) {
        ..
    }
    @Override
    public int hashCode() {
        ..
    }
    @Override
    public String toString() {
        return "WidgetPhoenix{" + ….
                '}';
    }
}

public class WidgetPagesStatsWritable implements DBWritable, Writable {

    private WidgetPagesStats widgetPagesStats;

    public void readFields(DataInput input) throws IOException {
        widgetPagesStats.setWebId(input.readLine());
        widgetPagesStats.setWebPageLabel(input.readLine());
        widgetPagesStats.setWidgetInstanceId(input.readLong());
        widgetPagesStats.setWidgetType(input.readLine());
        …
    }

    public void write(DataOutput output) throws IOException {
        output.writeBytes(widgetPagesStats.getWebId());
        output.writeBytes(widgetPagesStats.getWebPageLabel());
        output.writeLong(widgetPagesStats.getWidgetInstanceId());
        output.writeBytes(widgetPagesStats.getWidgetType());
        ..
    }

    public void readFields(ResultSet rs) throws SQLException {
        widgetPagesStats.setWebId(rs.getString("WEB_ID"));
        widgetPagesStats.setWebPageLabel(rs.getString("WEB_PAGE_LABEL"));
        widgetPagesStats.setWidgetInstanceId(rs.getLong("WIDGET_INSTANCE_ID"));
        widgetPagesStats.setWidgetType(rs.getString("WIDGET_TYPE"));
        …
    }

    public void write(PreparedStatement pstmt) throws SQLException {
        Connection connection = pstmt.getConnection();
        PhoenixConnection phoenixConnection = (PhoenixConnection) connection;
        // connection.getClientInfo().setProperty("scn", Long.toString(widgetPhoenix.getViewDateTimestamp()));

        pstmt.setString(1, widgetPagesStats.getWebId());
        pstmt.setString(2, widgetPagesStats.getWebPageLabel());
        pstmt.setString(3, widgetPagesStats.getDeviceType());
        pstmt.setLong(4, widgetPagesStats.getWidgetInstanceId());
        …
    }

public 

Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread Matt Kowalczyk
We're also encountering slowdowns after bulk MR inserts. I've only
measured slowdowns in the query path (since our bulk insert workloads
vary in size, it hasn't been clear that we see slowdowns there, but I'll now
measure this as well). The subject of my reported issue was titled "stats
table causing slow queries".

The stats table seems to be rebuilt during compactions, and I have to
actively purge the table to regain sane query times. It would be sweet if the
stats feature could be disabled.


Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Akhilesh Pathodia
Hi,

I am running a Spark job on YARN in cluster mode in a secured cluster. I am
trying to run Spark on HBase using Phoenix, but the Spark executors are unable
to get an HBase connection through Phoenix. I am running the kinit command to
get the ticket before starting the job, and the keytab file and principal are
correctly specified in the connection URL. But the Spark job on each node
still throws the error below:

15/12/01 03:23:15 ERROR ipc.AbstractRpcClient: SASL authentication failed.
The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
at
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)

I am using Spark 1.3.1, HBase 1.0.0, and Phoenix 4.3. I am able to run Spark on
HBase (without Phoenix) successfully in yarn-client mode as mentioned in
this link:

https://github.com/cloudera-labs/SparkOnHBase#scan-that-works-on-kerberos

Also, I found that there is a known issue for yarn-cluster mode for Spark
1.3.1 version:

https://issues.apache.org/jira/browse/SPARK-6918

Has anybody been successful in running Spark on HBase using Phoenix in
yarn-cluster or yarn-client mode?

Thanks,
Akhilesh Pathodia
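
[Editor's note: for context, a sketch of the connection-URL shape being described, as I understand the secure Phoenix JDBC format; the quorum, ZooKeeper root node, principal, and keytab path are placeholders and the exact format may vary by Phoenix version.]

import java.sql.Connection;
import java.sql.DriverManager;

public class SecurePhoenixConnectSketch {
    public static void main(String[] args) throws Exception {
        // Assumed URL shape: jdbc:phoenix:<zk quorum>:<zk port>:<zk root>:<principal>:<keytab path>
        String url = "jdbc:phoenix:zk-host:2181:/hbase-secure:phoenixuser@EXAMPLE.COM:/etc/security/keytabs/phoenix.keytab";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}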


Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread James Taylor
You can disable stats by setting the phoenix.stats.guidepost.width
config parameter to a larger value in the server-side hbase-site.xml. The
default is 104857600 (i.e. 100MB). If you set it to your MAX_FILESIZE (the
size you allow a region to grow to before it splits - default 20GB), then
you're essentially disabling it. You could also try increasing it to somewhere
in between, maybe 5 or 10GB.
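
[Editor's note: to check whether guideposts are still being written after changing this, here is a small sketch that runs the same kind of SYSTEM.STATS count Matt shows further down; the quorum host and physical table name are placeholders.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class GuidepostCountSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper quorum and physical table name.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = ?")) {
            ps.setString(1, "MY_PHOENIX_TABLE");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    // A large count after a major compaction means guideposts are still being collected.
                    System.out.println("Guidepost rows: " + rs.getLong(1));
                }
            }
        }
    }
}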

Thanks,
James


Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread Matt Kowalczyk
I've set phoenix.stats.guidepost.per.region to 1 and continue to see
entries added to the system.stats table. I believe this should have the
same effect? I'll try setting the guidepost width though.



Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread James Taylor
Yes, setting that property is another way to disable stats. You'll need to
bounce your cluster after setting either of these, and stats won't be
updated until a major compaction occurs.


Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread Matt Kowalczyk
bounced, just after major compaction, with the setting as indicated above.
I'm unable to disable the stats table.

select count(*) from system.stats where physical_name = 'X';
+-----------+
| COUNT(1)  |
+-----------+
| 653       |
+-----------+
1 row selected (0.036 seconds)



Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread James Taylor
You need to bounce the cluster *before* major compaction or the region
server will continue to use the old guideposts setting during compaction.
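
[Editor's note: for reference, a sketch of requesting that major compaction from code after the bounce, using the HBase 1.x Admin API; the table name is a placeholder. This is roughly what major_compact does in the HBase shell.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Request an (asynchronous) major compaction of the placeholder table.
            admin.majorCompact(TableName.valueOf("MY_PHOENIX_TABLE"));
        }
    }
}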


Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread Matt Kowalczyk
I'm sorry I poorly communicated in the previous e-mail. I meant to provide
a list of things that I did. I bounced and then performed a major
compaction and then ran the select count(*) query.


Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread Thangamani, Arun
Thanks, I am doing some testing on this in parallel with 
phoenix.stats.guidepost.width = 10737418240 (which is the max file size set 
from Ambari) and will keep this thread updated.

Going back to my original question: if guideposts are the only issue, why do I 
have to purge my SYSTEM.CATALOG records and recreate the table defs to get the 
write performance back?


Re: system.catalog and system.stats entries slows down bulk MR inserts by 20-25X (Phoenix 4.4)

2015-12-07 Thread Thangamani, Arun
This is on hbase-1.1.1.2.3.0.0-2557 if that would make any difference in 
analysis. Thanks


Re: Phoenix 4.6.0: sqlline.py Hangs From Remote Host

2015-12-07 Thread Steve Terrell
I may have figured it out.  I found out that when I created another EMR
cluster exactly the same as my working one (same AMI, HBase version,
Phoenix version), the problem was duplicated.  I could still connect to the
first cluster's Phoenix endpoint, but got a timeout on the second cluster.

So what was different between the two clusters?

For my cluster that has no sqlline issues, the private IP addresses for my
master and region servers were in my local /etc/hosts file. It looked
something like this:
10.0.100.38    ip-10-0-100-38.ec2.internal
10.0.100.54    ip-10-0-100-54.ec2.internal
10.0.100.55    ip-10-0-100-55.ec2.internal
10.0.100.56    ip-10-0-100-56.ec2.internal

Once I added the IPs for the servers on the new cluster, it worked.  It will
be some time before I have a chance to see if this solves my original
problem with the newer Phoenix and HBase, but I wanted to share what I
learned in case someone else is scratching their head.

Meanwhile, does anyone know why the region server IPs are important?  I
thought communication was only between the client and the master node.

Thanks,
Steve
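
[Editor's note: in case it helps others debugging the same thing, a minimal JDBC connectivity check to run from the remote machine. This is a sketch: the quorum host mirrors the /etc/hosts entries above, and SYSTEM.CATALOG is queried only because it always exists.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixConnectivityCheck {
    public static void main(String[] args) throws Exception {
        // Same URL shape sqlline uses: jdbc:phoenix:<zk host>:2181:/hbase
        String url = "jdbc:phoenix:ip-10-0-100-38.ec2.internal:2181:/hbase";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM SYSTEM.CATALOG")) {
            if (rs.next()) {
                System.out.println("Connected; SYSTEM.CATALOG rows: " + rs.getLong(1));
            }
        }
    }
}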

On Sun, Nov 1, 2015 at 9:29 AM, Steve Terrell  wrote:

> Just had a thought:  Could it be that my Mac Java version is too new?
>
> Mac:
> $ java -version
> java version "1.8.0_60"
> Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
>
> EMR node:
> $ java -version
> java version "1.7.0_71"
> Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
>
> I may try a switching to 1.7 later and report back.
>
> On Sun, Nov 1, 2015 at 9:24 AM, Steve Terrell 
> wrote:
>
>> Thanks, but I'm trying to run remotely.  I'm sure my /etc/hosts is fine
>> as I can ssh and "telnet  " OK.
>>
>> On Sun, Nov 1, 2015 at 9:21 AM, Steve Terrell 
>> wrote:
>>
>>> Thank you, but I'm sure this is not the case as I can easily run
>>> Squirrel client on my mac and query an older version of Phoenix (on another
>>> cluster) via port 2181.
>>>
>>> On Sat, Oct 31, 2015 at 2:21 PM, Naor David  wrote:
>>>
run netstat -a 1 | grep "SYN_SENT" and check if port 2181 is blocked in
 your network.

 On Sat, Oct 31, 2015 at 8:00 PM, Steve Terrell 
 wrote:

> OK, did some more troubleshooting.  Still can't run sqlline.py from my
> macbook laptop.  Still hangs.
>
> My HBase cluster is an Amazon EMR, and I can run sqlline.py from any
> of nodes in the cluster, be they master, core, or task nodes.
>
> So maybe it's not so much a remote host issue but a problem with my
> Mac or some kind of Amazon EMR security issue.  But, I've tried opening up
> all ports just in case something other than 2181 is required, and no luck.
>
> Has anyone run this version of Phoenix on EMR and been able to use
> sqlline.py or SQuirreL client remotely from outside of AWS's private
> network?
>
> (My ultimate goal was to get SQuirreL working, but though sqlline.py
> would be an easier problem to tackle.  SQuirreL is getting timeouts which 
> I
> suspect are due to the same hanging that I see with sqlline.py.)
>
> Thanks,
> Steve
>
> On Wed, Oct 28, 2015 at 5:04 PM, Steve Terrell 
> wrote:
>
>> Yes, I can:
>>
>> $ telnet ** 2181
>> Trying 54.174.32.95...
>> Connected to **.
>> Escape character is '^]'.
>>
>> Thanks,
>> Steve
>>
>> On Wed, Oct 28, 2015 at 4:48 PM, Alok Singh 
>> wrote:
>>
>>> It looks like a zookeeper node is also on the master. Can you
>>> connect to  on port 2181 from the machine that you are 
>>> running
>>> sqlline.py on?
>>>
>>> Alok
>>>
>>> Alok
>>>
>>> a...@cloudability.com
>>>
>>> On Wed, Oct 28, 2015 at 2:23 PM, Steve Terrell <
>>> sterr...@oculus360.us> wrote:
>>>
 I can get "sqlline.py localhost" to work fine from the master node.

 However, when I try to run it remotely, all I get is this:

java -cp "**/phoenix-4.6.0-HBase-0.98-client.jar"
 -Dlog4j.configuration=file:**/log4j.properties
 sqlline.SqlLine -d org.apache.phoenix.jdbc.PhoenixDriver
 -u jdbc:phoenix:<master ip>:2181:/hbase -n none -p none --color=true
 --fastConnect=false --verbose=true
 --isolation=TRANSACTION_READ_COMMITTED

 Setting property: [isolation, TRANSACTION_READ_COMMITTED]
 issuing: !connect jdbc:phoenix:*:2181:/hbase none
 none org.apache.phoenix.jdbc.PhoenixDriver
 Connecting to jdbc:phoenix:*:2181:/hbase
 15/10/28 15:29:44 WARN util.NativeCodeLoader: Unable to load
 native-hadoop library for your platform... using