Re: Error when using Pig Storage: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-08-28 Thread Ravi Kiran
Hi,
   With both PHOENIX-1184 and PHOENIX-1183 fixed, I did a quick test of Pig
scripts on a CDH 5.1.0-backed cluster and things are looking good. Below are
the steps I followed:
a) Downloaded the binaries from
https://dist.apache.org/repos/dist/dev/phoenix/phoenix-4.1.0-rc1/bin/
b) Copied phoenix-4.1.0-server-hadoop2.jar to the HBase RegionServer lib path
and restarted.
c) Copied phoenix-4.1.0-client-hadoop2.jar and phoenix-pig-4.1.0-hadoop2.jar
onto the gateway node where I planned to run my Pig scripts.
d) Registered the following jars within the Pig script:
   zookeeper.jar
   hbase-hadoop2-compat.jar
   hbase-client.jar
   hbase-protocol-0.98.1-cdh5.1.0.jar
   phoenix-4.1.0-client-hadoop2.jar
   phoenix-pig-4.1.0-hadoop2.jar


Regards
Ravi



On Tue, Aug 19, 2014 at 5:43 PM, Russell Jurney russell.jur...@gmail.com
wrote:

 I agree the vendor should resolve these issues. Hortonworks has already
 included Phoenix in HDP. Cloudera is behind the curve here. I'm told
 they'll include Phoenix when they feel they can support it well.

 That being said, wouldn't adding CDH/HDP options in pom.xml make the
 project easier to use, and result in more people trying to use CDH/HDP with
 Phoenix (and more people using Phoenix in general), which would surface
 bugs like the ones here? Ideally the vendors would fix these JIRAs. That
 would seem to be a good thing.



 On Tue, Aug 19, 2014 at 5:34 PM, Andrew Purtell apurt...@apache.org
 wrote:

 Maybe "pick on" didn't get close enough to what I was after.

  Maybe this is something I can fix. If I were to add the
 cloudera/hortonworks maven repos, and then add some supported options for
 hadoop beyond 1/2, that would pretty much do it, right?

 I doubt it, because the v4 and master branches probably won't compile against
 either, and certainly not against CDH 5.1; their HBase releases are out of
 step and stale with respect to the latest Apache HBase 0.98 and Apache
 Phoenix 4 releases.

 Getting back to my point, it's unfair in my opinion to expect the
 upstream volunteer Apache projects to track all of the commercial options
 and the vagaries of their arbitrary code freezes and curated additional
 patches. It's unfair to expect Salesforce to fund such an effort, unless
 Salesforce has somehow gone into the Hadoop distribution business.
 Certainly I am not speaking on behalf of Salesforce or anyone else here. On
 the other hand, I think it would be totally reasonable to request your
 favorite vendor address Phoenix related issues with *their* derivative
 distributions.



 On Tue, Aug 19, 2014 at 3:53 PM, Russell Jurney russell.jur...@gmail.com
  wrote:

 Maybe this is something I can fix. If I were to add the
 cloudera/hortonworks maven repos, and then add some supported options for
 hadoop beyond 1/2, that would pretty much do it, right?


 On Tue, Aug 19, 2014 at 3:49 PM, Jesse Yates jesse.k.ya...@gmail.com
 wrote:

 FWIW, internally at Salesforce we also patch the HBase and Hadoop POMs
 to support our own internal 'light forks'. It's really not a big deal to
 manage - a couple of Jenkins jobs (one to automate, one to track open
 source changes and ensure your patch(es) still work, etc.) and you are good
 to go.

 I imagine this is also what various distributors are doing for their
 forks as well.

 ---
 Jesse Yates
 @jesse_yates
 jyates.github.com


 On Tue, Aug 19, 2014 at 3:36 PM, Russell Jurney 
 russell.jur...@gmail.com wrote:

 First of all, I apologize if you feel like I was picking on you. I was
 not trying to do that.

 My understanding is that Salesforce pays people to work on Phoenix. Is
 that not the case? I'm hoping one of them will add Spark-like support for
 CDH and HDP to advance the project.

 And I don't mention the POM thing to pick on someone. The majority of
 HBase users are not going to be able to use Phoenix because they run a
 commercial distribution of Hadoop and aren't POM wizards. That seems kind
 of important for the well-being of the project.


 On Tue, Aug 19, 2014 at 3:26 PM, Andrew Purtell apurt...@apache.org
 wrote:

 I don't think an Apache project should spend precious bandwidth
 tracking the various and sundry redistributors of Apache ecosystem
 projects. This is putting the cart before the horse. The horse is the
 Apache upstream projects. The cart is the commercial distributions
 leveraging the Apache ecosystem for profit. Spark is not a good example;
 it is supported by a commercial concern, Databricks. What commercial
 company supports Phoenix? Why do you think it is appropriate to pick on
 volunteers because editing POM files is too much work?


 On Tue, Aug 19, 2014 at 3:09 PM, Russell Jurney 
 russell.jur...@gmail.com wrote:

 I also created https://issues.apache.org/jira/browse/PHOENIX-1185
 because requiring users to hand-edit the pom.xml just to build against 
 CDH
 and HDP is nuts.


 On Tue, Aug 19, 2014 at 3:03 PM, Russell Jurney 
 russell.jur...@gmail.com 

Re: ManagedTests and 4.1.0-RC1

2014-08-28 Thread Dan Di Spaltro
Any idea on this? It's blocking my usage in tests and I can't tell if I am
just setting something up incorrectly. Also, I am concerned that this could
affect production, since I assume this code path is used frequently.

-Dan


On Tue, Aug 26, 2014 at 10:49 PM, Dan Di Spaltro dan.dispal...@gmail.com
wrote:

 I inherit from BaseHBaseManagedTimeIT and implement my own tests using
 the infrastructure you've put together.  It's worked pretty well, minus the
 fact that I use an Ivy resolver, which doesn't deal with jarless POMs well.

 So I've upgraded from 4.0 to 4.1 and ran into a single issue that looks
 related to Tracing, and I can't really figure it out.  When I start the
 cluster everything works as expected, but after I am done creating tables,
 like clockwork, I get this:

 58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN
  org.apache.hadoop.ipc.RpcServer  -
 defaultRpcServer.handler=2,queue=0,port=53950: caught:
 java.lang.IllegalArgumentException: offset (0) + length (4) exceed the
 capacity of the array: 3
  at
 org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
 at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749)
  at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725)
 at
 org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56)
  at
 org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121)
 at org.cloudera.htrace.Tracer.deliver(Tracer.java:81)
  at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70)
 at org.cloudera.htrace.TraceScope.close(TraceScope.java:70)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
 at
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:744)

 And the test just stops, which I imagine is a byproduct of this exception.
 I inspected at this point and there are two traces; the one it throws on is
 where the key is "user" and the value is my username. It's trying to convert
 the value to an int:
 ...
 return new Pair<String, String>(new String(key),
 Integer.toString(Bytes.toInt(value)));
 ...

 Any ideas?

 -Dan

 --
 Dan Di Spaltro




-- 
Dan Di Spaltro


Re: ManagedTests and 4.1.0-RC1

2014-08-28 Thread Samarth Jain
Dan,

Can you tell me how you are running your tests? Do you have the test class
annotated with the right category annotation, i.e.
@Category(HBaseManagedTimeTest.class)? Also, can you send over your test
class so we can see what might be causing problems?
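
For reference, a minimal sketch of the pattern being asked about (the test
class, table name, and query below are made up for illustration; the base
class, category annotation, and getUrl() are assumed to come from the
phoenix-core 4.1 test jars):

import static org.junit.Assert.assertTrue;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.phoenix.end2end.BaseHBaseManagedTimeIT;
import org.apache.phoenix.end2end.HBaseManagedTimeTest;
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Hypothetical test class: extends the managed-time base class and carries the
// category annotation so the build's category-based test setup picks it up.
@Category(HBaseManagedTimeTest.class)
public class MyExampleIT extends BaseHBaseManagedTimeIT {

    @Test
    public void testCreateUpsertAndQuery() throws Exception {
        try (Connection conn = DriverManager.getConnection(getUrl())) {
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS EXAMPLE_TEST "
                + "(ID BIGINT NOT NULL PRIMARY KEY, NAME VARCHAR)");
            conn.createStatement().execute("UPSERT INTO EXAMPLE_TEST VALUES (1, 'a')");
            conn.commit();
            ResultSet rs = conn.createStatement().executeQuery(
                "SELECT COUNT(*) FROM EXAMPLE_TEST");
            assertTrue(rs.next());
        }
    }
}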

Thanks,
Samarth


On Thu, Aug 28, 2014 at 10:34 AM, Dan Di Spaltro dan.dispal...@gmail.com
wrote:

 Any idea on this? It's blocking my usage in tests and I can't tell if I am
 just setting something up incorrectly. Also, I am concerned that this could
 affect production, since I assume this code path is used frequently.

 -Dan


 On Tue, Aug 26, 2014 at 10:49 PM, Dan Di Spaltro dan.dispal...@gmail.com
 wrote:

 I inherit from BaseHBaseManagedTimeIT and implement my own tests
 using the infrastructure you've put together.  It's worked pretty well,
 minus the fact that I use an Ivy resolver, which doesn't deal with jarless
 POMs well.

 So I've upgraded from 4.0 to 4.1 and ran into a single issue that looks
 related to Tracing, and I can't really figure it out.  When I start the
 cluster everything works as expected, but after I am done creating tables,
 like clockwork, I get this:

 58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN
  org.apache.hadoop.ipc.RpcServer  -
 defaultRpcServer.handler=2,queue=0,port=53950: caught:
 java.lang.IllegalArgumentException: offset (0) + length (4) exceed the
 capacity of the array: 3
  at
 org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
 at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749)
  at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725)
 at
 org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56)
  at
 org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121)
 at org.cloudera.htrace.Tracer.deliver(Tracer.java:81)
  at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70)
 at org.cloudera.htrace.TraceScope.close(TraceScope.java:70)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
 at
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:744)

 And the test just stops, which I imagine is a byproduct of this
 exception.  I inspected at this point and there are two traces; the one it
 throws on is where the key is "user" and the value is my username. It's
 trying to convert the value to an int:
 ...
 return new Pair<String, String>(new String(key),
 Integer.toString(Bytes.toInt(value)));
 ...

 Any ideas?

 -Dan

 --
 Dan Di Spaltro




 --
 Dan Di Spaltro



Re: Tracing Q's

2014-08-28 Thread Dan Di Spaltro
On Wed, Aug 27, 2014 at 12:35 PM, Jesse Yates jesse.k.ya...@gmail.com
wrote:

 To start with, there are a bunch of things we are planning with tracing:
 https://issues.apache.org/jira/browse/PHOENIX-1121

 But to answer your questions,


 Can we use something like zipkin-htrace adapter for Phoenix traces? And
 if I did would the calls be coming from the RS?


 Yes, we could roll in something like that as well, but there just isn't a
 knob for it right now. You would need the same config on the server (all
 the tracing config goes through the same interface though, so it shouldn't
 be too hard). Right now, it's all being written to an HBase table from both
 sides of the request, so you could pull that in later to populate zipkin as
 well.

 We could also add a span receiver to write to zipkin. I'd be more
 inclined to write to zipkin from the phoenix table as that's more likely to
 be stable storage. All the same information would be there, but I'd trust
 my HBase tables :)
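
 A rough sketch of that "read it back out of the Phoenix table" approach over
 plain JDBC (the TRACE_EXAMPLE table and its columns are placeholders, not the
 real schema created by PhoenixTableMetricsWriter, and the zipkin hand-off is
 left as a comment):

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.Statement;

 // Hypothetical poller: reads spans back out of a Phoenix table over JDBC and
 // hands them to whatever publishes into zipkin. Table and column names are
 // placeholders; the real schema is whatever PhoenixTableMetricsWriter creates.
 public class TraceTableToZipkin {
     public static void main(String[] args) throws Exception {
         try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
              Statement stmt = conn.createStatement();
              ResultSet rs = stmt.executeQuery(
                  "SELECT TRACE_ID, PARENT_ID, SPAN_ID, DESCRIPTION, START_TIME, END_TIME"
                  + " FROM TRACE_EXAMPLE")) {
             while (rs.next()) {
                 long traceId = rs.getLong("TRACE_ID");
                 long spanId = rs.getLong("SPAN_ID");
                 String description = rs.getString("DESCRIPTION");
                 // ... convert to a zipkin span here and post it to the collector ...
                 System.out.println(traceId + "/" + spanId + " " + description);
             }
         }
     }
 }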


Yeah, probably the best idea would be to see if you can wire up the zipkin
frontend to the phoenix backend, and just skip all the zipkin complexity
(collectors -> storage -> aggregators, and so on).



 * How do you get the trace id on a query you create?


 If there is something you are looking to trace, you could actually create
 a trace before creating your phoenix request, and pull the traceID out of
 there (you could also add any annotations you wanted, like the app server's
 request id). Phoenix will either continue the trace, if one is started, or
 start a new one, if configured to do so.

 Starting a new one is generally just for introspection into a running
 system to see how things are doing. It wouldn't be tied to anything in
 particular. There is some pending work in the above-mentioned JIRA for
 adding tags (timeline annotations, in HTrace parlance) and annotations
 (key-value annotations) to a phoenix request/connection, but you should be
 able to do what you want just by starting the trace before making the
 phoenix request. If phoenix is configured correctly, it should just work
 with the rest of the phoenix trace sink infrastructure.
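
 A minimal sketch of that pattern - start a span, grab the trace id, then make
 the Phoenix call - assuming the org.cloudera.htrace 2.x API that shows up in
 the stack traces elsewhere in this thread (the sampler constant, JDBC URL, and
 query are placeholders/assumptions):

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.Statement;

 import org.cloudera.htrace.Sampler;
 import org.cloudera.htrace.Trace;
 import org.cloudera.htrace.TraceScope;

 public class TracedPhoenixQuery {
     public static void main(String[] args) throws Exception {
         // Start the span before touching Phoenix so Phoenix continues this
         // trace rather than only starting its own.
         TraceScope scope = Trace.startSpan("my-app-request", Sampler.ALWAYS);
         try {
             // The trace id can be logged or handed back to the caller, then
             // used later to find the request in the Phoenix trace table.
             long traceId = scope.getSpan().getTraceId();
             System.out.println("traceId=" + traceId);

             try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                  Statement stmt = conn.createStatement();
                  ResultSet rs = stmt.executeQuery("SELECT * FROM MY_TABLE LIMIT 10")) {
                 while (rs.next()) {
                     // consume results
                 }
             }
         } finally {
             scope.close();  // closing the scope delivers the span to the receivers
         }
     }
 }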


Great call. At this point my only remaining questions are probably for the
htrace mailing list, about how things like futures (where thread locals don't
really work) interact with Spans.



  Do you have to load the DDL manually


 Nope, it's part of the PhoenixTableMetricsWriter, here:
 https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/trace/PhoenixTableMetricsWriter.java#L142.
 When it receives a metric (really, just a conversion of a span to a Hadoop
 metrics2 metric), it will create the table as needed.

 Hope that helps!


Thanks. I still haven't got the new tracing configuration to work yet
(nothing is going into my table), but I'll keep trying and start a new thread
with anything I find.

Thanks for the help!



 ---
 Jesse Yates
 @jesse_yates
 jyates.github.com


 On Tue, Aug 26, 2014 at 7:21 PM, Dan Di Spaltro dan.dispal...@gmail.com
 wrote:

 I've used the concept of tracing quite a bit in previous projects and I
 had a couple questions:

 * Can we use something like zipkin-htrace adapter for Phoenix traces? And
 if I did would the calls be coming from the RS?
 * How do you get the trace id on a query you create?  Generally I've used
 something where I can log back to a client a trace/span, and then go look
 through the queries to match up why something took so long, etc. I could be
 thinking about this wrong...
 * Do you have to load the DDL manually? Nothing seems to auto-create it;
 no system table seems to be created outside of sequences and tables.  I
 have the default config files from Phoenix on the classpath.  I also have
 the compat and server jars on the CP.  Below are the log lines I see in the
 master and regionserver.
   - I have set props.setProperty("phoenix.trace.frequency", "always") for
 every query (a small sketch of how that is passed to the driver follows the
 log lines below).

 2014-08-27 01:55:27,483 INFO  [main] trace.PhoenixMetricsSink: Writing
 tracing metrics to phoenix table
 2014-08-27 01:55:27,484 INFO  [main] trace.PhoenixMetricsSink:
 Instantiating writer class:
 org.apache.phoenix.trace.PhoenixTableMetricsWriter
 2014-08-27 01:55:27,490 INFO  [main] trace.PhoenixTableMetricsWriter:
 Phoenix tracing writer started
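
 A minimal sketch of how that property gets passed to the Phoenix JDBC driver
 (the connection URL and table are placeholders; the property name is the one
 set above):

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.util.Properties;

 public class TraceEveryQueryConnection {
     public static void main(String[] args) throws Exception {
         Properties props = new Properties();
         // Ask Phoenix to trace every request made on this connection.
         props.setProperty("phoenix.trace.frequency", "always");
         try (Connection conn =
                  DriverManager.getConnection("jdbc:phoenix:localhost", props)) {
             conn.createStatement().executeQuery("SELECT * FROM MY_TABLE LIMIT 1");
         }
     }
 }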

 Thanks for the help,

 -Dan

 --
 Dan Di Spaltro





-- 
Dan Di Spaltro


Re: ManagedTests and 4.1.0-RC1

2014-08-28 Thread Dan Di Spaltro
I basically inherit from BaseClientManagedTimeIT and write JUnit tests.

It's been working great up until 4.1.

This code just doesn't look right - why would an annotation necessarily have
to be an int?

https://github.com/apache/phoenix/blob/29a7be42bfa468b12d16fd0756b987f5359c45c4/phoenix-hadoop2-compat/src/main/java/org/apache/phoenix/trace/TraceMetricSource.java#L122

It then calls the function below, which takes the annotation's value bytes
and makes an int out of them...

https://github.com/apache/phoenix/blob/f99e5d8d609d326fb3571255cd8f47961b1c6860/phoenix-hadoop-compat/src/main/java/org/apache/phoenix/trace/TracingCompat.java#L56
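
A tiny standalone illustration of why that combination blows up on a
string-valued annotation (using only HBase's Bytes utility, as in the stack
trace; the example value is made up):

import org.apache.hadoop.hbase.util.Bytes;

public class AnnotationLengthRepro {
    public static void main(String[] args) {
        // A key/value annotation whose value is a short string, e.g. a 3-character username.
        byte[] value = Bytes.toBytes("dan");  // 3 bytes

        // TracingCompat.readAnnotation assumes the value encodes a 4-byte int, so this throws
        // IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 3
        int asInt = Bytes.toInt(value);
        System.out.println(asInt);
    }
}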


On Thu, Aug 28, 2014 at 11:43 AM, Samarth Jain samarth.j...@gmail.com
wrote:

 Dan,

 Can you tell me how you are running your tests? Do you have the test class
 annotated with the right category annotation, i.e.
 @Category(HBaseManagedTimeTest.class)? Also, can you send over your test
 class so we can see what might be causing problems?

 Thanks,
 Samarth


 On Thu, Aug 28, 2014 at 10:34 AM, Dan Di Spaltro dan.dispal...@gmail.com
 wrote:

 Any idea on this? It's blocking my usage in tests and I can't tell if I
 am just setting something up incorrectly. Also, I am concerned that this
 could affect production, since I assume this code path is used
 frequently.

 -Dan


 On Tue, Aug 26, 2014 at 10:49 PM, Dan Di Spaltro dan.dispal...@gmail.com
  wrote:

 I inherit from BaseHBaseManagedTimeIT and implement my own tests
 using the infrastructure you've put together.  It's worked pretty well,
 minus the fact that I use an Ivy resolver, which doesn't deal with jarless
 POMs well.

 So I've upgraded from 4.0 to 4.1 and ran into a single issue that looks
 related to Tracing, and I can't really figure it out.  When I start the
 cluster everything works as expected, but after I am done creating tables,
 like clockwork, I get this:

 58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN
  org.apache.hadoop.ipc.RpcServer  -
 defaultRpcServer.handler=2,queue=0,port=53950: caught:
 java.lang.IllegalArgumentException: offset (0) + length (4) exceed the
 capacity of the array: 3
  at
 org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
 at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749)
  at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725)
 at
 org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56)
  at
 org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121)
 at org.cloudera.htrace.Tracer.deliver(Tracer.java:81)
  at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70)
 at org.cloudera.htrace.TraceScope.close(TraceScope.java:70)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
 at
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:744)

 And the test just stops, which I imagine is a byproduct of this
 exception.  I inspected at this point and there are two traces; the one it
 throws on is where the key is "user" and the value is my username. It's
 trying to convert the value to an int:
 ...
 return new Pair<String, String>(new String(key),
 Integer.toString(Bytes.toInt(value)));
 ...

 Any ideas?

 -Dan

 --
 Dan Di Spaltro




 --
 Dan Di Spaltro





-- 
Dan Di Spaltro