Re: Error when using Pig Storage: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
Hi,

With both the PHOENIX-1184 and PHOENIX-1183 tickets fixed, I happened to do a quick test of Pig scripts on a CDH 5.1.0-backed cluster, and things are looking good. Below are the steps I followed:

a) Downloaded the binaries from https://dist.apache.org/repos/dist/dev/phoenix/phoenix-4.1.0-rc1/bin/
b) Copied phoenix-4.1.0-server-hadoop2.jar to the HBase Region Server lib path and restarted.
c) Copied phoenix-4.1.0-client-hadoop2.jar and phoenix-pig-4.1.0-hadoop2.jar onto the gateway node where I planned to run my Pig scripts.
d) Registered the following jars within the Pig script:
   zookeeper.jar
   hbase-hadoop2-compat.jar
   hbase-client.jar
   hbase-protocol-0.98.1-cdh5.1.0.jar
   phoenix-4.1.0-client-hadoop2.jar
   phoenix-pig-4.1.0-hadoop2.jar

Regards,
Ravi

On Tue, Aug 19, 2014 at 5:43 PM, Russell Jurney russell.jur...@gmail.com wrote:

I agree the vendor should resolve these issues. Hortonworks has already included Phoenix in HDP. Cloudera is behind the curve here; I'm told they'll include Phoenix when they feel they can support it well. That being said, wouldn't adding CDH/HDP options in pom.xml make the project easier to use, and result in more people trying to use CDH/HDP with Phoenix (and more people using Phoenix in general), which would bring up bugs like the ones here? Ideally the vendors would fix these JIRAs. That would seem to be a good thing.

On Tue, Aug 19, 2014 at 5:34 PM, Andrew Purtell apurt...@apache.org wrote:

Maybe "pick on" didn't get close enough to what I was after.

> Maybe this is something I can fix. If I were to add the cloudera/hortonworks maven repos, and then add some supported options for hadoop beyond 1/2, that would pretty much do it, right?

I doubt it, because the v4 and master branches probably won't compile against either, certainly not against CDH 5.1; their HBase releases are out of step and stale with respect to the latest Apache HBase 0.98 and Apache Phoenix 4 releases.

Getting back to my point, it's unfair in my opinion to expect the upstream volunteer Apache projects to track all of the commercial options and the vagaries of their arbitrary code freezes and curated additional patches. It's unfair to expect Salesforce to fund such an effort, unless Salesforce has somehow gone into the Hadoop distribution business. Certainly I am not speaking on behalf of Salesforce or anyone else here. On the other hand, I think it would be totally reasonable to request that your favorite vendor address Phoenix-related issues with *their* derivative distributions.

On Tue, Aug 19, 2014 at 3:53 PM, Russell Jurney russell.jur...@gmail.com wrote:

Maybe this is something I can fix. If I were to add the cloudera/hortonworks maven repos, and then add some supported options for hadoop beyond 1/2, that would pretty much do it, right?

On Tue, Aug 19, 2014 at 3:49 PM, Jesse Yates jesse.k.ya...@gmail.com wrote:

FWIW, internally at Salesforce we also patch the HBase and Hadoop poms to support our own internal 'light forks'. It's really not a big deal to manage - a couple of Jenkins jobs (one to automate, one to track open-source changes and ensure your patch(es) still work, etc.) and you are good to go. I imagine this is also what the various distributors are doing for their forks as well.

--- Jesse Yates @jesse_yates jyates.github.com

On Tue, Aug 19, 2014 at 3:36 PM, Russell Jurney russell.jur...@gmail.com wrote:

First of all, I apologize if you feel like I was picking on you. I was not trying to do that. My understanding is that Salesforce pays people to work on Phoenix. Is that not the case?
I'm hoping one of them will add Spark-like support for CDH and HDP to advance the project. And I don't mention the POM thing to pick on someone. The majority of HBase users are not going to be able to use Phoenix, because they run a commercial distribution of Hadoop and aren't pom wizards. That seems kind of important for the well-being of the project.

On Tue, Aug 19, 2014 at 3:26 PM, Andrew Purtell apurt...@apache.org wrote:

I don't think an Apache project should spend precious bandwidth tracking the various and sundry redistributors of Apache ecosystem projects. This is putting the cart before the horse. The horse is the Apache upstream projects; the cart is the commercial distributions leveraging the Apache ecosystem for profit. Spark is not a good example: it is supported by a commercial concern, Databricks. What commercial company supports Phoenix? Why do you think it is appropriate to pick on volunteers because editing POM files is too much work?

On Tue, Aug 19, 2014 at 3:09 PM, Russell Jurney russell.jur...@gmail.com wrote:

I also created https://issues.apache.org/jira/browse/PHOENIX-1185 because requiring users to hand-edit the pom.xml just to build against CDH and HDP is nuts.

On Tue, Aug 19, 2014 at 3:03 PM, Russell Jurney russell.jur...@gmail.com
Re: ManagedTests and 4.1.0-RC1
Any idea on this? It's blocking my usage in tests, and I can't tell if I am just setting something up incorrectly. Also, I am concerned that this can affect production, since this code path, I would assume, is used frequently.

-Dan

On Tue, Aug 26, 2014 at 10:49 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

I inherit from BaseHBaseManagedTimeIT and implement my own tests using the infrastructure you've put together. It's worked pretty well, minus the fact that I use an Ivy resolver, which doesn't deal with jarless poms well. So I've upgraded from 4.0 to 4.1 and ran into a single issue that looks related to tracing, and I can't really figure it out. When I start the cluster, everything works as expected, but after I am done creating tables, like clockwork I get this:

58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN org.apache.hadoop.ipc.RpcServer - defaultRpcServer.handler=2,queue=0,port=53950: caught: java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 3
    at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725)
    at org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56)
    at org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121)
    at org.cloudera.htrace.Tracer.deliver(Tracer.java:81)
    at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70)
    at org.cloudera.htrace.TraceScope.close(TraceScope.java:70)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
    at java.lang.Thread.run(Thread.java:744)

And the test just stops, which I imagine is a byproduct of this exception. I inspected at this point and there are two traces; the one it throws on has key "user" and value set to my username. It's trying to convert it to an int:

    ...
    return new Pair<String, String>(new String(key), Integer.toString(Bytes.toInt(value)));
    ...

Any ideas?

-Dan

--
Dan Di Spaltro

--
Dan Di Spaltro
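The exception is easy to reproduce in isolation: org.apache.hadoop.hbase.util.Bytes.toInt requires exactly four bytes, so any shorter string-valued annotation trips it. A minimal standalone sketch (the 3-character username is hypothetical):

    import org.apache.hadoop.hbase.util.Bytes;

    public class ToIntRepro {
        public static void main(String[] args) {
            // A string annotation value such as a 3-character username...
            byte[] value = Bytes.toBytes("dan"); // 3 bytes
            // ...cannot be decoded as an int: Bytes.toInt expects exactly
            // Bytes.SIZEOF_INT (4) bytes and throws
            // "IllegalArgumentException: offset (0) + length (4) exceed the
            // capacity of the array: 3", matching the stack trace above.
            System.out.println(Bytes.toInt(value));
        }
    }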
Re: ManagedTests and 4.1.0-RC1
Dan,

Can you tell me how you are running your tests? Do you have the test class annotated with the right category annotation, @Category(HBaseManagedTimeTest.class)? Also, can you send over your test class to see what might be causing problems?

Thanks,
Samarth

On Thu, Aug 28, 2014 at 10:34 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

Any idea on this? It's blocking my usage in tests, and I can't tell if I am just setting something up incorrectly. Also, I am concerned that this can affect production, since this code path, I would assume, is used frequently.

-Dan

On Tue, Aug 26, 2014 at 10:49 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

I inherit from BaseHBaseManagedTimeIT and implement my own tests using the infrastructure you've put together. It's worked pretty well, minus the fact that I use an Ivy resolver, which doesn't deal with jarless poms well. So I've upgraded from 4.0 to 4.1 and ran into a single issue that looks related to tracing, and I can't really figure it out. When I start the cluster, everything works as expected, but after I am done creating tables, like clockwork I get this:

58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN org.apache.hadoop.ipc.RpcServer - defaultRpcServer.handler=2,queue=0,port=53950: caught: java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 3
    at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725)
    at org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56)
    at org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121)
    at org.cloudera.htrace.Tracer.deliver(Tracer.java:81)
    at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70)
    at org.cloudera.htrace.TraceScope.close(TraceScope.java:70)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
    at java.lang.Thread.run(Thread.java:744)

And the test just stops, which I imagine is a byproduct of this exception. I inspected at this point and there are two traces; the one it throws on has key "user" and value set to my username. It's trying to convert it to an int:

    ...
    return new Pair<String, String>(new String(key), Integer.toString(Bytes.toInt(value)));
    ...

Any ideas?

-Dan

--
Dan Di Spaltro

--
Dan Di Spaltro
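For reference, a minimal sketch of a test class set up the way Samarth describes, assuming the getUrl() helper inherited from the Phoenix test base classes; the table name and query are hypothetical:

    import static org.junit.Assert.assertTrue;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    import org.apache.phoenix.end2end.BaseHBaseManagedTimeIT;
    import org.apache.phoenix.end2end.HBaseManagedTimeTest;
    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // The @Category annotation is what the Phoenix test infrastructure keys
    // on when grouping tests against the shared mini-cluster.
    @Category(HBaseManagedTimeTest.class)
    public class ExampleIT extends BaseHBaseManagedTimeIT {

        @Test
        public void createsAndQueriesTable() throws Exception {
            // getUrl() points at the managed mini-cluster.
            Connection conn = DriverManager.getConnection(getUrl());
            try {
                conn.createStatement().execute(
                    "CREATE TABLE example (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR)");
                conn.createStatement().execute("UPSERT INTO example VALUES (1, 'a')");
                conn.commit();
                ResultSet rs = conn.createStatement()
                    .executeQuery("SELECT count(*) FROM example");
                assertTrue(rs.next());
            } finally {
                conn.close();
            }
        }
    }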
Re: Tracing Q's
On Wed, Aug 27, 2014 at 12:35 PM, Jesse Yates jesse.k.ya...@gmail.com wrote:

To start with, there are a bunch of things we are planning with tracing: https://issues.apache.org/jira/browse/PHOENIX-1121

But to answer your questions,

> Can we use something like the zipkin-htrace adapter for Phoenix traces? And if I did, would the calls be coming from the RS?

Yes, we could roll in something like that as well, but there just isn't a knob for it right now. You would need the same config on the server (all the tracing config goes through the same interface though, so it shouldn't be too hard). Right now, it's all being written to an HBase table from both sides of the request, so you could pull that in later to populate zipkin as well. We could also add a span receiver to write to zipkin. I'd be more inclined to write to zipkin from the Phoenix table, as that's more likely to be stable storage. All the same information would be there, but I'd trust my HBase tables :)

Yeah, probably the best idea would be to see if you can wire up the zipkin frontend to the Phoenix backend, and just skip all the zipkin complexity (collectors, storage, aggregators, and so on).

> How do you get the trace id on a query you create?

If there is something you are looking to trace, you could actually create a trace before creating your Phoenix request, and pull the trace ID out of there (you could also add any annotations you wanted, like the app server's request id). Phoenix will either continue the trace, if one is started, or start a new one, if configured to do so. Starting a new one is generally just for introspection into a running system to see how things are doing; it wouldn't be tied to anything in particular. There is some pending work in the above-mentioned JIRA for adding tags (timeline annotations, in HTrace parlance) and annotations (key-value annotations) to a Phoenix request/connection, but you should be able to do what you want just by starting the trace before making the Phoenix request. If Phoenix is configured correctly, it should just work with the rest of the Phoenix trace sink infrastructure.

Great call - probably at this point my only questions would be for the htrace mailing list, about how things like futures (where thread-local state doesn't really work) interact with Spans.

> Do you have to load the DDL manually?

Nope, it's part of the PhoenixTableMetricsWriter, here: https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/trace/PhoenixTableMetricsWriter.java#L142. When it receives a metric (really, just a conversion of a span to a Hadoop metrics2 metric), it will create the table as needed.

Hope that helps!

Thanks - I still haven't got the new tracing configuration to work yet (nothing is going into my table), but I'll keep trying and start a new thread with anything I find. Thanks for the help!

--- Jesse Yates @jesse_yates jyates.github.com

On Tue, Aug 26, 2014 at 7:21 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

I've used the concept of tracing quite a bit in previous projects, and I had a couple questions:

* Can we use something like the zipkin-htrace adapter for Phoenix traces? And if I did, would the calls be coming from the RS?

* How do you get the trace id on a query you create? Generally I've used something where I can log back to a client a trace/span, and then go look through the queries to match up why something took so long, etc. I could be thinking about this wrong...
* Do you have to load the DDL manually? Nothing seems to auto-create it; no system table seems to be created outside of sequences and tables. I have the default config files from Phoenix on the classpath, and I also have the compat and server jars on the CP. Below are the log lines I see in the master and regionserver. I have set props.setProperty("phoenix.trace.frequency", "always") for every query.

2014-08-27 01:55:27,483 INFO [main] trace.PhoenixMetricsSink: Writing tracing metrics to phoenix table
2014-08-27 01:55:27,484 INFO [main] trace.PhoenixMetricsSink: Instantiating writer class: org.apache.phoenix.trace.PhoenixTableMetricsWriter
2014-08-27 01:55:27,490 INFO [main] trace.PhoenixTableMetricsWriter: Phoenix tracing writer started

Thanks for the help,

-Dan

--
Dan Di Spaltro

--
Dan Di Spaltro
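Putting Jesse's suggestion together with Dan's phoenix.trace.frequency setting, here is a minimal sketch of starting a trace on the client and pulling the trace ID out before issuing a Phoenix query. It assumes the htrace 2.x API (org.cloudera.htrace) bundled with HBase 0.98; the JDBC URL, span name, and query are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.util.Properties;

    import org.cloudera.htrace.Sampler;
    import org.cloudera.htrace.Trace;
    import org.cloudera.htrace.TraceScope;

    public class TracedQuery {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Trace every request (property name taken from this thread).
            props.setProperty("phoenix.trace.frequency", "always");

            // Start a trace before making the Phoenix request; per Jesse's
            // note, Phoenix continues an already-started trace.
            TraceScope scope = Trace.startSpan("app-request", Sampler.ALWAYS);
            try {
                // Log the trace ID back to the client so it can later be
                // matched against rows in the Phoenix trace table.
                System.out.println("traceId=" + scope.getSpan().getTraceId());

                Connection conn =
                    DriverManager.getConnection("jdbc:phoenix:localhost", props);
                try {
                    ResultSet rs = conn.createStatement()
                        .executeQuery("SELECT count(*) FROM example");
                    while (rs.next()) {
                        System.out.println(rs.getLong(1));
                    }
                } finally {
                    conn.close();
                }
            } finally {
                scope.close(); // delivers the span to configured receivers
            }
        }
    }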
Re: ManagedTests and 4.1.0-RC1
I basically inherit from BaseClientManagedTimeIT and write JUnit tests. It's been working great up until 4.1. This code just doesn't look right - why would an annotation necessarily have to be an int?

https://github.com/apache/phoenix/blob/29a7be42bfa468b12d16fd0756b987f5359c45c4/phoenix-hadoop2-compat/src/main/java/org/apache/phoenix/trace/TraceMetricSource.java#L122

which then calls the function below, which takes the value bytes and makes an int out of them:

https://github.com/apache/phoenix/blob/f99e5d8d609d326fb3571255cd8f47961b1c6860/phoenix-hadoop-compat/src/main/java/org/apache/phoenix/trace/TracingCompat.java#L56

On Thu, Aug 28, 2014 at 11:43 AM, Samarth Jain samarth.j...@gmail.com wrote:

Dan,

Can you tell me how you are running your tests? Do you have the test class annotated with the right category annotation, @Category(HBaseManagedTimeTest.class)? Also, can you send over your test class to see what might be causing problems?

Thanks,
Samarth

On Thu, Aug 28, 2014 at 10:34 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

Any idea on this? It's blocking my usage in tests, and I can't tell if I am just setting something up incorrectly. Also, I am concerned that this can affect production, since this code path, I would assume, is used frequently.

-Dan

On Tue, Aug 26, 2014 at 10:49 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

I inherit from BaseHBaseManagedTimeIT and implement my own tests using the infrastructure you've put together. It's worked pretty well, minus the fact that I use an Ivy resolver, which doesn't deal with jarless poms well. So I've upgraded from 4.0 to 4.1 and ran into a single issue that looks related to tracing, and I can't really figure it out. When I start the cluster, everything works as expected, but after I am done creating tables, like clockwork I get this:

58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN org.apache.hadoop.ipc.RpcServer - defaultRpcServer.handler=2,queue=0,port=53950: caught: java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 3
    at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725)
    at org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56)
    at org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121)
    at org.cloudera.htrace.Tracer.deliver(Tracer.java:81)
    at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70)
    at org.cloudera.htrace.TraceScope.close(TraceScope.java:70)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
    at java.lang.Thread.run(Thread.java:744)

And the test just stops, which I imagine is a byproduct of this exception. I inspected at this point and there are two traces; the one it throws on has key "user" and value set to my username. It's trying to convert it to an int:

    ...
    return new Pair<String, String>(new String(key), Integer.toString(Bytes.toInt(value)));
    ...

Any ideas?

-Dan

--
Dan Di Spaltro

--
Dan Di Spaltro

--
Dan Di Spaltro
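A possible repair along the lines Dan is pointing at would be to decode the annotation value as an int only when it is actually int-sized, and fall back to treating it as a string otherwise. A hypothetical sketch, not the fix that actually shipped:

    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Pair;

    public final class AnnotationDecode {

        private AnnotationDecode() {}

        // Hypothetical variant of TracingCompat.readAnnotation: only 4-byte
        // values are decoded as ints; anything else (e.g. the "user"
        // annotation carrying a username) is passed through as a string.
        // Note a 4-character string value would still be mis-decoded as an
        // int; a real fix needs type information carried with the annotation.
        public static Pair<String, String> readAnnotation(byte[] key, byte[] value) {
            String decoded = (value.length == Bytes.SIZEOF_INT)
                ? Integer.toString(Bytes.toInt(value))
                : new String(value);
            return new Pair<String, String>(new String(key), decoded);
        }
    }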