Re: phoenix-spark and pyspark

2016-01-20 Thread Nick Dimiduk
I finally got to the bottom of things. There were two issues at play in my particular environment. 1. An Ambari bug [0] meant my spark-defaults.conf file was garbage. I hardly thought of it when I hit the issue with MR job submission; its impact on Spark was much more subtle. 2. YARN client
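For readers hitting the same symptom: the fix in this environment amounted to repairing spark-defaults.conf so the Phoenix client jar reaches both driver and executors. A minimal sketch of the relevant entries (the paths and jar name are assumptions based on a typical HDP layout, not taken from the thread):

```properties
# Hypothetical spark-defaults.conf entries putting the Phoenix client jar
# on both driver and executor classpaths; adjust paths for your install.
spark.driver.extraClassPath    /usr/hdp/current/phoenix-client/phoenix-client-spark.jar
spark.executor.extraClassPath  /usr/hdp/current/phoenix-client/phoenix-client-spark.jar
```

The phoenix-spark documentation of that era recommended exactly this pair of settings for Spark integration; an Ambari-managed cluster may regenerate this file, so the fix has to be applied through whatever owns the config.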

Re: Telco HBase POC

2016-01-20 Thread Vijay Vangapandu
Hi guys, We recently migrated one of our user-facing use cases to HBase and we are using Phoenix as the query layer. We managed to get a single record in 30 ms. Here is the response-time breakdown: 75th - 29 ms, 95th - 43 ms, 99th - 76 ms. We have about 6 billion records in the store and each row contains

Re: phoenix-spark and pyspark

2016-01-20 Thread Josh Mahonin
That's great to hear. Looking forward to the doc patch! On Wed, Jan 20, 2016 at 3:43 PM, Nick Dimiduk wrote: > Josh -- I deployed my updated phoenix build across the cluster, added the > phoenix-client-spark.jar to configs on the whole cluster, and now basic > dataframe
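For context, the "basic dataframe" usage being tested from pyspark looks roughly like this (a sketch based on the phoenix-spark documentation of the time; the table name and ZooKeeper URL are placeholders, and the jar must already be on the classpath as described earlier in the thread):

```python
# Sketch: load a Phoenix table as a Spark DataFrame from the pyspark shell.
# Assumes phoenix-client-spark.jar is on the driver and executor classpaths.
# "TABLE1" and the zkUrl value are placeholders for your environment.
df = sqlContext.read \
    .format("org.apache.phoenix.spark") \
    .option("table", "TABLE1") \
    .option("zkUrl", "zkhost:2181:/hbase-unsecure") \
    .load()
df.show()
```

This only exercises the read path; it requires a running HBase/Phoenix cluster, so it cannot be run standalone.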

Re: phoenix-spark and pyspark

2016-01-20 Thread Nick Dimiduk
Well, I spoke too soon. It's working, but in local mode only. When I invoke `pyspark --master yarn` (or yarn-client), the submitted application goes from ACCEPTED to FAILED, with a NumberFormatException [0] in my container log. Now that Phoenix is on my classpath, I'm suspicious that the versions

RE: Telco HBase POC

2016-01-20 Thread Willem Conradie
Hi James, Thanks for being willing to assist. This is what the input data record will look like (test data): UserID DateTime TXNID DeviceID IPAddress UsageArray URIArray 12345678901 20151006124945 992194978 123456789012345 111.111.111.111

RE: Telco HBase POC

2016-01-20 Thread Riesland, Zack
I have a similar data pattern and a 100 ms response time is fairly consistent. I’ve been trying hard to find the right set of configs to get closer to 10-20 ms with no luck, but I’m finding that a 100 ms average is pretty reasonable.