Not able to connect to Phoenix Queryserver from Spark

2017-10-25 Thread cmbendre
I am trying to connect to the Phoenix Query Server from Spark. The following Scala code works perfectly fine when I run it without Spark. import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet, Statement} Class.forName("org.apache.phoenix.queryserver.client.Driver") val connection=
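
For reference, a minimal sketch of the thin-client connection outside Spark, assuming the Query Server is reachable on its default port 8765 (host and query are placeholders, not taken from the original post):

    import java.sql.{Connection, DriverManager, ResultSet}

    // Load the Phoenix thin-client driver, which talks to the Query Server over HTTP (Avatica).
    Class.forName("org.apache.phoenix.queryserver.client.Driver")

    // PQS listens on 8765 by default; the host here is a placeholder.
    val connection: Connection = DriverManager.getConnection(
      "jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF")

    val rs: ResultSet = connection.createStatement().executeQuery("SELECT 1")
    while (rs.next()) println(rs.getInt(1))
    connection.close()

When the same code runs inside Spark, the thin-client jar must also reach the executors (for example via --jars), otherwise the driver class cannot be resolved there.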

Index merge / join for multiple where clause

2017-06-05 Thread cmbendre
Hi, I have created a local index for every column in my table. Let's say my primary key is P_KEY and the columns are COL1, COL2, etc. When I run the query "SELECT P_KEY FROM TABLENAME WHERE COL1='xxx'", it scans only the local index and returns the results fast. But when I query "SELECT
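
As a hedged illustration, EXPLAIN shows whether the optimizer picks the local index or falls back to a full scan once a second predicate is added (the thick-driver URL and the second column are placeholders):

    import java.sql.DriverManager

    // Placeholder ZooKeeper quorum; table and column names follow the post above.
    val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")

    // EXPLAIN prints the chosen plan, e.g. a range scan over the local index,
    // or a full table scan when no single index covers all the predicates.
    val rs = conn.createStatement().executeQuery(
      "EXPLAIN SELECT P_KEY FROM TABLENAME WHERE COL1 = 'xxx' AND COL2 = 'yyy'")
    while (rs.next()) println(rs.getString(1))
    conn.close()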

Large CSV bulk load stuck

2017-06-02 Thread cmbendre
Hi, I need some help in understanding how CsvBulkLoadTool works. I am trying to load ~200 GB of data (100 files of 2 GB each) from HDFS to Phoenix with 1 master and 4 region servers. These region servers have 32 GB RAM and 16 cores each. Total HDFS disk space is 4 TB. The table is
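
For context, CsvBulkLoadTool is a MapReduce job that writes HFiles and then hands them to the region servers. A hedged sketch of invoking it programmatically rather than through the hadoop jar command (table, input path and ZooKeeper quorum are placeholders):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.util.ToolRunner
    import org.apache.phoenix.mapreduce.CsvBulkLoadTool

    // Equivalent to the command-line invocation: the MR job parses the CSVs,
    // writes HFiles, and bulk-loads them into the table's regions.
    val exitCode = ToolRunner.run(new Configuration(), new CsvBulkLoadTool(),
      Array("--table", "MYTABLE",
            "--input", "hdfs:///data/csv",
            "--zookeeper", "localhost:2181"))
    sys.exit(exitCode)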

Class org.apache.phoenix.mapreduce.bulkload.TableRowkeyPair not found

2017-06-01 Thread cmbendre
Trying to bulk-load a CSV file on Phoenix 4.9.0 on EMR. Following is the command: export HADOOP_CLASSPATH=$(hbase mapredcp):/usr/lib/hbase/conf hadoop jar /usr/lib/phoenix/phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=000 --table PROFILESTORE

Re: Phoenix 4.9.0 with Spark 2.0

2017-05-31 Thread cmbendre
I saw that JIRA. But the issue is that I am using Phoenix on AWS EMR, which comes with 4.9.0. I can upgrade the Phoenix version on EMR, but I am not sure whether that needs more configuration on the EMR side. Will replacing the Phoenix bin directory on all master and core nodes suffice? Thanks, Chaitanya

Phoenix 4.9.0 with Spark 2.0

2017-05-30 Thread cmbendre
Hi, I am trying the Phoenix connector from Spark 2.0. I am using Phoenix 4.9.0 on EMR. My command to start the Spark shell: ./bin/spark-shell --master local --jars /usr/lib/phoenix/phoenix-spark-4.9.0-HBase-1.2.jar --jars /usr/lib/phoenix/phoenix-client.jar --conf
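
For comparison, the phoenix-spark DataFrame read looks roughly like this, assuming a Phoenix build whose Spark module supports Spark 2.x (table name and zkUrl are placeholders):

    // In the Spark 2.0 shell, `spark` is the SparkSession.
    val df = spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "MYTABLE")
      .option("zkUrl", "localhost:2181")
      .load()

    df.filter("COL1 = 'xxx'").show()

One note on the command above: spark-shell takes a single comma-separated --jars list; passing --jars twice keeps only the last value.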

What hardware was used for the official performance page?

2017-05-25 Thread cmbendre
Hi, there are performance reports on the Phoenix website - http://phoenix-bin.github.io/client/performance/latest.htm. What cluster capacity (RAM, cores, CPU, number of RegionServers) was used for testing these? Sorry if this is indeed mentioned on the website and I missed it. Thanks, Chaitanya

Re: Speeding Up Group By Queries

2017-05-25 Thread cmbendre
Hi, I am observing similar behavior. I am doing a POC of Phoenix since our query workloads are a mix of point lookups and aggregations. As far as I can see, Phoenix performs well on point lookups based on either the PK or a secondary index. But when it comes to aggregations, it has to do a full scan and its
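
A hedged sketch of the two access patterns over JDBC, reusing placeholder names from the earlier posts:

    import java.sql.DriverManager

    // Placeholder connection URL and schema.
    val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")
    val stmt = conn.createStatement()

    // Point lookup: served by the primary key or a covering secondary index.
    stmt.executeQuery("SELECT COL1 FROM TABLENAME WHERE P_KEY = 'k1'")

    // Aggregation: without a leading-key filter this scans the table,
    // aggregating server-side per region before merging on the client.
    stmt.executeQuery("SELECT COL1, COUNT(*) FROM TABLENAME GROUP BY COL1")

    conn.close()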

Async Index Creation fails due to permission issue

2017-05-23 Thread cmbendre
I created an ASYNC index and ran the IndexTool MapReduce job to populate it. Here is the command I used: hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table MYTABLE --index-table MYTABLE_GLOBAL_INDEX --output-path MYTABLE_GLOBAL_INDEX_HFILE I can see that index HFiles are created
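
For completeness, the ASYNC workflow starts with DDL like the following (index and column names are placeholders); the index stays empty until the IndexTool job above writes and loads its HFiles:

    import java.sql.DriverManager

    // Placeholder connection URL and indexed column.
    val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")
    conn.createStatement().execute(
      "CREATE INDEX MYTABLE_GLOBAL_INDEX ON MYTABLE (COL1) ASYNC")
    conn.close()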

Enable Tracing in AWS EMR

2017-05-18 Thread cmbendre
Hi, I am trying to run pherf-cluster.py on AWS EMR with Hadoop 2.7.1 + HBase 1.3 + Phoenix 4.9. I am able to run pherf-standalone.py, but pherf-cluster.py fails with the following error: Exception in thread "main" java.lang.AbstractMethodError:

Re: NoClassDefFoundError with pherf-cluster.py

2017-05-16 Thread cmbendre
Hi, I am running into the same issue. I added "/usr/lib/phoenix/*" as you suggested, but now I get a new error: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang3/StringUtils at

Error in Pherf Test

2017-05-16 Thread cmbendre
I am using the Pherf utility https://phoenix.apache.org/pherf.html on an AWS EMR cluster to test the performance of Phoenix + HBase. I tried to run it on the sample schema provided in the config directory - "user_defined_schema.sql" + "user_defined_scenario.xml". But I always get different errors regarding

Re: Export large query results to CSV

2017-05-15 Thread cmbendre
Thank you for the info. I am open to writing code or contributing to the project if this feature is missing. Let me create an issue for this.

Export large query results to CSV

2017-05-12 Thread cmbendre
Hi, some of our queries on the Phoenix cluster return millions of rows as a result. How do I export these results to a CSV file or an S3 location? The query output size is approximately in the range of GBs. Thanks, Chaitanya
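
Absent a built-in exporter, one hedged client-side approach is to stream the result set to a file over JDBC (URL, query, fetch size and output path are placeholders; proper CSV quoting and NULL handling are omitted):

    import java.io.PrintWriter
    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")
    val stmt = conn.createStatement()
    stmt.setFetchSize(10000) // page through the rows instead of buffering millions in memory

    val rs = stmt.executeQuery("SELECT * FROM TABLENAME")
    val cols = rs.getMetaData.getColumnCount
    val out = new PrintWriter("result.csv")
    while (rs.next()) {
      out.println((1 to cols).map(i => rs.getString(i)).mkString(","))
    }
    out.close()
    conn.close()

Another option worth checking is sqlline's !outputformat csv together with !record to capture query output to a file, which can then be copied to S3.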