Re: Spark Plugin Information
Hi Josh,

I am using CDH 5.5.2 with HBase 1.0.0, Phoenix 4.5.2, and Spark 1.6.0. I looked up the error and found others who had run into it, which is what led me to ask the question. I'll try the Phoenix 4.7.0 client jar and see what happens. The error I am getting is:

java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
    at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:388)
    at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:296)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:179)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1917)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1896)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1896)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:93)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:45)
    at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:280)
    at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:101)
    at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:57)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1153)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
    at $iwC$$iwC$$iwC.<init>(<console>:50)
    at $iwC$$iwC.<init>(<console>:52)
    at $iwC.<init>(<console>:54)
    at <init>(<console>:56)
    at .<init>(<console>:60)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop
Re: Spark Plugin Information
Hi Ben,

If you have a reproducible test case, please file a JIRA for it. The documentation (https://phoenix.apache.org/phoenix_spark.html) is accurate and verified up to Phoenix 4.7.0 and Spark 1.6.0. Although not supported by the Phoenix project at large, you may find this Docker image useful as a configuration reference: https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark

Good luck!

Josh

On Fri, Apr 8, 2016 at 3:11 PM, Benjamin Kim wrote:
> I want to know if there is an update/patch coming to Spark or the Spark
> plugin. I see that the Spark plugin does not work because HBase classes are
> missing from the Spark assembly jar. So, when Spark does reflection, it
> does not look for HBase client classes in the Phoenix plugin jar, only in
> the Spark assembly jar. Is this true?
>
> If someone can enlighten me on this topic, please let me know.
>
> Thanks,
> Ben
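For reference, the read path described in the phoenix_spark documentation linked above boils down to a short Spark-shell snippet. This is a minimal sketch, not code from this thread: the table name TABLE1 and ZooKeeper quorum localhost:2181 are placeholders, and the Phoenix client jar is assumed to be on both the driver and executor classpaths.

    // Spark 1.6 shell (sqlContext is predefined). Load a Phoenix table
    // as a DataFrame via the phoenix-spark data source.
    // "TABLE1" and "localhost:2181" are placeholders -- substitute your
    // own table name and ZooKeeper quorum.
    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .options(Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
      .load()

    df.show()

The stack trace earlier in the thread fails inside exactly this path (PhoenixRelation.schema resolving the table over JDBC), which is why connection and classpath configuration are the first things to check.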
Spark Plugin Information
I want to know if there is an update/patch coming to Spark or the Spark plugin. I see that the Spark plugin does not work because HBase classes are missing from the Spark assembly jar. So, when Spark does reflection, it does not look for HBase client classes in the Phoenix plugin jar, only in the Spark assembly jar. Is this true?

If someone can enlighten me on this topic, please let me know.

Thanks,
Ben
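The usual remedy for this symptom, per the phoenix_spark documentation, is to put the full Phoenix client jar (which bundles the HBase client classes) on both the driver and executor classpaths rather than relying on the Spark assembly. A sketch for spark-defaults.conf follows; the jar path is a placeholder that depends on the installation:

    # spark-defaults.conf -- make the Phoenix client jar visible to both
    # the driver and the executors so class lookup can find the HBase
    # client classes. The jar path below is a placeholder; adjust it to
    # match your installation.
    spark.driver.extraClassPath    /opt/phoenix/phoenix-4.7.0-client.jar
    spark.executor.extraClassPath  /opt/phoenix/phoenix-4.7.0-client.jar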
Re: Missing Rows In Table After Bulk Load
Are the primary keys in the .csv file all unique (i.e., no rows overwriting other rows)?

On Fri, Apr 8, 2016 at 10:21 AM, Amit Shah wrote:
> Hi,
>
> I am using Phoenix 4.6 and HBase 1.0. After bulk loading 10 million records
> into a table using the psql.py utility, I tried querying the table using
> the sqlline.py utility through a select count(*) query. I see only 0.1
> million records.
>
> What could be missing?
>
> The psql.py logs are:
>
> python psql.py localhost -t TRANSACTIONS_TEST ../examples/Transactions_big.csv
> csv columns from database.
> Table row timestamp column position: -1
> Table name: SYSTEM.CATALOG
> CSV Upsert complete. 1000 rows upserted
> Time: 4679.317 sec(s)
>
> 0: jdbc:phoenix:localhost> select count(*) from TRANSACTIONS_TEST;
> Table row timestamp column position: -1
> Table name: TRANSACTIONS_TEST
> Table row timestamp column position: -1
> Table name: SYSTEM.CATALOG
> +------------+
> |  COUNT(1)  |
> +------------+
> | 184402     |
> +------------+
> 1 row selected (2.173 seconds)
>
> Thanks,
> Amit
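One quick way to test that theory is to compare the CSV's total row count against its distinct key count before loading. A shell sketch, assuming a comma-delimited file whose first field is a single-column primary key (both assumptions to adjust for the actual schema):

    # Total data rows in the CSV.
    wc -l < ../examples/Transactions_big.csv

    # Distinct values of the first field, assumed here to be the primary
    # key. If this number is far below the total, later rows are silently
    # upserting over earlier ones that share the same key.
    cut -d',' -f1 ../examples/Transactions_big.csv | sort -u | wc -l

If the distinct count comes out near 184,402 rather than 10 million, duplicate keys would fully explain the "missing" rows.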
Missing Rows In Table After Bulk Load
Hi,

I am using Phoenix 4.6 and HBase 1.0. After bulk loading 10 million records into a table using the psql.py utility, I tried querying the table using the sqlline.py utility through a select count(*) query. I see only 0.1 million records.

What could be missing?

The psql.py logs are:

python psql.py localhost -t TRANSACTIONS_TEST ../examples/Transactions_big.csv
csv columns from database.
Table row timestamp column position: -1
Table name: SYSTEM.CATALOG
CSV Upsert complete. 1000 rows upserted
Time: 4679.317 sec(s)

0: jdbc:phoenix:localhost> select count(*) from TRANSACTIONS_TEST;
Table row timestamp column position: -1
Table name: TRANSACTIONS_TEST
Table row timestamp column position: -1
Table name: SYSTEM.CATALOG
+------------+
|  COUNT(1)  |
+------------+
| 184402     |
+------------+
1 row selected (2.173 seconds)

Thanks,
Amit
Re: [HELP:]Save Spark Dataframe in Phoenix Table
Hi Divya,

That's strange. Are you able to post a snippet of your code to look at? And are you sure that you're saving the dataframes as per the docs (https://phoenix.apache.org/phoenix_spark.html)?

Depending on your HDP version, it may or may not actually have phoenix-spark support. Double-check that your Spark configuration is set up with the right worker/driver classpath settings, and that the Phoenix JARs contain the necessary phoenix-spark classes (e.g. org.apache.phoenix.spark.PhoenixRelation). If not, I suggest following up with Hortonworks.

Josh

On Fri, Apr 8, 2016 at 1:22 AM, Divya Gehlot wrote:
> Hi,
> I have a Hortonworks Hadoop cluster with the following configuration:
> Spark 1.5.2
> HBase 1.1.x
> Phoenix 4.4
>
> I am able to connect to Phoenix through a JDBC connection and able to read
> the Phoenix tables. But while writing the data back to a Phoenix table,
> I am getting the below error:
>
> org.apache.spark.sql.AnalysisException:
> org.apache.phoenix.spark.DefaultSource does not allow user-specified
> schemas.;
>
> Can anybody help in resolving the above error, or suggest any other way of
> saving Spark DataFrames to Phoenix?
>
> Would really appreciate the help.
>
> Thanks,
> Divya
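For comparison, Spark typically raises that "does not allow user-specified schemas" message when an explicit schema is handed to a data source that only derives its schema from the underlying table, as phoenix-spark's DefaultSource does; the documented save path never supplies one. A minimal sketch follows, assuming df is an existing DataFrame, that the target table OUTPUT_TABLE already exists in Phoenix with column names matching the DataFrame's fields, and that ZooKeeper is at localhost:2181 (all placeholders, not values from this thread):

    import org.apache.spark.sql.SaveMode

    // Minimal sketch of the save path from the phoenix_spark docs.
    // No .schema(...) call: the schema comes from the Phoenix table.
    // Per the docs, only SaveMode.Overwrite is supported, and the
    // DataFrame's field names must match the Phoenix column names.
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "OUTPUT_TABLE")
      .option("zkUrl", "localhost:2181")
      .save()

If even this form fails with the same AnalysisException, that points back to Josh's suggestion: the phoenix-spark classes may simply be absent from the HDP build's Phoenix JARs.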