Re: Spark Plugin Information

2016-04-08 Thread Benjamin Kim
Hi Josh,

I am using CDH 5.5.2 with HBase 1.0.0, Phoenix 4.5.2, and Spark 1.6.0. I looked 
up the error and found others hitting the same issue, which is what led me to 
ask the question here. I'll try the Phoenix 4.7.0 client jar and see what happens.
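
For reference, the call I am making is along these lines (the table name and
zkUrl below are placeholders, not my actual values):

val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map(
    "table" -> "MY_TABLE",     // placeholder table name
    "zkUrl" -> "zk-host:2181"  // placeholder ZooKeeper quorum
  )
)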

The error I am getting is:

java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:388)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:296)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:179)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1917)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1896)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1896)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:93)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:45)
at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:280)
at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:101)
at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:57)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1153)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC.<init>(<console>:52)
at $iwC.<init>(<console>:54)
at <init>(<console>:56)
at .<init>(<console>:60)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop

Re: Spark Plugin Information

2016-04-08 Thread Josh Mahonin
Hi Ben,

If you have a reproducible test case, please file a JIRA for it. The
documentation (https://phoenix.apache.org/phoenix_spark.html) is accurate
and verified up to Phoenix 4.7.0 and Spark 1.6.0.

Although not supported by the Phoenix project at large, you may find this
Docker image useful as a configuration reference:
https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark
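
Whichever route you take, the key piece of configuration is simply making the
Phoenix client jar visible to both the driver and the executors, e.g. in
spark-defaults.conf:

# the path is illustrative -- point these at your actual Phoenix client jar
spark.executor.extraClassPath  /opt/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
spark.driver.extraClassPath    /opt/phoenix/phoenix-4.7.0-HBase-1.0-client.jar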

Good luck!

Josh

On Fri, Apr 8, 2016 at 3:11 PM, Benjamin Kim  wrote:

> I want to know if there is an update/patch coming to Spark or the Spark
> plugin. I see that the Spark plugin does not work because HBase classes are
> missing from the Spark assembly jar. So, when Spark does reflection, it
> does not look for HBase client classes in the Phoenix plugin jar, but only
> in the Spark assembly jar. Is this true?
>
> If someone can enlighten me on this topic, please let me know.
>
> Thanks,
> Ben


Spark Plugin Information

2016-04-08 Thread Benjamin Kim
I want to know if there is an update/patch coming to Spark or the Spark plugin. 
I see that the Spark plugin does not work because HBase classes are missing 
from the Spark assembly jar. So, when Spark does reflection, it does not look 
for HBase client classes in the Phoenix plugin jar, but only in the Spark 
assembly jar. Is this true?

If someone can enlighten me on this topic, please let me know.

Thanks,
Ben

Re: Missing Rows In Table After Bulk Load

2016-04-08 Thread Steve Terrell
Are the primary keys in the .csv file all unique (i.e., no rows overwriting
other rows)?
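
A quick way to check, assuming the primary key is the first CSV column (adjust
the field selection to your schema), is to compare the total row count with the
distinct key count in the file:

# total data rows in the file
wc -l ../examples/Transactions_big.csv
# distinct values of the first column (assumed here to be the primary key)
cut -d, -f1 ../examples/Transactions_big.csv | sort -u | wc -l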

On Fri, Apr 8, 2016 at 10:21 AM, Amit Shah  wrote:

> Hi,
>
> I am using Phoenix 4.6 and HBase 1.0. After bulk loading 10 million records
> into a table using the psql.py utility, I tried querying the table with a
> select count(*) query in the sqlline.py utility, and I see only 0.1
> million records.
>
> What could be missing?
>
> The psql.py logs are
>
> python psql.py localhost -t TRANSACTIONS_TEST
> ../examples/Transactions_big.csv
> csv columns from database.
> Table row timestamp column position: -1
> Table name:  SYSTEM.CATALOG
> CSV Upsert complete. 1000 rows upserted
> Time: 4679.317 sec(s)
>
>
> 0: jdbc:phoenix:localhost> select count(*) from TRANSACTIONS_TEST;
> Table row timestamp column position: -1
> Table name:  TRANSACTIONS_TEST
> Table row timestamp column position: -1
> Table name:  SYSTEM.CATALOG
> +--+
> | COUNT(1) |
> +--+
> | 184402   |
> +--+
> 1 row selected (2.173 seconds)
>
> Thanks,
> Amit
>


Missing Rows In Table After Bulk Load

2016-04-08 Thread Amit Shah
Hi,

I am using Phoenix 4.6 and HBase 1.0. After bulk loading 10 million records
into a table using the psql.py utility, I tried querying the table with a
select count(*) query in the sqlline.py utility, and I see only 0.1
million records.

What could be missing?

The psql.py logs are

python psql.py localhost -t TRANSACTIONS_TEST
../examples/Transactions_big.csv
csv columns from database.
Table row timestamp column position: -1
Table name:  SYSTEM.CATALOG
CSV Upsert complete. 1000 rows upserted
Time: 4679.317 sec(s)


0: jdbc:phoenix:localhost> select count(*) from TRANSACTIONS_TEST;
Table row timestamp column position: -1
Table name:  TRANSACTIONS_TEST
Table row timestamp column position: -1
Table name:  SYSTEM.CATALOG
+--+
| COUNT(1) |
+--+
| 184402   |
+--+
1 row selected (2.173 seconds)

Thanks,
Amit


Re: [HELP:]Save Spark Dataframe in Phoenix Table

2016-04-08 Thread Josh Mahonin
Hi Divya,

That's strange. Are you able to post a snippet of your code to look at? And
are you sure that you're saving the dataframes as per the docs (
https://phoenix.apache.org/phoenix_spark.html)?

Depending on your HDP version, it may or may not actually have
phoenix-spark support. Double-check that your Spark configuration is set up
with the right worker/driver classpath settings, and that the Phoenix JARs
contain the necessary phoenix-spark classes
(e.g. org.apache.phoenix.spark.PhoenixRelation). If not, I suggest
following up with Hortonworks.
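
If those classes are present, a minimal save along the lines of that
documentation looks roughly like this (table name and zkUrl are placeholders,
and the target table must already exist in Phoenix with matching columns):

import org.apache.spark.sql.SaveMode

// Assumes the Phoenix client jar is on the driver and executor classpaths.
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)          // per the docs, use SaveMode.Overwrite
  .option("table", "OUTPUT_TABLE")   // placeholder table name
  .option("zkUrl", "zk-host:2181")   // placeholder ZooKeeper quorum
  .save()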

Josh



On Fri, Apr 8, 2016 at 1:22 AM, Divya Gehlot wrote:

> Hi,
> I have a Hortonworks Hadoop cluster with the below configuration:
> Spark 1.5.2
> HBase 1.1.x
> Phoenix 4.4
>
> I am able to connect to Phoenix through a JDBC connection and to read
> the Phoenix tables.
> But while writing the data back to a Phoenix table,
> I am getting the below error:
>
> org.apache.spark.sql.AnalysisException:
> org.apache.phoenix.spark.DefaultSource does not allow user-specified
> schemas.;
>
> Can anybody help me resolve the above error, or suggest any other way of
> saving Spark DataFrames to Phoenix?
>
> Would really appreciate the help.
>
> Thanks,
> Divya
>