Thanks again all.
My primary objective was to write to HBase directly from Spark Streaming,
and Phoenix was really the catalyst here.
My point is: if I manage to write directly from Spark Streaming to
HBase, would that be a better option?
FYI, I can read from the Phoenix table on HBase.
Hi Mich,
I'd encourage you to use this mechanism mentioned by Josh:
Another option is to use Phoenix-JDBC from within Spark Streaming. I've got
a toy example of using Spark streaming with Phoenix DataFrames, but it
could just as easily be a batched JDBC upsert.
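The batched JDBC upsert Josh mentions could look roughly like the sketch below. This is an illustration only: the ZooKeeper host, table name (SENSOR), and columns are assumptions, and the per-row fields depend on your stream's record type.

```scala
// Hypothetical sketch: batched JDBC UPSERTs into Phoenix from Spark Streaming.
// The JDBC URL, table SENSOR, and its columns are assumptions.
import java.sql.DriverManager

stream.foreachRDD { rdd =>
  rdd.foreachPartition { rows =>
    // One connection per partition; the Phoenix URL points at ZooKeeper
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    conn.setAutoCommit(false)
    val stmt = conn.prepareStatement(
      "UPSERT INTO SENSOR (ID, TS, VAL) VALUES (?, ?, ?)")
    var count = 0L
    rows.foreach { r =>
      stmt.setString(1, r.id)
      stmt.setLong(2, r.ts)
      stmt.setDouble(3, r.value)
      stmt.executeUpdate()
      count += 1
      // Commit periodically; Phoenix flushes the buffered upserts to HBase
      if (count % 1000 == 0) conn.commit()
    }
    conn.commit()
    stmt.close()
    conn.close()
  }
}
```

Committing in chunks (rather than per row) is the usual way to amortize the write cost over the batch.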
Trying to write directly to HBase
Thanks Josh, I will try your code as well.
I wrote this simple program, based on some code, that directly creates and
populates an HBase table called "new" from Spark 2:
import org.apache.spark._
import org.apache.spark.rdd.NewHadoopRDD
import org.apache.hadoop.hbase.{HBaseConfiguration,
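The snippet above is cut off, so here is a hedged sketch of the same idea using the standard HBase client API from within Spark. The column family "cf", qualifier "col1", and the (rowkey, value) pair shape are assumptions, not Mich's original code.

```scala
// Sketch: writing to an HBase table ("new") directly from a Spark RDD.
// Column family "cf" and qualifier "col1" are assumptions.
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

rdd.foreachPartition { rows =>
  val conf = HBaseConfiguration.create()   // picks up hbase-site.xml
  val conn = ConnectionFactory.createConnection(conf)
  val table = conn.getTable(TableName.valueOf("new"))
  rows.foreach { case (rowkey: String, value: String) =>
    val put = new Put(Bytes.toBytes(rowkey))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
      Bytes.toBytes(value))
    table.put(put)
  }
  table.close()
  conn.close()
}
```

Opening the connection inside `foreachPartition` (not on the driver) matters, because HBase connections are not serializable.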
Hi Mich,
You're correct that the rowkey is the primary key, but if you're writing to
HBase directly and bypassing Phoenix, you'll have to be careful about the
construction of your row keys to adhere to the Phoenix data types and row
format. I don't think it's very well documented, but you might
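To make Josh's warning concrete: if the Phoenix primary key were, say, (VARCHAR, UNSIGNED_LONG), the raw HBase rowkey would have to use Phoenix's own byte encodings. The sketch below is an illustration under that assumed schema; it uses classes from org.apache.phoenix.schema.types, and it deliberately ignores complications like salt bytes on salted tables.

```scala
// Illustration: building a rowkey for an assumed Phoenix PK (VARCHAR, UNSIGNED_LONG).
// The key layout is a simplification; salted tables and nullable trailing
// columns need additional handling.
import org.apache.phoenix.schema.types.{PVarchar, PUnsignedLong}
import org.apache.phoenix.query.QueryConstants

val part1 = PVarchar.INSTANCE.toBytes("sensor-1")
val part2 = PUnsignedLong.INSTANCE.toBytes(1475820000000L)
// Variable-length key parts (VARCHAR) are terminated by a zero separator byte
val rowkey = part1 ++ Array(QueryConstants.SEPARATOR_BYTE) ++ part2
```

Getting any of this wrong makes the rows invisible (or garbled) when read back through Phoenix, which is why writing through Phoenix itself is usually safer.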
Thank you all, very helpful.
I have not tried the method Ciureanu suggested but will do so.
Now I will be using Spark Streaming to populate an HBase table. I was hoping
to do this through Phoenix, but managed to write a script to write to the
HBase table from Spark 2 itself.
Having worked with HBase I
In Spark 1.4 it worked via JDBC; I'm sure it would work in 1.6 / 2.0 without
issues.
Here's some sample code I used (it was fetching data in parallel across 24
partitions):
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
import java.sql.{Connection,
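Ciureanu's snippet is cut off above, so here is a hedged reconstruction of the idea: reading a Phoenix table in parallel with Spark's `JdbcRDD` over 24 partitions. The table, columns, partition bounds, and ZooKeeper host are assumptions.

```scala
// Sketch: parallel read of a Phoenix table via JdbcRDD (24 partitions).
// Table SENSOR, its columns, and the bounds on ID are assumptions.
import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

val sc = new SparkContext(new SparkConf().setAppName("phoenix-jdbcrdd"))
val rdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:phoenix:zk-host:2181"),
  "SELECT ID, VAL FROM SENSOR WHERE ID >= ? AND ID <= ?",
  lowerBound = 1L,
  upperBound = 1000000L,
  numPartitions = 24,            // data is fetched in 24 parallel partitions
  mapRow = r => (r.getLong(1), r.getDouble(2)))
```

`JdbcRDD` splits the [lowerBound, upperBound] range into `numPartitions` sub-ranges and binds each pair into the query's two `?` placeholders, so each executor fetches its own slice.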
JIRA on the HBase side:
HBASE-16179
FYI
On Fri, Oct 7, 2016 at 6:07 AM, Josh Mahonin wrote:
Hi Mich,
There's an open ticket about this issue here:
https://issues.apache.org/jira/browse/PHOENIX-
Long story short, Spark changed their API (again), breaking the existing
integration. I'm not sure of the level of effort needed to get it working
with Spark 2.0, but based on examples from other