Re: Load HFiles in Apache Phoenix

2017-10-23 Thread snhir...@gmail.com


On 2017-10-20 17:23, James Taylor  wrote: 
> If you put together a nice example, we can post a link to it from the FAQ.
> Sorry, but with open source, the answer is often "go look at the source
> code". :-)

Hi, James.

I've been through the tests and cannot find anything close to the body of code 
I inherited.  My hat is off to whoever figured this out!  To be sure I have 
the fundamentals, would it be correct to say that this is simply a shorthand 
for using the Phoenix query engine as a standalone encoding mechanism?  

The code I'm looking at does not explicitly set autoCommit to off.  Isn't that 
somewhat essential?  Is autoCommit = off the default for Spark JDBC connections?
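
For concreteness, this is the sort of call I would have expected to see near the 
top of that routine (just a sketch, with a placeholder ZooKeeper quorum):

import java.sql.DriverManager

// Placeholder URL; the real ZooKeeper quorum comes from cluster config.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")

// Keep upserts in the client-side mutation state rather than
// flushing them to HBase on every statement.
conn.setAutoCommit(false)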





Re: Load HFiles in Apache Phoenix

2017-10-20 Thread James Taylor
If you put together a nice example, we can post a link to it from the FAQ.
Sorry, but with open source, the answer is often "go look at the source
code". :-)

On Fri, Oct 20, 2017 at 2:13 PM, snhir...@gmail.com 
wrote:

>
>
> On 2017-10-20 17:07, James Taylor  wrote:
> > Load Phoenix into Eclipse and search for references to
> > PhoenixRuntime.getUncommittedDataIterator(). There's even a unit test that
> > does this.
> >
>
> Ok, I appreciate the response.  But I've already encountered the source
> code during my searches and it really isn't very enlightening in terms of
> how one simply uses it.  I'll take your advice and go after the unit test
> next.
>


Re: Load HFiles in Apache Phoenix

2017-10-20 Thread snhir...@gmail.com


On 2017-10-20 17:07, James Taylor  wrote: 
> Load Phoenix into Eclipse and search for references to
> PhoenixRuntime.getUncommittedDataIterator(). There's even a unit test that
> does this.
> 

Ok, I appreciate the response.  But I've already encountered the source code 
during my searches and it really isn't very enlightening in terms of how one 
simply uses it.  I'll take your advice and go after the unit test next.


Re: Load HFiles in Apache Phoenix

2017-10-20 Thread James Taylor
Load Phoenix into Eclipse and search for references to
PhoenixRuntime.getUncommittedDataIterator(). There's even a unit test that
does this.
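
The gist of what the test exercises is roughly the following (a sketch only,
assuming a table T with columns PK and VAL already exists):

import java.sql.DriverManager
import scala.collection.JavaConverters._
import org.apache.phoenix.util.PhoenixRuntime

// Keep the upsert in the client-side mutation state instead of
// committing it to HBase.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
conn.setAutoCommit(false)

val stmt = conn.prepareStatement("UPSERT INTO T (PK, VAL) VALUES (?, ?)")
stmt.setString(1, "row1")
stmt.setString(2, "value1")
stmt.execute()

// Pull the encoded KeyValues (empty key value included) out of the
// uncommitted mutation state, then roll back so nothing is written.
val kvs = PhoenixRuntime.getUncommittedDataIterator(conn, true).asScala
  .flatMap(pair => pair.getSecond.asScala)
  .toList
conn.rollback()
conn.close()

// kvs can now be sorted and written out as HFiles.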

On Fri, Oct 20, 2017 at 2:04 PM, snhir...@gmail.com 
wrote:

>
>
> On 2017-10-20 16:49, James Taylor  wrote:
> > Here's a little more info:
> > https://phoenix.apache.org/faq.html#Why_empty_key_value
> >
> > Lots of hits here too:
> > http://search-hadoop.com/?project=Phoenix=empty+key+value
> >
> > On Fri, Oct 20, 2017 at 1:45 PM, sn5  wrote:
> >
> > > It would be very helpful to see a complete, working example (preferably
> > > with
> > > some comments) of this hfile load technique.  Apparently it's a known
> > > idiom,
> > > but I've spent most of the afternoon searching Google and cannot find a
> > > single reference other than this thread.  In particular I do not understand
> > > what is meant by "...loading the empty column".
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
> > >
> >
>
> Thanks, but that answers only a part of my question.  I would like to see
> a reference to the entire idiom of using the uncommitted data from a
> transaction that will be subsequently rolled back.  I can sort of infer
> what's going on from that original post, but cannot find any further
> references or examples.
>
>


Re: Load HFiles in Apache Phoenix

2017-10-20 Thread snhir...@gmail.com


On 2017-10-20 16:49, James Taylor  wrote: 
> Here's a little more info:
> https://phoenix.apache.org/faq.html#Why_empty_key_value
> 
> Lots of hits here too:
> http://search-hadoop.com/?project=Phoenix=empty+key+value
> 
> On Fri, Oct 20, 2017 at 1:45 PM, sn5  wrote:
> 
> > It would be very helpful to see a complete, working example (preferably
> > with
> > some comments) of this hfile load technique.  Apparently it's a known
> > idiom,
> > but I've spent most of the afternoon searching Google and cannot find a
> > single reference other than this thread.  In particular I do not understand
> > what is meant by "...loading the empty column".
> >
> >
> >
> >
> >
> > --
> > Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
> >
> 

Thanks, but that answers only a part of my question.  I would like to see a 
reference to the entire idiom of using the uncommitted data from a transaction 
that will be subsequently rolled back.  I can sort of infer what's going on 
from that original post, but cannot find any further references or examples.



Re: Load HFiles in Apache Phoenix

2017-10-20 Thread James Taylor
Here's a little more info:
https://phoenix.apache.org/faq.html#Why_empty_key_value

Lots of hits here too:
http://search-hadoop.com/?project=Phoenix=empty+key+value
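
If you end up constructing the cells yourself rather than pulling them out of the
connection, each row also needs that empty key value, otherwise SELECTs skip rows
that a raw HBase scan still shows. Something along these lines would do it (a
sketch only; check the family and the QueryConstants names against the Phoenix
version you're running):

import org.apache.hadoop.hbase.KeyValue
import org.apache.phoenix.query.QueryConstants

// One extra marker cell per row, in the table's first column family.
// The family argument is a placeholder here; EMPTY_COLUMN_* are assumed
// to match the constants in the Phoenix version in use.
def emptyKeyValue(rowKey: Array[Byte], family: Array[Byte]): KeyValue =
  new KeyValue(rowKey, family,
    QueryConstants.EMPTY_COLUMN_BYTES,
    QueryConstants.EMPTY_COLUMN_VALUE_BYTES)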

On Fri, Oct 20, 2017 at 1:45 PM, sn5  wrote:

> It would be very helpful to see a complete, working example (preferably
> with
> some comments) of this hfile load technique.  Apparently it's a known
> idiom,
> but I've spent most of the afternoon searching Google and cannot find a
> single reference other than this thread.  In particular I do not understand
> what is meant by "...loading the empty column".
>
>
>
>
>
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
>


Re: Load HFiles in Apache Phoenix

2017-10-20 Thread sn5
It would be very helpful to see a complete, working example (preferably with
some comments) of this HFile load technique.  Apparently it's a known idiom,
but I've spent most of the afternoon searching Google and cannot find a
single reference other than this thread.  In particular I do not understand
what is meant by "...loading the empty column".





--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/


Re: Load HFiles in Apache Phoenix

2016-04-27 Thread James Taylor
Hi Abel,
Yes, you need to either include the empty key value or declare your table as
a view instead of a table (in which case it'd be read-only).
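
For the view route, the DDL would look something like this (a sketch only, with
placeholder names for the HBase table, row key column, and column family):

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
// Maps the existing HBase table read-only; the quoted names must match
// the underlying table, column family, and qualifier bytes exactly.
conn.createStatement().execute(
  """CREATE VIEW "t" ("pk" VARCHAR PRIMARY KEY, "cf"."val" VARCHAR)""")
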
Thanks,
James

On Wed, Apr 27, 2016 at 12:17 PM, Abel Fernández 
wrote:

> Hi,
>
> I am trying to load data into Apache Phoenix using HFiles. I do not have a
> CSV, so I need to build the HFiles from an RDD.
>
> My problem is that I am not able to see the rows using the Phoenix API
> (select * from ...), but when I do a raw HBase scan of the table I do see the data.
>
> Do I need to include the empty column?
>
> This is the code I am using:
>
> class ExtendedProductRDDFunctions[A <: scala.Product](data:
> org.apache.spark.rdd.RDD[A]) extends
>   ProductRDDFunctions[A](data) with Serializable with Logging {
>
> Create the Hfiles:
> ---
>   def toHFile(
>   sc: SparkContext,
>   tableName: String,
>   columns: Seq[String],
>   conf: Configuration = new Configuration,
>   zkUrl: Option[String] = None
>   ): RDD[(ByteArrayWrapper, FamiliesQualifiersValues)] = {
>
> val config = ConfigurationUtil.getOutputConfiguration(tableName,
> columns, zkUrl, Some(conf))
> val tableBytes = Bytes.toBytes(tableName)
> ConfigurationUtil.encodeColumns(config)
> val jdbcUrl = zkUrl.map(getJdbcUrl).getOrElse(getJdbcUrl(config))
> val query = QueryUtil.constructUpsertStatement(tableName,
> columns.toList.asJava, null)
>
> val columnsInfo = ConfigurationUtil.decodeColumns(config)
> val a = sc.broadcast(columnsInfo)
>
> logInfo("toHFile data size: "+data.count())
> data.flatMap(x => mapRow(x, jdbcUrl, tableBytes, query, a.value))
>   }
>
>   def mapRow(product: Product,
>  jdbcUrl: String,
>  tableBytes: Array[Byte],
>  query: String,
>  columnsInfo: List[ColumnInfo]): List[(ByteArrayWrapper,
> FamiliesQualifiersValues)] = {
>
> val conn = DriverManager.getConnection(jdbcUrl)
> var hRows:Iterator[(ByteArrayWrapper, FamiliesQualifiersValues)] = null
> val preparedStatement = conn.prepareStatement(query)
>
>
> columnsInfo.zip(product.productIterator.toList).zipWithIndex.foreach(setInStatement(preparedStatement))
> preparedStatement.execute()
>
> val uncommittedDataIterator =
> PhoenixRuntime.getUncommittedDataIterator(conn, true)
> hRows = uncommittedDataIterator.asScala
>   .flatMap(kvPair => kvPair.getSecond.asScala.map(kf => createPut(kf)))
>
> conn.rollback()
> hRows.toList
>   }
>
>   private def createPut(keyValue:
> KeyValue):(ByteArrayWrapper,FamiliesQualifiersValues)={
>
> val key = new ByteArrayWrapper(keyValue.getRow)
> val family = new FamiliesQualifiersValues
>
> family.+=(keyValue.getFamily,keyValue.getQualifier,keyValue.getValue)
> (key,family)
>   }
>   }
>
>   Load into Apache Phoenix
>   -
>   val sortedRdd = rdd
>   .keyBy(k => k._1.toString)
>   .reduceByKey((key,value) => value)
>   .map(v => v._2)
>
>   def apacheBulkSave(hBaseContext: HBaseContext, table: String,outputPath:
> String) ={
> rdd.hbaseBulkLoadThinRows(hBaseContext,
>   TableName.valueOf(table),
>   f => f,
>   outputPath
> )
>   }
>
> --
> Un saludo - Best Regards.
> Abel
>


Load HFiles in Apache Phoenix

2016-04-27 Thread Abel Fernández
Hi,

I am trying to load data into Apache Phoenix using HFiles. I do not have a
CSV, so I need to build the HFiles from an RDD.

My problem is that I am not able to see the rows using the Phoenix API
(select * from ...), but when I do a raw HBase scan of the table I do see the data.

Do I need to include the empty column?

This is the code I am using:

class ExtendedProductRDDFunctions[A <: scala.Product](data:
org.apache.spark.rdd.RDD[A]) extends
  ProductRDDFunctions[A](data) with Serializable with Logging {

Create the HFiles:
---
  def toHFile(
  sc: SparkContext,
  tableName: String,
  columns: Seq[String],
  conf: Configuration = new Configuration,
  zkUrl: Option[String] = None
  ): RDD[(ByteArrayWrapper, FamiliesQualifiersValues)] = {

val config = ConfigurationUtil.getOutputConfiguration(tableName,
columns, zkUrl, Some(conf))
val tableBytes = Bytes.toBytes(tableName)
ConfigurationUtil.encodeColumns(config)
val jdbcUrl = zkUrl.map(getJdbcUrl).getOrElse(getJdbcUrl(config))
val query = QueryUtil.constructUpsertStatement(tableName,
columns.toList.asJava, null)

val columnsInfo = ConfigurationUtil.decodeColumns(config)
val a = sc.broadcast(columnsInfo)

logInfo("toHFile data size: "+data.count())
data.flatMap(x => mapRow(x, jdbcUrl, tableBytes, query, a.value))
  }

  def mapRow(product: Product,
 jdbcUrl: String,
 tableBytes: Array[Byte],
 query: String,
 columnsInfo: List[ColumnInfo]): List[(ByteArrayWrapper,
FamiliesQualifiersValues)] = {

val conn = DriverManager.getConnection(jdbcUrl)
var hRows:Iterator[(ByteArrayWrapper, FamiliesQualifiersValues)] = null
val preparedStatement = conn.prepareStatement(query)


columnsInfo.zip(product.productIterator.toList).zipWithIndex.foreach(setInStatement(preparedStatement))
preparedStatement.execute()

val uncommittedDataIterator =
PhoenixRuntime.getUncommittedDataIterator(conn, true)
hRows = uncommittedDataIterator.asScala
  .flatMap(kvPair => kvPair.getSecond.asScala.map(kf => createPut(kf)))

conn.rollback()
hRows.toList
  }

  private def createPut(keyValue:
KeyValue):(ByteArrayWrapper,FamiliesQualifiersValues)={

val key = new ByteArrayWrapper(keyValue.getRow)
val family = new FamiliesQualifiersValues

family.+=(keyValue.getFamily,keyValue.getQualifier,keyValue.getValue)
(key,family)
  }
  }

  Load into Apache Phoenix
  -
  val sortedRdd = rdd
  .keyBy(k => k._1.toString)
  .reduceByKey((key,value) => value)
  .map(v => v._2)

  def apacheBulkSave(hBaseContext: HBaseContext, table: String,outputPath:
String) ={
rdd.hbaseBulkLoadThinRows(hBaseContext,
  TableName.valueOf(table),
  f => f,
  outputPath
)
  }

-- 
Un saludo - Best Regards.
Abel