Re: Load HFiles in Apache Phoenix
On 2017-10-20 17:23, James Taylor wrote:
> If you put together a nice example, we can post a link to it from the FAQ.
> Sorry, but with open source, the answer is often "go look at the source
> code". :-)

Hi, James. I've been through the tests and cannot find anything close to the body of code I inherited. My hat is off to whoever figured this out! To be sure I have the fundamentals: would it be correct to say that this is simply a shorthand that uses the Phoenix query engine as a standalone encoding mechanism? The code I'm looking at does not explicitly set autoCommit to off. Isn't that somewhat essential? Is autoCommit = off the default for Spark JDBC connections?
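The role autoCommit plays in this idiom can be sketched with a toy model. This is plain Scala, not the Phoenix API: `ToyConnection` and its methods are invented purely for illustration. The point is that with autoCommit on, each upsert is flushed to the server immediately, so there would be nothing left in the client-side mutation state for `PhoenixRuntime.getUncommittedDataIterator()` to read; with it off (Phoenix connections default to autoCommit = false via the `phoenix.connection.autoCommit` property, as far as I can tell, unlike the usual JDBC default of true), the encoded cells accumulate client-side until commit or rollback.

```scala
import scala.collection.mutable.ListBuffer

// Toy model of a Phoenix connection's client-side mutation state.
// Not the real API -- it only illustrates why autoCommit must be off.
class ToyConnection(val autoCommit: Boolean) {
  private val clientBuffer = ListBuffer[String]() // uncommitted mutations
  private val server = ListBuffer[String]()       // what actually got written

  def upsert(row: String): Unit =
    if (autoCommit) server += row // flushed to the server at once
    else clientBuffer += row      // held locally, like Phoenix's MutationState

  // Stand-in for PhoenixRuntime.getUncommittedDataIterator(conn, true)
  def uncommitted: List[String] = clientBuffer.toList

  def rollback(): Unit = clientBuffer.clear() // discard; server never sees the rows
}

val conn = new ToyConnection(autoCommit = false)
conn.upsert("row1")
conn.upsert("row2")
val cells = conn.uncommitted // the encoded rows you would turn into HFiles
conn.rollback()              // nothing was ever written server-side
```

With `autoCommit = true`, the same two upserts would go straight to the (toy) server and `uncommitted` would come back empty, which is why the real code has to run with autoCommit off.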
Re: Load HFiles in Apache Phoenix
If you put together a nice example, we can post a link to it from the FAQ. Sorry, but with open source, the answer is often "go look at the source code". :-)

On Fri, Oct 20, 2017 at 2:13 PM, snhir...@gmail.com wrote:
> On 2017-10-20 17:07, James Taylor wrote:
> > Load Phoenix into Eclipse and search for references to
> > PhoenixRuntime.getUncommittedDataIterator(). There's even a unit test
> > that does this.
>
> Ok, I appreciate the response. But I've already encountered the source
> code during my searches, and it really isn't very enlightening in terms of
> how one simply uses it. I'll take your advice and go after the unit test
> next.
Re: Load HFiles in Apache Phoenix
On 2017-10-20 17:07, James Taylor wrote:
> Load Phoenix into Eclipse and search for references to
> PhoenixRuntime.getUncommittedDataIterator(). There's even a unit test that
> does this.

Ok, I appreciate the response. But I've already encountered the source code during my searches, and it really isn't very enlightening in terms of how one simply uses it. I'll take your advice and go after the unit test next.
Re: Load HFiles in Apache Phoenix
Load Phoenix into Eclipse and search for references to PhoenixRuntime.getUncommittedDataIterator(). There's even a unit test that does this.

On Fri, Oct 20, 2017 at 2:04 PM, snhir...@gmail.com wrote:
> On 2017-10-20 16:49, James Taylor wrote:
> > Here's a little more info:
> > https://phoenix.apache.org/faq.html#Why_empty_key_value
> >
> > Lots of hits here too:
> > http://search-hadoop.com/?project=Phoenix=empty+key+value
> >
> > On Fri, Oct 20, 2017 at 1:45 PM, sn5 wrote:
> > > It would be very helpful to see a complete, working example (preferably
> > > with some comments) of this HFile load technique. Apparently it's a
> > > known idiom, but I've spent most of the afternoon searching Google and
> > > cannot find a single reference other than this thread. In particular, I
> > > do not understand what is meant by "...loading the empty column".
>
> Thanks, but that answers only part of my question. I would like to see a
> reference to the entire idiom of using the uncommitted data from a
> transaction that will subsequently be rolled back. I can sort of infer
> what's going on from that original post, but cannot find any further
> references or examples.
Re: Load HFiles in Apache Phoenix
On 2017-10-20 16:49, James Taylor wrote:
> Here's a little more info:
> https://phoenix.apache.org/faq.html#Why_empty_key_value
>
> Lots of hits here too:
> http://search-hadoop.com/?project=Phoenix=empty+key+value
>
> On Fri, Oct 20, 2017 at 1:45 PM, sn5 wrote:
> > It would be very helpful to see a complete, working example (preferably
> > with some comments) of this HFile load technique. Apparently it's a known
> > idiom, but I've spent most of the afternoon searching Google and cannot
> > find a single reference other than this thread. In particular, I do not
> > understand what is meant by "...loading the empty column".

Thanks, but that answers only part of my question. I would like to see a reference to the entire idiom of using the uncommitted data from a transaction that will subsequently be rolled back. I can sort of infer what's going on from that original post, but cannot find any further references or examples.
Re: Load HFiles in Apache Phoenix
Here's a little more info:
https://phoenix.apache.org/faq.html#Why_empty_key_value

Lots of hits here too:
http://search-hadoop.com/?project=Phoenix=empty+key+value

On Fri, Oct 20, 2017 at 1:45 PM, sn5 wrote:
> It would be very helpful to see a complete, working example (preferably
> with some comments) of this HFile load technique. Apparently it's a known
> idiom, but I've spent most of the afternoon searching Google and cannot
> find a single reference other than this thread. In particular, I do not
> understand what is meant by "...loading the empty column".
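The "empty column" (a.k.a. empty key value) that the FAQ describes is just one extra marker cell per row, which lets Phoenix detect that a row exists even when every non-PK column is null. A hypothetical sketch in plain Scala of what that cell contains (the `Cell` case class and `emptyColumnCell` helper are invented here; the exact family/qualifier/value bytes depend on the Phoenix version and table properties, and "0"/"_0"/"x" match classic, pre-column-encoding Phoenix, where they live in `QueryConstants`):

```scala
// Hypothetical sketch: the "empty key value" is one extra cell per row.
// The byte values below match classic (pre-4.10 column encoding) Phoenix;
// they vary with version and table properties.
case class Cell(row: Array[Byte], family: Array[Byte],
                qualifier: Array[Byte], value: Array[Byte])

def emptyColumnCell(rowKey: Array[Byte]): Cell =
  Cell(rowKey,
       "0".getBytes("UTF-8"),  // default column family
       "_0".getBytes("UTF-8"), // empty column qualifier (QueryConstants.EMPTY_COLUMN_BYTES)
       "x".getBytes("UTF-8"))  // its value (QueryConstants.EMPTY_COLUMN_VALUE_BYTES)

val marker = emptyColumnCell("rowkey1".getBytes("UTF-8"))
```

The getUncommittedDataIterator approach discussed elsewhere in this thread sidesteps the question entirely: the cells Phoenix buffers for an upsert already include this marker.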
Re: Load HFiles in Apache Phoenix
It would be very helpful to see a complete, working example (preferably with some comments) of this HFile load technique. Apparently it's a known idiom, but I've spent most of the afternoon searching Google and cannot find a single reference other than this thread. In particular, I do not understand what is meant by "...loading the empty column".

--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
Re: Load HFiles in Apache Phoenix
Hi Abel,

Yes, you need to either include the empty key value or declare your table as a view instead of a table (in which case it'd be read-only).

Thanks,
James

On Wed, Apr 27, 2016 at 12:17 PM, Abel Fernández wrote:
> Hi,
>
> I am trying to load data into Apache Phoenix using HFiles. I do not have a
> CSV, so I need to build the HFiles from an RDD.
>
> My problem is that I am not able to see the rows through the Phoenix API
> (select * from ...), but when I do a scan of the HBase table I do see the
> data.
>
> Do I need to include the empty column?
>
> This is the code I am using:
>
> [code snipped; the full listing is in the original message below]
>
> --
> Un saludo - Best Regards.
> Abel
Load HFiles in Apache Phoenix
Hi,

I am trying to load data into Apache Phoenix using HFiles. I do not have a CSV, so I need to build the HFiles from an RDD.

My problem is that I am not able to see the rows through the Phoenix API (select * from ...), but when I do a scan of the HBase table I do see the data.

Do I need to include the empty column?

This is the code I am using:

Create the HFiles:
---

class ExtendedProductRDDFunctions[A <: scala.Product](data: org.apache.spark.rdd.RDD[A])
    extends ProductRDDFunctions[A](data) with Serializable with Logging {

  def toHFile(
      sc: SparkContext,
      tableName: String,
      columns: Seq[String],
      conf: Configuration = new Configuration,
      zkUrl: Option[String] = None
  ): RDD[(ByteArrayWrapper, FamiliesQualifiersValues)] = {

    val config = ConfigurationUtil.getOutputConfiguration(tableName, columns, zkUrl, Some(conf))
    val tableBytes = Bytes.toBytes(tableName)
    ConfigurationUtil.encodeColumns(config)
    val jdbcUrl = zkUrl.map(getJdbcUrl).getOrElse(getJdbcUrl(config))
    val query = QueryUtil.constructUpsertStatement(tableName, columns.toList.asJava, null)

    val columnsInfo = ConfigurationUtil.decodeColumns(config)
    val broadcastColumnsInfo = sc.broadcast(columnsInfo)

    logInfo("toHFile data size: " + data.count()) // note: count() forces an extra pass over the data
    data.flatMap(x => mapRow(x, jdbcUrl, tableBytes, query, broadcastColumnsInfo.value))
  }

  def mapRow(product: Product,
             jdbcUrl: String,
             tableBytes: Array[Byte],
             query: String,
             columnsInfo: List[ColumnInfo]): List[(ByteArrayWrapper, FamiliesQualifiersValues)] = {

    val conn = DriverManager.getConnection(jdbcUrl)
    val preparedStatement = conn.prepareStatement(query)

    columnsInfo.zip(product.productIterator.toList).zipWithIndex
      .foreach(setInStatement(preparedStatement))
    preparedStatement.execute()

    // Materialize before rolling back: the iterator is lazy and reads the
    // connection's mutation state, which rollback() clears.
    val hRows = PhoenixRuntime.getUncommittedDataIterator(conn, true).asScala
      .flatMap(kvPair => kvPair.getSecond.asScala.map(kf => createPut(kf)))
      .toList

    conn.rollback()
    conn.close()
    hRows
  }

  private def createPut(keyValue: KeyValue): (ByteArrayWrapper, FamiliesQualifiersValues) = {
    val key = new ByteArrayWrapper(keyValue.getRow)
    val family = new FamiliesQualifiersValues
    family.+=(keyValue.getFamily, keyValue.getQualifier, keyValue.getValue)
    (key, family)
  }
}

Load into Apache Phoenix:
---

val sortedRdd = rdd
  .keyBy(k => k._1.toString)
  .reduceByKey((key, value) => value)
  .map(v => v._2)

def apacheBulkSave(hBaseContext: HBaseContext, table: String, outputPath: String) = {
  sortedRdd.hbaseBulkLoadThinRows(hBaseContext, // was "rdd"; presumably sortedRdd was intended
    TableName.valueOf(table),
    f => f,
    outputPath)
}

--
Un saludo - Best Regards.
Abel
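The `keyBy`/`reduceByKey` step in the message above exists because HBase bulk load expects unique row keys: `reduceByKey((key, value) => value)` keeps a single value per key (effectively an arbitrary one, since RDD partition order is not guaranteed). The same logic on plain Scala collections, runnable without Spark:

```scala
// Plain-collections equivalent of:
//   rdd.keyBy(k => k._1.toString).reduceByKey((key, value) => value).map(v => v._2)
// groupBy stands in for keyBy, and reduce((acc, v) => v) keeps the last
// element seen per key (with Spark, which duplicate "wins" is nondeterministic).
val rows = List(("k1", "a"), ("k2", "b"), ("k1", "c"))

val deduped = rows
  .groupBy { case (key, _) => key }                     // ~ keyBy(_._1)
  .map { case (_, dups) => dups.reduce((acc, v) => v) } // ~ reduceByKey((key, value) => value)
  .toList
// deduped now holds exactly one row per key
```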