HBaseStorage writes to the column descriptors specified in the constructor and in your case you're telling it to use 'A:tag A:value'. If you want to write to other columns you need to statically define them there.
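To make the difference concrete, here is a rough model in plain Java with no Pig or HBase dependency. It is not HBaseStorage's actual source; the class name, method names and the row key "1" are made up for illustration. The only behaviour it assumes is that the first tuple field is the row key and the remaining fields are written, in order, to the columns listed in the constructor string. The second method shows the alternative, where the qualifier is taken from the tuple itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: models how tuple fields end up in columns.
// It does not touch HBase; it only builds "family:qualifier" -> value maps.
public class ColumnMappingSketch {

    // Static mapping, like HBaseStorage('A:tag A:value'):
    // tuple[0] is the row key, tuple[i + 1] is written to columns[i].
    static Map<String, String> staticColumns(String[] columns, String[] tuple) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            cells.put(columns[i], tuple[i + 1]);
        }
        return cells;
    }

    // Dynamic mapping: the tuple is (rowKey, tag, value) and the tag
    // itself becomes the qualifier inside a fixed column family.
    static Map<String, String> dynamicColumn(String family, String[] tuple) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put(family + ":" + tuple[1], tuple[2]);
        return cells;
    }

    public static void main(String[] args) {
        // Row key "1" is a placeholder; tag and value come from the sample file.
        String[] tuple = {"1", "COMMON_NAME", "corn"};

        // Prints {A:tag=COMMON_NAME, A:value=corn}
        System.out.println(staticColumns(new String[] {"A:tag", "A:value"}, tuple));

        // Prints {A:COMMON_NAME=corn}
        System.out.println(dynamicColumn("A", tuple));
    }
}
```

The first result is what the static column list produces for every row; the second is the shape a store function would have to build if the qualifier is supposed to come from the data.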
If you want to use dynamic column names, you could subclass HBaseStorage and re-implement the part where the HBase Put happens, so that it uses your tag value as the column descriptor instead of a static column list.

On Mon, Sep 24, 2012 at 7:30 AM, HAJIHASHEMI, ZAHRA (AG/1000) <[email protected]> wrote:

> Hi,
>
> I have a text file which has my hbase table information. It is comma
> separated. The first is attribute name (which I want it to be as column
> qualifier) and the second is attribute value. The file looks like this:
>
> COMMON_NAME,corn
> SCIENTIFIC_NAME,Zea mays
> GENETIC_BACKGROUND,LH244
> TISSUE,tassel
> DEV_STAGE,V7-V8
> TREATMENT,"Microspore mothercell stage (V7-V8), <0.5in"
> ECTOPIC_TYPE,
>
> So I want to load this file and store it into the hbase table. The table
> schema is discovery_rnaseq_library (A: attribute_name, value:
> attribute_value)
>
> Here is my pig script:
>
> library_tag = LOAD '/my_path/345_lib_description.txt'
>     USING PigStorage(',') AS (tag:chararray, value:chararray);
> library_id = LOAD 'discovery_rnaseq_library'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('A:organism', '-loadKey true')
>     as (id:int, name:chararray);
> grpd = group library_id all;
> data_id = foreach grpd generate ((MAX(library_id.id))+1) as id;
> finalData = CROSS data_id, library_tag;
> STORE library_tag INTO 'hbase://library'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('A:tag A:value');
>
> And I also scan the table to get the max id and use it to insert new
> record.
>
> The problem is it just insert all of the records with "tag" as column
> qualifier. Here is what I get after running this pig script:
>
> COLUMN               CELL
> A:tag                timestamp=1348451755196, value=REPLICATE_NUMBER
> A:value              timestamp=1348451755196, value=1
>
> Whereas I want it to be something like this:
>
> COLUMN               CELL
> A:COMMON_NAME        timestamp=1348451755196, value=corn
> A:SCIENTIFIC_NAME    timestamp=1348451755196, value=Zea mays
> ...
>
> I highly appreciate any comments.
>
> Thanks!
>
> -Zara
