[ 
https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1205:
-----------------------------------

          Status: Resolved  (was: Patch Available)
    Release Note: 
HBaseStorage has been significantly reworked with this release.

Usage:
{code}
my_data = LOAD 'hbase://table_name' USING 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2', '-caching 100') AS (col1:int, col2:chararray);

STORE my_data INTO 'hbase://other_table' USING 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2');
{code}

HBaseStorage can now write data into HBase as well as read it. The first 
argument is a space-delimited list of columns to be loaded (or stored). Columns 
are specified as columnfamily:column_name. The second argument is an optional 
set of key-value pairs used to control HBaseStorage behavior. Available 
arguments are:

* {{-loadKey}} Used to load the row key; false by default. If true, the first 
field in the returned tuple will be the value of the row key.
* {{-gt}}, {{-gte}}, {{-lt}}, and {{-lte}} Used to specify bounds on row keys 
to be scanned. The keys are specified as binary data, using the hex 
representation. Any slashes have to be double-escaped (two slashes per single 
"real" slash) to be parsed correctly.
* {{-caching}} Used to specify the number of rows to be cached per HBase RPC 
call. See 
http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#setScannerCaching%28int%29 
for more information about this HBase feature.
* {{-limit}} Used to control how many rows *per scanned region* will be 
retrieved. This can speed up processing if you only want a few rows. The total 
number of rows returned can be as high as (number of regions) * limit. The 
limit is applied after any -gt, -lt, etc. filters. Pig's LIMIT operator can be 
used in conjunction with this argument.
* {{-caster}} Used to specify a LoadCaster (or LoadStoreCaster, for storage) 
used to convert the data stored in HBase into Pig data. By default, the 
Utf8StorageConverter is used, which stores all data as its string 
representation. The string "HBaseBinaryConverter" can be used to specify that 
data is stored in HBase's native binary format. Note that the 
HBaseBinaryConverter does not work with complex data types such as maps, 
tuples, and bags. You can also specify a fully qualified class name, such as 
org.apache.pig.backend.hadoop.hbase.HBaseBinaryConverter, to use your own 
caster. The default caster can be changed by setting the pig.hbase.caster 
property in pig.properties.
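
As a rough illustration of how several of the options above might be combined 
in a single load, here is a minimal sketch. The table name, column names, and 
option values are placeholders. It assumes -loadKey is accepted as a bare flag 
(if your build expects a value, pass one instead) and that the row keys are 
readable as strings.
{code}
-- Sketch only: table, columns, and option values are placeholders.
-- Assumes -loadKey is a bare flag and that row keys are string-like.
few_rows = LOAD 'hbase://table_name' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'colfamily:col1 colfamily:col2',
        '-loadKey -caching 100 -limit 10')
    AS (rowkey:chararray, col1:int, col2:chararray);

-- -limit applies per scanned region, so up to (regions * 10) rows may arrive here;
-- Pig's LIMIT operator trims the overall result.
ten_rows = LIMIT few_rows 10;
{code}
Because the row key is loaded, it appears as the first field of each tuple, 
ahead of the listed columns.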

HBaseStorage matches column arguments to tuple fields based on their ordinal 
position. When storing, the first field is expected to be the key value.
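
To make the ordinal matching concrete when storing, here is a minimal sketch. 
The input file, field names, and table name are hypothetical. The first field 
supplies the row key, and the remaining fields are matched to the listed 
columns in order.
{code}
-- Sketch only: 'input.tsv', the field names, and 'other_table' are hypothetical.
records = LOAD 'input.tsv' AS (id:chararray, name:chararray, score:int);

-- id    -> row key (first field)
-- name  -> colfamily:col1
-- score -> colfamily:col2
STORE records INTO 'hbase://other_table' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2');
{code}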
      Resolution: Fixed

> Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
> ------------------------------------------------------------------------------
>
>                 Key: PIG-1205
>                 URL: https://issues.apache.org/jira/browse/PIG-1205
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.7.0
>            Reporter: Jeff Zhang
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.8.0
>
>         Attachments: hbase-0.20.6-test.jar, hbase-0.20.6.jar, PIG_1205.patch, 
> PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch, PIG_1205_5.path, 
> PIG_1205_6.patch, PIG_1205_7.patch, PIG_1205_8.patch, PIG_1205_9.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
