Hi Ralph.
Can you please try to modify the STORE command in the script to the
following.
STORE D into 'hbase://$table_name/period,deployment,file_id,recnum'
using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');
By default, Phoenix generates the UPSERT statement for the table assuming
the column order matches the one in your CREATE TABLE. In your case, you
are reordering the columns in the STORE command, so with the change above
Phoenix constructs the correct UPSERT statement using the columns you list
after $table_name.
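As a sketch of what this looks like end to end (field names taken from your script; the Pig types are my assumption based on your table definition):

```pig
-- Declaring explicit types in the LOAD schema (types assumed from the
-- CREATE TABLE; adjust to match your data):
Z = load '$data' as (
    file_id:chararray,
    recnum:int,
    period:long,
    deployment:chararray
    -- ... more fields
);

D = foreach Z generate period, deployment, file_id, recnum;

-- Listing the columns after the table name tells Phoenix how to map
-- the Pig tuple fields, in order, to the table columns:
STORE D into 'hbase://$table_name/period,deployment,file_id,recnum'
    using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper', '-batchSize 1000');
```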
Also, to see the query Phoenix has generated, look for a log entry that
starts with "Phoenix Generic Upsert Statement:". That will give you
insight into the UPSERT query.
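For instance, with the four key columns above, the generated statement should look something like this (illustrative only, not copied from an actual run; the real one will also include your remaining fields):

```sql
-- Hypothetical example of what Phoenix logs after the change:
UPSERT INTO t1_log_dns (PERIOD, DEPLOYMENT, FILE_ID, RECNUM) VALUES (?, ?, ?, ?)
```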
Happy to help!!
Regards
Ravi
On Fri, Dec 5, 2014 at 2:57 PM, Perko, Ralph J <[email protected]> wrote:
> Hi, I wrote a series of pig scripts to load data that were working well
> with 4.0, but since upgrading to 4.2.x (4.2.1 currently) are now failing.
>
> Here is an example:
>
> Table def:
> CREATE TABLE IF NOT EXISTS t1_log_dns
> (
> period BIGINT NOT NULL,
> deployment VARCHAR NOT NULL,
> file_id VARCHAR NOT NULL,
> recnum INTEGER NOT NULL,
> f1 VARCHAR,
> f2 VARCHAR,
> f3 VARCHAR,
> f4 BIGINT,
> ...
> CONSTRAINT pkey PRIMARY KEY (period, deployment, file_id, recnum)
> )
> IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
> --- some index def’s – same error occurs with or without them
>
> Pig script:
>
> register $phoenix_jar;
> register $udf_jar;
>
> Z = load '$data' as (
> file_id,
> recnum,
> period,
> deployment,
> ... more fields
> );
>
> -- put it all together and generate final output!
> D = foreach Z generate
> period,
> deployment,
> file_id,
> recnum,
> ... more fields;
>
> STORE D into 'hbase://$table_name' using
> org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');
>
> Error:
> 2014-12-05 14:24:06,450 [main] ERROR
> org.apache.pig.tools.pigstats.SimplePigStats - ERROR: Unable to process
> column RECNUM:INTEGER, innerMessage=java.lang.String cannot be coerced to
> INTEGER
> 2014-12-05 14:24:06,450 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2014-12-05 14:24:06,452 [main] INFO
> org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 2.4.0.2.1.5.0-695 0.12.1.2.1.5.0-695 perko 2014-12-05 14:23:17 2014-12-05
> 14:24:06 UNKNOWN
>
> Based on the error it would seem that some non-integer value cannot be
> cast to an integer. But the data does not show this. Stepping through the
> Pig script and running "dump" on each variable
> shows the data in the right place and the right coercible type – for
> example the recnum has nothing but single digits of sample data.
>
> I have tried to set "recnum" to an int in pig but this just pushes the
> error up to the previous field - file_id:
>
> ERROR 2999: Unexpected internal error. Unable to process column
> FILE_ID:VARCHAR, innerMessage=java.lang.Integer cannot be coerced to VARCHAR
>
> Other times I get a different error:
>
> Unable to process column _SALT:BINARY,
> innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203
> (22005): Type mismatch. BINARY cannot be coerced to LONG
>
> Is there something obvious I am doing wrong? Did something significant
> change between 4.0 and 4.2.x in this regard? I would not rule out some
> silly user error I inadvertently introduced :-/
>
> Thanks for your help
> Ralph
>
>