Hi, I wrote a series of Pig scripts to load data into Phoenix. They worked well
with 4.0, but since upgrading to 4.2.x (currently 4.2.1) they fail.
Here is an example:
Table def:
CREATE TABLE IF NOT EXISTS t1_log_dns
(
    period BIGINT NOT NULL,
    deployment VARCHAR NOT NULL,
    file_id VARCHAR NOT NULL,
    recnum INTEGER NOT NULL,
    f1 VARCHAR,
    f2 VARCHAR,
    f3 VARCHAR,
    f4 BIGINT,
    ...
    CONSTRAINT pkey PRIMARY KEY (period, deployment, file_id, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=10,
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
-- some index definitions; the same error occurs with or without them
Pig script:
register $phoenix_jar;
register $udf_jar;
Z = load '$data' as (
    file_id,
    recnum,
    period,
    deployment,
    ... more fields
);
-- put it all together and generate final output!
D = foreach Z generate
    period,
    deployment,
    file_id,
    recnum,
    ... more fields;
STORE D into 'hbase://$table_name' using
    org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');
Error:
2014-12-05 14:24:06,450 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: Unable to process column RECNUM:INTEGER, innerMessage=java.lang.String cannot be coerced to INTEGER
2014-12-05 14:24:06,450 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-12-05 14:24:06,452 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0.2.1.5.0-695 0.12.1.2.1.5.0-695 perko 2014-12-05 14:23:17 2014-12-05 14:24:06 UNKNOWN
Based on the error, it would seem that some non-integer value cannot be cast to
an integer, but the data does not show this. Stepping through the Pig script
and running "dump" on each variable shows the data in the right place and of a
coercible type. For example, the recnum field contains nothing but single
digits in the sample data.
I have tried declaring "recnum" as an int in Pig, but this just pushes the error
up to the previous field, file_id:
ERROR 2999: Unexpected internal error. Unable to process column FILE_ID:VARCHAR, innerMessage=java.lang.Integer cannot be coerced to VARCHAR
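For illustration, here is a minimal sketch of a load with explicit types. It assumes Pig types that mirror the table definition (chararray for VARCHAR, int for INTEGER, long for BIGINT). Only recnum was typed in the attempt described above, so the other types are shown here as an assumption, and the remaining fields stay elided:

```pig
-- Sketch only: the types are assumed from the CREATE TABLE statement above.
Z = load '$data' as (
    file_id:chararray,
    recnum:int,
    period:long,
    deployment:chararray
    -- ... more fields
);
```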
Other times I get a different error:
Unable to process column _SALT:BINARY, innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. BINARY cannot be coerced to LONG
Is there something obvious I am doing wrong? Did something significant change
between 4.0 and 4.2.x in this regard? I would not rule out some silly user
error I inadvertently introduced :-/
Thanks for your help
Ralph