Ravi,

Your suggestion worked – thank you!

But I am now getting an org.apache.phoenix.schema.ConstraintViolationException 
on some data files:

"T1_LOG_DNS.PERIOD may not be null"

However, there is no record with a null value for this field.

I tried hardcoding a value in the Pig script to see if I could get past this 
error, and it just moved the error to the next field:

"T1_LOG_DNS.DEPLOYMENT may not be null"

This is an intermittent error: it does not happen with every file, but it does 
happen consistently with the same file.
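
One way to double-check that no key field is null is to filter the relation just before the STORE. The following is a minimal sketch, assuming the relation D from the script quoted further down; bad_keys is a hypothetical alias that is not part of the original script. Pig turns a failed cast into a null rather than failing, and Phoenix is generally documented as treating an empty VARCHAR as null, so both cases may be worth checking.

```pig
-- Sketch only: D is the relation built in the quoted script;
-- bad_keys is a hypothetical alias introduced for this check.
bad_keys = FILTER D BY period IS NULL
                    OR deployment IS NULL
                    OR (chararray)deployment == ''
                    OR file_id IS NULL
                    OR (chararray)file_id == ''
                    OR recnum IS NULL;

-- Any rows printed here would reach Phoenix with a missing key value.
DUMP bad_keys;
```

If this prints nothing for a file that still fails, the nulls are probably introduced after this point rather than present in the data.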

Thank you for the help

Ralph


__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
[email protected]


From: Ravi Kiran <[email protected]>
Reply-To: "[email protected]"
Date: Friday, December 5, 2014 at 3:20 PM
To: "[email protected]"
Subject: Re: pig and phoenix

Hi Ralph.
   Can you please try to modify the STORE command in the script to the 
following.
   STORE D into 'hbase://$table_name/period,deployment,file_id, recnum' using 
org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');

By default, Phoenix generates an UPSERT query for the table and assumes the 
column order is the one given in your CREATE TABLE statement. In your case, I 
see you are reordering the columns before the STORE command. With the above 
change, Phoenix constructs the right UPSERT query for you, using the columns 
you list after $table_name.

Also, to have a look at the query Phoenix has generated, you should see a log 
entry which starts with "Phoenix Generic Upsert Statement:".
That will also give some insight into the UPSERT query.

Happy to help!!

Regards
Ravi


On Fri, Dec 5, 2014 at 2:57 PM, Perko, Ralph J <[email protected]> wrote:
Hi, I wrote a series of Pig scripts to load data that were working well with 
4.0, but since upgrading to 4.2.x (currently 4.2.1) they are now failing.

Here is an example:

Table def:
CREATE TABLE IF NOT EXISTS t1_log_dns
(
  period BIGINT NOT NULL,
  deployment VARCHAR NOT NULL,
  file_id VARCHAR NOT NULL,
  recnum INTEGER NOT NULL,
  f1 VARCHAR,
  f2 VARCHAR,
  f3 VARCHAR,
  f4 BIGINT,
...
 CONSTRAINT pkey PRIMARY KEY (period, deployment, file_id, recnum)
) 
IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

--- some index def’s – same error occurs with or without them

Pig script:

register $phoenix_jar;
register $udf_jar;

Z = load '$data' as (
file_id,
recnum,
period,
deployment,
... more fields
);

-- put it all together and generate final output!
D = foreach Z generate
period,
deployment,
file_id,
recnum ,
... more fields;

STORE D into 'hbase://$table_name' using 
org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 1000');

Error:
2014-12-05 14:24:06,450 [main] ERROR 
org.apache.pig.tools.pigstats.SimplePigStats - ERROR: Unable to process column 
RECNUM:INTEGER, innerMessage=java.lang.String cannot be coerced to INTEGER
2014-12-05 14:24:06,450 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil 
- 1 map reduce job(s) failed!
2014-12-05 14:24:06,452 [main] INFO  
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion      PigVersion          UserId  StartedAt            FinishedAt           Features
2.4.0.2.1.5.0-695  0.12.1.2.1.5.0-695  perko   2014-12-05 14:23:17  2014-12-05 14:24:06  UNKNOWN

Based on the error, it would seem that some non-integer value cannot be cast to 
an integer, but the data does not show this. Stepping through the Pig script 
and running "dump" on each variable shows the data in the right place and of a 
coercible type. For example, recnum contains nothing but single-digit values 
in the sample data.

I have tried to set "recnum" to an int in Pig, but this just pushes the error up 
to the previous field, file_id:

ERROR 2999: Unexpected internal error. Unable to process column 
FILE_ID:VARCHAR, innerMessage=java.lang.Integer cannot be coerced to VARCHAR
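
For reference, declaring types for all of the key fields in the load schema, rather than for recnum alone, might look like the sketch below. It assumes the Pig types should mirror the CREATE TABLE definition (BIGINT as long, VARCHAR as chararray, INTEGER as int) and shows only the four key columns; the remaining fields are elided in the original script and are not reproduced here.

```pig
-- Sketch only: key columns typed to mirror the Phoenix table definition.
-- The non-key fields from the original script are omitted here.
Z = LOAD '$data' AS (
    file_id:chararray,
    recnum:int,
    period:long,
    deployment:chararray
);

-- Print the schema to confirm the declared types took effect.
DESCRIBE Z;
```

This only rules out type ambiguity on the Pig side; it does not by itself change the order in which values are bound to table columns.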

Other times I get a different error:

Unable to process column _SALT:BINARY, 
innerMessage=org.apache.phoenix.schema.TypeMismatchException: ERROR 203 
(22005): Type mismatch. BINARY cannot be coerced to LONG

Is there something obvious I am doing wrong? Did something significant change 
between 4.0 and 4.2.x in this regard? I would not rule out some silly user 
error that I inadvertently introduced :-/

Thanks for your help
Ralph

