Dug into one of the rows that was having a similar problem; it throws IllegalArgumentException [1] instead of BufferUnderflowException, but both appear to stem from the same data issue: the varchar array is stored in HBase in an unexpected format.
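For what it's worth, the two stack traces correspond to two distinct java.nio failure modes: Buffer.position(int) rejects a position beyond the limit with IllegalArgumentException (the createPhoenixArray:1025 path), while ByteBuffer.get(byte[]) throws BufferUnderflowException when fewer bytes remain than requested (the :1028 path). Both would be consistent with a trailer whose recorded offsets point outside the actual value. A minimal standalone sketch of the two modes (plain JDK, no Phoenix involved):

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class BufferFailureModes {

    // ByteBuffer.get(byte[]) with fewer bytes remaining than requested
    // -> BufferUnderflowException (as at PArrayDataType.createPhoenixArray:1028).
    static String readPastLimit() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[4]);
        try {
            buf.get(new byte[8]); // asks for 8 bytes, only 4 remain
            return "no exception";
        } catch (BufferUnderflowException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Buffer.position(int) with a position beyond the limit
    // -> IllegalArgumentException (as at Buffer.position:244 via createPhoenixArray:1025).
    static String seekPastLimit() {
        ByteBuffer buf = ByteBuffer.wrap(new byte[4]);
        try {
            buf.position(16); // limit is 4, so 16 is rejected
            return "no exception";
        } catch (IllegalArgumentException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastLimit()); // BufferUnderflowException
        System.out.println(seekPastLimit()); // IllegalArgumentException
    }
}
```

So whether a corrupt trailer produces one exception or the other likely depends only on whether the bogus offset happens to fall inside or outside the buffer's limit.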
The row looks like:

    *A_VARCHAR_OF_170_CHARS*\x00\x00\x00\x80\x01\x00\x00\x02\xAD\x00\x00\x00

I could not make sense of it based on the 4.13 encoding (hence Phoenix throwing an exception), and I looked back at 4.8 and it does not seem to match the old format either... Does anyone recognize this hex encoding by any chance, or is it some sort of data corruption?

Thanks,
- Will

[1]
java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:244)
    at org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1025)
    at org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)
    at org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)
    at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)
    at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
    at org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:609)
    at sqlline.Rows$Row.<init>(Rows.java:183)
    at sqlline.BufferedRows.<init>(BufferedRows.java:38)
    at sqlline.SqlLine.print(SqlLine.java:1660)
    at sqlline.Commands.execute(Commands.java:833)
    at sqlline.Commands.sql(Commands.java:732)
    at sqlline.SqlLine.dispatch(SqlLine.java:813)
    at sqlline.SqlLine.begin(SqlLine.java:686)
    at sqlline.SqlLine.start(SqlLine.java:398)
    at sqlline.SqlLine.main(SqlLine.java:291)

On Wed, Oct 17, 2018 at 3:21 PM William Shen <wills...@marinsoftware.com> wrote:

> Thanks, Jaanai.
>
> At first we thought it was a data issue too, but when we restored the table
> from a snapshot to a separate schema on the same cluster to triage, the
> exception no longer occurred... Does that give any further clue about what
> the issue might have been?
> 0: jdbc:phoenix:journalnode,test> SELECT A, B, C, D FROM SCHEMA.TABLE where A = 13100423;
>
> java.nio.BufferUnderflowException
>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
>     at java.nio.ByteBuffer.get(ByteBuffer.java:715)
>     at org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1028)
>     at org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)
>     at org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)
>     at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)
>     at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
>     at org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:609)
>     at sqlline.Rows$Row.<init>(Rows.java:183)
>     at sqlline.BufferedRows.<init>(BufferedRows.java:38)
>     at sqlline.SqlLine.print(SqlLine.java:1660)
>     at sqlline.Commands.execute(Commands.java:833)
>     at sqlline.Commands.sql(Commands.java:732)
>     at sqlline.SqlLine.dispatch(SqlLine.java:813)
>     at sqlline.SqlLine.begin(SqlLine.java:686)
>     at sqlline.SqlLine.start(SqlLine.java:398)
>     at sqlline.SqlLine.main(SqlLine.java:291)
>
> 0: jdbc:phoenix:journalnode,test> SELECT A, B, C, D FROM SCHEMA.CORRUPTION where A = 13100423;
>
> +-----------+--------+--------+-------------+
> |     A     |   B    |   C    |      D      |
> +-----------+--------+--------+-------------+
> | 13100423  | 5159   | 7      | ['female']  |
> +-----------+--------+--------+-------------+
> 1 row selected (1.76 seconds)
>
> On Sun, Oct 14, 2018 at 8:39 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:
>
>> It looks like a bug where the length requested from the ByteBuffer is
>> greater than the bytes remaining; maybe the ByteBuffer position or the
>> length of the target byte array is wrong.
>>
>> ----------------------------------------
>>    Jaanai Zhang
>>    Best regards!
>> On Fri, Oct 12, 2018 at 11:53 PM, William Shen <wills...@marinsoftware.com> wrote:
>>
>>> Hi all,
>>>
>>> We are running Phoenix 4.13, and periodically we encounter the following
>>> exception when querying Phoenix in our staging environment. Initially, we
>>> thought we had an incompatible client version connecting and corrupting
>>> data, but after ensuring that we only connect with 4.13 clients, we still
>>> see this issue come up from time to time. So far, fortunately, since it is
>>> staging, we have been able to identify and delete the affected data to
>>> restore service.
>>>
>>> However, we would like to ask for guidance on what else we could look for
>>> to identify the cause of this exception. Could it perhaps be caused by
>>> something other than data corruption?
>>>
>>> Thanks in advance!
>>>
>>> The exception looks like:
>>>
>>> 18/10/12 15:45:58 WARN scheduler.TaskSetManager: Lost task 32.2 in stage 14.0
>>> (TID 1275, ...datanode..., executor 82): java.nio.BufferUnderflowException
>>>     at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
>>>     at java.nio.ByteBuffer.get(ByteBuffer.java:715)
>>>     at org.apache.phoenix.schema.types.PArrayDataType.createPhoenixArray(PArrayDataType.java:1028)
>>>     at org.apache.phoenix.schema.types.PArrayDataType.toObject(PArrayDataType.java:375)
>>>     at org.apache.phoenix.schema.types.PVarcharArray.toObject(PVarcharArray.java:65)
>>>     at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:1011)
>>>     at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
>>>     at org.apache.phoenix.jdbc.PhoenixResultSet.getObject(PhoenixResultSet.java:525)
>>>     at org.apache.phoenix.spark.PhoenixRecordWritable$$anonfun$readFields$1.apply$mcVI$sp(PhoenixRecordWritable.scala:96)
>>>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>>     at org.apache.phoenix.spark.PhoenixRecordWritable.readFields(PhoenixRecordWritable.scala:93)
>>>     at org.apache.phoenix.mapreduce.PhoenixRecordReader.nextKeyValue(PhoenixRecordReader.java:168)
>>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:174)
>>>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1596)
>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
>>>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)
>>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1870)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:748)
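A note for anyone triaging similar rows: it may help to diff the raw stored bytes of the corrupt row against the snapshot-restored copy. Below is a small local helper (illustrative utility code only, not a Phoenix or HBase API) that renders a byte[] in the same \xNN style as the dump above, so two values fetched from the two tables can be compared side by side:

```java
public class HexEscape {

    // Render a byte[] the way the corrupt value appears above:
    // printable ASCII verbatim, every other byte as \xNN (uppercase hex).
    static String escape(byte[] value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value) {
            int u = b & 0xFF;
            if (u >= 0x20 && u <= 0x7E) {
                sb.append((char) u);
            } else {
                sb.append(String.format("\\x%02X", u));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tiny demo: two printable bytes followed by three non-printable ones.
        byte[] sample = {'a', 'b', 0x00, (byte) 0x80, 0x01};
        System.out.println(escape(sample)); // ab\x00\x80\x01
    }
}
```

The byte[] itself could come from a plain HBase Get against the underlying table (or from a raw scan), which sidesteps Phoenix's array decoding entirely and shows exactly what is on disk.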