[ 
https://issues.apache.org/jira/browse/DRILL-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-3955:
------------------------------------------
    Description: 
If all of the rows read by a given {{HBaseRecordReader}} have no HBase columns 
in a given HBase column family, {{HBaseRecordReader}} doesn't create a Drill 
column for that HBase column family.

Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill 
column exists for that HBase column family, that {{setupNewSchema}} creates a 
dummy Drill column using the usual {{NullableIntVector}} type.  In particular, 
it is not a map vector, which is what {{HBaseRecordReader}} creates when it 
sees an HBase column family.

Should {{HBaseRecordReader}} and/or something around setting up for reading 
HBase (including setting up that {{ProjectRecordBatch}}) make sure that all 
HBase column families are represented with map vectors so that 
{{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?
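
To make the question concrete, here is a minimal sketch of the two behaviors. It is a simplified model, not Drill code: {{FamilySchemaSketch}}, {{ColumnType}} and both method names are hypothetical, and a "schema" is just a map from column family name to a type tag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; none of these are Drill classes.
final class FamilySchemaSketch {
    enum ColumnType { MAP, NULLABLE_INT }

    // Behavior as described in this issue: a projected column family gets a
    // MAP column only if some row that was read had columns in it; otherwise
    // the projection step fills in a NULLABLE_INT dummy column.
    static Map<String, ColumnType> schemaAsDescribed(List<String> projectedFamilies,
                                                     Set<String> familiesSeenInRows) {
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        for (String family : projectedFamilies) {
            boolean seen = familiesSeenInRows.contains(family);
            schema.put(family, seen ? ColumnType.MAP : ColumnType.NULLABLE_INT);
        }
        return schema;
    }

    // Behavior the question asks about: every projected column family is
    // declared as a MAP up front, regardless of what the rows contained.
    static Map<String, ColumnType> schemaWithAllFamiliesAsMaps(List<String> projectedFamilies) {
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        for (String family : projectedFamilies) {
            schema.put(family, ColumnType.MAP);
        }
        return schema;
    }
}
```

In this model the second method can never produce a schema that depends on the data, which is the property the question is after. Whether that belongs in {{HBaseRecordReader}} or in the surrounding setup is left open here.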


The problem is that, currently, when an HBase table is read in two separate 
fragments, one fragment (seeing rows with columns in the column family) can get 
a map vector for the column family while the other (seeing only rows with no 
columns in the column family) can get the {{NullableIntVector}}.  Downstream 
code that receives the two batches ends up with an unresolved conflict, 
yielding IndexOutOfBoundsExceptions as in DRILL-3954.
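
The two-fragment mismatch can be sketched as follows. This is a simplified model under assumed names ({{FragmentConflictSketch}}, {{VectorKind}}, {{conflictingColumns}} and {{example}} are all hypothetical, not Drill APIs). It only shows how two batches can disagree on the type of the same column; it does not reproduce the exception from DRILL-3954.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of two fragments' batch schemas; not Drill's actual API.
final class FragmentConflictSketch {
    enum VectorKind { MAP, NULLABLE_INT }

    // Returns the names of columns present in both schemas whose vector kinds differ.
    static List<String> conflictingColumns(Map<String, VectorKind> left,
                                           Map<String, VectorKind> right) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, VectorKind> entry : left.entrySet()) {
            VectorKind other = right.get(entry.getKey());
            if (other != null && other != entry.getValue()) {
                conflicts.add(entry.getKey());
            }
        }
        return conflicts;
    }

    // Example: fragment A saw rows with columns in family "f2"; fragment B did not,
    // so B's schema carries the dummy NULLABLE_INT column for "f2".
    static List<String> example() {
        Map<String, VectorKind> fragmentA = new LinkedHashMap<>();
        fragmentA.put("f1", VectorKind.MAP);
        fragmentA.put("f2", VectorKind.MAP);

        Map<String, VectorKind> fragmentB = new LinkedHashMap<>();
        fragmentB.put("f1", VectorKind.MAP);
        fragmentB.put("f2", VectorKind.NULLABLE_INT);

        return conflictingColumns(fragmentA, fragmentB);
    }
}
```

In this model, {{example()}} reports {{f2}} as the only conflicting column. A downstream operator would need either to reconcile the two kinds or to never receive the dummy kind in the first place.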

It's not clear whether there is only one bug\--that downstream code doesn't 
resolve {{NullableIntVector}} dummy fields correctly (DRILL-TBD)\--or two\--that 
the HBase reading code should set up a Drill column for every HBase column 
family (regardless of whether it has any columns in the rows that were read) 
and that downstream code doesn't resolve {{NullableIntVector}} dummy fields 
(a resolution that applies to sources other than just HBase).






  was:
If all of the rows read by a given {{HBaseRecordReader}} have no HBase columns 
in a given HBase column family, {{HBaseRecordReader}} doesn't create a Drill 
column for that HBase column family.

Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill 
column exists for that HBase column family, that {{setupNewSchema}} creates a 
dummy Drill column using the usual {{NullableIntVector}} type.  In particular, 
it is not a map vector as {{HBaseRecordReader}} creates when it sees an HBase 
column family.

Should {{HBaseRecordReader}} and/or something around setting up for reading 
HBase (including setting up that {{ProjectRecordBatch}}) make sure that all 
HBase column families are represented with map vectors so that 
{{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?


The problem is that, currently, when an HBase table is read in two separate 
fragments, one fragment (seeing rows with columns in the column family) can get 
a map vector for the column family while the other (seeing only rows with no 
columns in the column family) can get the {{NullableIntVector}}.  Downstream 
code that receives the two batches ends up with an unresolved conflict, 
yielding IndexOutOfBoundsExceptions as in DRILL-3954.

It's not clear whether there is only one bug--that downstream code doesn't 
resolve {{NullableIntValue}} dummy fields right (DRILL-TBD)--or two--that the 
HBase reading code should set up a Drill column for every HBase column family 
(regardless of whether it has any columns in the rows that were read) and that 
downstream code doesn't resolve {{NullableIntValue}} dummy fields (resolution 
is applicable to sources other than just HBase).







> Possible bug in creation of Drill columns for HBase column families
> -------------------------------------------------------------------
>
>                 Key: DRILL-3955
>                 URL: https://issues.apache.org/jira/browse/DRILL-3955
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Daniel Barclay (Drill)
>
> If all of the rows read by a given {{HBaseRecordReader}} have no HBase 
> columns in a given HBase column family, {{HBaseRecordReader}} doesn't create 
> a Drill column for that HBase column family.
> Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill 
> column exists for that HBase column family, that {{setupNewSchema}} creates a 
> dummy Drill column using the usual {{NullableIntVector}} type.  In 
> particular, it is not a map vector, which is what {{HBaseRecordReader}} 
> creates when it sees an HBase column family.
> Should {{HBaseRecordReader}} and/or something around setting up for reading 
> HBase (including setting up that {{ProjectRecordBatch}}) make sure that all 
> HBase column families are represented with map vectors so that 
> {{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?
> The problem is that, currently, when an HBase table is read in two separate 
> fragments, one fragment (seeing rows with columns in the column family) can 
> get a map vector for the column family while the other (seeing only rows with 
> no columns in the column family) can get the {{NullableIntVector}}.  
> Downstream code that receives the two batches ends up with an unresolved 
> conflict, yielding IndexOutOfBoundsExceptions as in DRILL-3954.
> It's not clear whether there is only one bug\--that downstream code doesn't 
> resolve {{NullableIntVector}} dummy fields correctly (DRILL-TBD)\--or 
> two\--that the HBase reading code should set up a Drill column for every 
> HBase column family (regardless of whether it has any columns in the rows 
> that were read) and that downstream code doesn't resolve 
> {{NullableIntVector}} dummy fields (a resolution that applies to sources 
> other than just HBase).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
