Huh!  That's what I was afraid you'd say.  I'm still confused :-(

If "it will give all rows that contain _any_ of these families", then why does adding a family give me *fewer* rows?

Leaving my row start/stop and filtering code constant, and just un-commenting an addFamily() dramatically reduces the number of results returned from a scan.

P



On 3/15/12 2:42 PM, Himanshu Vashishtha wrote:
"Let's also say there are 1000 rows with A,B,C and 500 rows with only B and C.

If I add families A, B and C and scan with no filter will I get 1500,
1000 or 500 results?"

In this case, you will get 1500 rows. In case you add only B, you will
also get 1500 rows, since every row has data in B.

It's not like if you add families A, B and C, it will give you _only_
those rows that have _all_ three families; rather it will give all
rows that contain _any_ of these families.

Hope this helps.

Experts are welcome to chime in if I am missing something :)

Thanks,
Himanshu


On Thu, Mar 15, 2012 at 11:48 AM, Peter Wolf<[email protected]>  wrote:
Hi Lars, still confused...

My table *should* have values for families A, B and C.  Let's say I have a
bug, and some rows only have values for B and C.  Let's also say there are
1000 rows with A,B,C and 500 rows with only B and C.

If I add families A, B and C and scan with no filter will I get 1500, 1000
or 500 results?

Many thanks
P




On 3/15/12 1:17 PM, lars hofhansl wrote:
Hi haijia,

In that case HBase will still return the data for columns in family B and
C. But if you only added family A then HBase would only return "rows" for
which family A has any columns.

-- Lars
________________________________

From: Haijia Zhou<[email protected]>
To: [email protected]; lars hofhansl<[email protected]>
Sent: Thursday, March 15, 2012 10:12 AM
Subject: Re: Scan.addFamiliy reduces results


I have the same confusion. Say I added three column families A, B and C
to the scan; now if a row has data for column family B and C but no data for
A, then it won't be returned by the next() method?
What if the requirement is to get row data regardless of whether there's
data for a specific column family or not?


On Thu, Mar 15, 2012 at 1:04 PM, lars hofhansl<[email protected]> wrote:

Hi Peter,
for HBase you have to keep in mind that it is a sparse columnar (or
KeyValue) store: (rowkey, columnfamily, column, TS) -> value

A scan only returns those KeyValues that match the scan. So when you set
families on your scan you'll only get those rows for which the scan found
any columns.
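
A rough sketch of that rule, using plain Java collections as a toy model rather than the HBase client. The class and method names are hypothetical, and the data mirrors the A/B/C example discussed in this thread (1000 rows with A, B and C, plus 500 rows with only B and C). A row counts as returned when it has at least one cell in any selected family.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: plain Java collections, not the HBase client API.
class FamilyScanModel {
    // rowkey -> (family -> value); a family with no cells is simply absent.
    static Map<String, Map<String, String>> buildTable() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        for (int i = 0; i < 1500; i++) {
            Map<String, String> families = new HashMap<>();
            if (i < 1000) {
                families.put("A", "a" + i); // only the first 1000 rows have A
            }
            families.put("B", "b" + i);
            families.put("C", "c" + i);
            table.put(String.format("row%04d", i), families);
        }
        return table;
    }

    // A row is counted if it has at least one cell in any selected family.
    static int countRows(Map<String, Map<String, String>> table, Set<String> selected) {
        int count = 0;
        for (Map<String, String> families : table.values()) {
            for (String family : families.keySet()) {
                if (selected.contains(family)) {
                    count++;
                    break; // count each row once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = buildTable();
        System.out.println("A,B,C: " + countRows(table, Set.of("A", "B", "C")));
        System.out.println("A:     " + countRows(table, Set.of("A")));
        System.out.println("B:     " + countRows(table, Set.of("B")));
    }
}
```

In this model, selecting more families can only keep the row count the same or raise it, so a drop in results after adding a family would have to come from something else on the scan, such as a filter.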

Makes sense?

-- Lars



________________________________
  From: Peter Wolf<[email protected]>
To: [email protected]
Sent: Thursday, March 15, 2012 9:52 AM
Subject: Re: Scan.addFamiliy reduces results


Thanks Doug,

I had read that, and I just read it again.  But I am missing something...

Why does adding a family reduce the number of results?  Is there an
implied filter of some form?  Does addFamily add some constraint on
which rows are returned?

Note that all my rows *ought* to have values in all the families.

Thanks
Peter

On 3/15/12 12:39 PM, Doug Meil wrote:
re:  "However, I am getting different number of results, depending on
which families are added"

Yes.

I'd suggest you read this in the RefGuide.

http://hbase.apache.org/book.html#datamodel





On 3/15/12 12:08 PM, "Peter Wolf"<[email protected]> wrote:

Hi all,

I am doing a scan on a table with multiple families.  My code looks
like this...

           Scan scan = new Scan(calculateStartRowKey(a), calculateEndRowKey(b));

           scan.setCaching(10000);
           Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
                   CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));
           scan.setFilter(filter);
           scan
                   .addFamily(xFamily)
                   .addFamily(yFamily)
                   .addFamily(zFamily);

           ResultScanner scanner = hTable.getScanner(scan);

           Iterator<Result> it = scanner.iterator();
           int resultCount = 0;
           while (it.hasNext()) {
               Result result = it.next();

               resultCount++;
           }

However, I am getting a different number of results, depending on which
families are added.  For example, these give different result counts:

           scan
                   //.addFamily(xFamily)
                   .addFamily(yFamily)
                   .addFamily(zFamily);
and
           scan
                   .addFamily(xFamily)
                   .addFamily(yFamily)
                   .addFamily(zFamily);


There is no error message, and I don't see anything in the Scan
documentation.  Does anyone know what is going on?

Thanks
Peter
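
One thing that may be worth checking in the code above, offered as a sketch rather than a diagnosis: SingleColumnValueFilter by default lets a row through when the tested column is missing, and the filter only sees columns from families that were added to the scan. If xFamily is commented out, the filter may treat xColumn as missing on every row and pass them all, which would make the count go up when xFamily is removed and down when it is added. The toy model below uses plain Java collections, not the HBase client; the class, method and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: plain Java collections, not the HBase client API.
class MissingColumnFilterModel {
    // Each row maps family -> value of the single column stored in that family.
    static List<Map<String, String>> buildRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Map<String, String> row = new HashMap<>();
            row.put("x", i < 3 ? "wanted" : "other"); // 3 of 10 rows match
            row.put("y", "y" + i);
            row.put("z", "z" + i);
            rows.add(row);
        }
        return rows;
    }

    // Counts rows that survive an equality test on family "x", where the
    // filter can only see families that the scan selected.
    static int countRows(List<Map<String, String>> rows, Set<String> selected,
                         String wanted, boolean filterIfMissing) {
        int count = 0;
        for (Map<String, String> row : rows) {
            // If "x" was not selected, the filter never sees its value.
            String seen = selected.contains("x") ? row.get("x") : null;
            boolean pass = (seen == null) ? !filterIfMissing : seen.equals(wanted);
            if (pass) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = buildRows();
        System.out.println(countRows(rows, Set.of("x", "y", "z"), "wanted", false)); // 3
        System.out.println(countRows(rows, Set.of("y", "z"), "wanted", false));      // 10
        System.out.println(countRows(rows, Set.of("y", "z"), "wanted", true));       // 0
    }
}
```

In the real API the corresponding switch is setFilterIfMissing(true) on the SingleColumnValueFilter, and the filtered family needs to stay in the scan for the value test to apply.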



