Hi Haijia,

In that case HBase will still return the data for columns in families B and C. But if you only added family A, then HBase would only return "rows" for which family A has any columns.
-- Lars

________________________________
From: Haijia Zhou <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Sent: Thursday, March 15, 2012 10:12 AM
Subject: Re: Scan.addFamiliy reduces results

I have the same confusion.
Say I added three column families A, B and C to the scan. If a row has data for column families B and C but no data for A, will it not be returned by the next() method?
What if the requirement is to get row data regardless of whether there's data for a specific column family or not?

On Thu, Mar 15, 2012 at 1:04 PM, lars hofhansl <[email protected]> wrote:

Hi Peter,
>for HBase you have to keep in mind that it is a sparse columnar (or KeyValue)
>store: (rowkey, columnfamily, column, TS) -> value
>
>A scan only returns those KeyValues that match the scan. So when you set
>families on your scan you'll only get those rows for which the scan found any
>columns.
>
>Makes sense?
>
>-- Lars
>
>________________________________
> From: Peter Wolf <[email protected]>
>To: [email protected]
>Sent: Thursday, March 15, 2012 9:52 AM
>Subject: Re: Scan.addFamiliy reduces results
>
>
>Thanks Doug,
>
>I had read that, and I just read it again. But I am missing something...
>
>Why does adding a family reduce the number of results? Is there an
>implied filter of some form? Does addFamily add some constraint on
>which rows are returned?
>
>Note that all my rows *ought* to have values in all the families.
>
>Thanks
>Peter
>
>On 3/15/12 12:39 PM, Doug Meil wrote:
>> re: "However, I am getting different number of results, depending on
>> which families are added"
>>
>> Yes.
>>
>> I'd suggest you read this in the RefGuide.
>>
>> http://hbase.apache.org/book.html#datamodel
>>
>>
>> On 3/15/12 12:08 PM, "Peter Wolf"<[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I am doing a scan on a table with multiple families. My code looks like
>>> this...
>>>
>>>     Scan scan = new Scan(calculateStartRowKey(a),
>>>             calculateEndRowKey(b));
>>>
>>>     scan.setCaching(10000);
>>>     Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
>>>             CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));
>>>     scan.setFilter(filter);
>>>     scan
>>>         .addFamily(xFamily)
>>>         .addFamily(yFamily)
>>>         .addFamily(zFamily);
>>>
>>>     ResultScanner scanner = hTable.getScanner(scan);
>>>
>>>     Iterator<Result> it = scanner.iterator();
>>>     int resultCount = 0;
>>>     while (it.hasNext()) {
>>>         Result result = it.next();
>>>
>>>         resultCount++;
>>>     }
>>>
>>> However, I am getting different number of results, depending on which
>>> families are added. For example these give different result counts
>>>
>>>     scan
>>>         //.addFamily(xFamily)
>>>         .addFamily(yFamily)
>>>         .addFamily(zFamily);
>>> and
>>>     scan
>>>         .addFamily(xFamily)
>>>         .addFamily(yFamily)
>>>         .addFamily(zFamily);
>>>
>>>
>>> There is no error message, and I don't see anything in the Scan
>>> documentation. Does anyone know what is going on?
>>>
>>> Thanks
>>> Peter
>>>
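
The behaviour discussed in this thread (a row comes back only if the scan finds at least one cell for it in the requested families) can be illustrated with a small toy model. This is only a sketch in plain Java, not the HBase client API: the class `SparseScanSketch` and its `put` and `scan` methods are hypothetical names made up for illustration, timestamps are left out, and the caller is assumed to request at least one family.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a sparse table (row key -> family -> qualifier -> value). Not the HBase API. */
public class SparseScanSketch {

    // Rows are kept sorted by key. A row has no entry at all for a family in which it holds no cells.
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    /** Stores one cell; the row and family entries are created on first use. */
    public void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    /**
     * Returns the keys of rows holding at least one cell in any requested family.
     * Assumption of this sketch: the caller passes at least one family.
     */
    public List<String> scan(Set<String> families) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, TreeMap<String, String>>> row : rows.entrySet()) {
            for (String family : families) {
                if (row.getValue().containsKey(family)) {
                    result.add(row.getKey()); // one matching family is enough
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SparseScanSketch table = new SparseScanSketch();
        table.put("row1", "A", "q", "1");
        table.put("row1", "B", "q", "2");
        table.put("row2", "B", "q", "3"); // row2 has no cells in family A
        table.put("row2", "C", "q", "4");

        System.out.println(table.scan(Set.of("A", "B", "C"))); // prints [row1, row2]
        System.out.println(table.scan(Set.of("A")));           // prints [row1]
    }
}
```

In this model row2 is returned when families A, B and C are all requested, because it has cells in B and C, but not when only A is requested. Separately, the scan in the original question also sets a SingleColumnValueFilter on xFamily; the HBase Javadoc for that filter notes that the tested column should be among the scan's inputs, since otherwise the filter regards the column as missing, so leaving out xFamily may change the result count for that reason as well.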
