As I told you in the other message, if you don't addColumn() the column you are filtering on, the filter never finds that column, and by default it lets through any row that doesn't contain it: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html#setFilterIfMissing(boolean)
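
To make that concrete, here is a toy model in plain Java. It does not use the HBase client at all; the class name, method names and sample data are made up for illustration, and it only mimics my understanding of how an EQUAL SingleColumnValueFilter interacts with the families selected on the scan. Treat it as a sketch, not as what the region server actually executes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model of the behaviour discussed here; it does not use the HBase client. */
class ScanFilterModel {

    /**
     * Counts the rows a scan would return. Each row maps "family:qualifier" to a value.
     * filterColumn and filterValue mimic a SingleColumnValueFilter with CompareOp.EQUAL.
     */
    static int countRows(List<Map<String, String>> rows, Set<String> scannedFamilies,
                         String filterColumn, String filterValue, boolean filterIfMissing) {
        int count = 0;
        for (Map<String, String> row : rows) {
            boolean anyCell = false;     // did the scan select at least one cell of this row?
            boolean columnSeen = false;  // did the filter get to see its column?
            boolean matched = false;     // did the seen column equal the wanted value?
            for (Map.Entry<String, String> cell : row.entrySet()) {
                String key = cell.getKey();
                String family = key.substring(0, key.indexOf(':'));
                if (!scannedFamilies.contains(family)) {
                    continue;            // cell not selected, so the filter never sees it
                }
                anyCell = true;
                if (key.equals(filterColumn)) {
                    columnSeen = true;
                    matched = cell.getValue().equals(filterValue);
                }
            }
            // A missing column passes by default and is dropped only if filterIfMissing is set.
            boolean passesFilter = columnSeen ? matched : !filterIfMissing;
            if (anyCell && passesFilter) {
                count++;
            }
        }
        return count;
    }

    /** 10 rows with x:col = "hit" and 90 rows with x:col = "miss"; every row also has y:col. */
    static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(Map.of("x:col", i < 10 ? "hit" : "miss", "y:col", "v" + i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows();
        // Family x not scanned: the filter never sees x:col, so every row passes.
        System.out.println(countRows(rows, Set.of("y"), "x:col", "hit", false));
        // Family x scanned: the filter sees x:col and drops the non-matching rows.
        System.out.println(countRows(rows, Set.of("x", "y"), "x:col", "hit", false));
        // Family x not scanned, filterIfMissing = true: rows without a visible x:col are dropped.
        System.out.println(countRows(rows, Set.of("y"), "x:col", "hit", true));
    }
}
```

Running main prints 100, 10 and 0. In the real client, the third case corresponds to calling setFilterIfMissing(true) on the filter, as described in the javadoc linked above.
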
So when you uncomment the addColumn(), the filter kicks in and actually filters values. When the addColumn() is commented out, all rows are returned.

On Mar 15, 2012, at 20:05, Peter Wolf wrote:

> Huh! That's what I was afraid you'd say. I'm still confused :-(
>
> If "it will give all rows that contain _any_ of these families", then
> why does adding a family give me *less* rows?
>
> Leaving my row start/stop and filtering code constant, and just
> un-commenting an addFamily() dramatically reduces the number of results
> returned from a scan.
>
> P
>
>
>
> On 3/15/12 2:42 PM, Himanshu Vashishtha wrote:
>> " Let's also say there are 1000 rows with A,B,C and 500 rows with only B and
>> C.
>>
>> If I add families A, B and C and scan with no filter will I get 1500,
>> 1000 or 500 results?"
>>
>> In this case, you will get 1000 rows. In case you add only B, you will
>> get 500 rows.
>>
>> It's not like if you add families A, B and C, it will give you _only_
>> those rows that have _all_ three families; rather it will give all
>> rows that contain _any_ of these families.
>>
>> Hope this helps.
>>
>> Experts are welcome to chime in if I am missing something :)
>>
>> Thanks,
>> Himanshu
>>
>>
>> On Thu, Mar 15, 2012 at 11:48 AM, Peter Wolf<[email protected]> wrote:
>>> Hi Lars, still confused...
>>>
>>> My table *should* have values for families A, B and C. Let's say I have a
>>> bug, and some rows only have values for B and C. Let's also say there are
>>> 1000 rows with A,B,C and 500 rows with only B and C.
>>>
>>> If I add families A, B and C and scan with no filter will I get 1500, 1000
>>> or 500 results?
>>>
>>> Many thanks
>>> P
>>>
>>>
>>>
>>>
>>> On 3/15/12 1:17 PM, lars hofhansl wrote:
>>>> Hi haijia,
>>>>
>>>> In that case HBase will still return the data for columns in family B and
>>>> C. But if you only added family A then HBase would only return "rows" for
>>>> which family A has any columns.
>>>>
>>>> -- Lars
>>>> ________________________________
>>>>
>>>> From: Haijia Zhou<[email protected]>
>>>> To: [email protected]; lars hofhansl<[email protected]>
>>>> Sent: Thursday, March 15, 2012 10:12 AM
>>>> Subject: Re: Scan.addFamiliy reduces results
>>>>
>>>>
>>>> I have the same confusion. Say if I added three column families A, B and C
>>>> to the scan, now if a row has data for column family B and C but no data
>>>> for
>>>> A, then it won't be returned in the next() method?
>>>> What if the requirement is to get row data regardless of whether there's
>>>> data for a specific column family or not?
>>>>
>>>>
>>>> On Thu, Mar 15, 2012 at 1:04 PM, lars hofhansl<[email protected]>
>>>> wrote:
>>>>
>>>> Hi Peter,
>>>>> for HBase you have to keep in mind that it is a sparse columnar (or
>>>>> KeyValue) store: (rowkey, columnfamily, column, TS) -> value
>>>>>
>>>>> A scan only returns those KeyValues that match the scan. So when you set
>>>>> families on your scan you'll only get those rows for which the scan found
>>>>> any columns.
>>>>>
>>>>> Makes sense?
>>>>>
>>>>> -- Lars
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Peter Wolf<[email protected]>
>>>>> To: [email protected]
>>>>> Sent: Thursday, March 15, 2012 9:52 AM
>>>>> Subject: Re: Scan.addFamiliy reduces results
>>>>>
>>>>>
>>>>> Thanks Doug,
>>>>>
>>>>> I had read that, and I just read it again. But I am missing something...
>>>>>
>>>>> Why does adding a family reduce the number of results? Is there an
>>>>> implied filter of some form? Does addFamily add some constraint on
>>>>> which rows are returned?
>>>>>
>>>>> Note that all my rows *ought* to have values in all the families.
>>>>>
>>>>> Thanks
>>>>> Peter
>>>>>
>>>>> On 3/15/12 12:39 PM, Doug Meil wrote:
>>>>>> re: "However, I am getting different number of results, depending on
>>>>>> which families are added"
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>> I'd suggest you read this in the RefGuide.
>>>>>>
>>>>>> http://hbase.apache.org/book.html#datamodel
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/15/12 12:08 PM, "Peter Wolf"<[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am doing a scan on a table with multiple families. My code looks
>>>>>>> like
>>>>>>> this...
>>>>>>>
>>>>>>>     Scan scan = new Scan(calculateStartRowKey(a),
>>>>>>>         calculateEndRowKey(b));
>>>>>>>
>>>>>>>     scan.setCaching(10000);
>>>>>>>     Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
>>>>>>>         CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));
>>>>>>>     scan.setFilter(filter);
>>>>>>>     scan
>>>>>>>         .addFamily(xFamily)
>>>>>>>         .addFamily(yFamily)
>>>>>>>         .addFamily(zFamily);
>>>>>>>
>>>>>>>     ResultScanner scanner = hTable.getScanner(scan);
>>>>>>>
>>>>>>>     Iterator<Result> it = scanner.iterator();
>>>>>>>     int resultCount = 0;
>>>>>>>     while (it.hasNext()) {
>>>>>>>         Result result = it.next();
>>>>>>>
>>>>>>>         resultCount++;
>>>>>>>     }
>>>>>>>
>>>>>>> However, I am getting different number of results, depending on which
>>>>>>> families are added. For example these give different result counts
>>>>>>>
>>>>>>>     scan
>>>>>>>         //.addFamily(xFamily)
>>>>>>>         .addFamily(yFamily)
>>>>>>>         .addFamily(zFamily);
>>>>>>> and
>>>>>>>     scan
>>>>>>>         .addFamily(xFamily)
>>>>>>>         .addFamily(yFamily)
>>>>>>>         .addFamily(zFamily);
>>>>>>>
>>>>>>>
>>>>>>> There is no error message, and I don't see anything in the Scan
>>>>>>> documentation. Does anyone know what is going on?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Peter
>>>>>>>
>>>>>>>
>>>>>>>
>
