I've done some more testing with a simple table on a pseudo-distributed system (my laptop). Below is the test script along with the output of the various tests. Where I'm really getting confused is this: when I query for info:lname = Washington and ask for only the info:fname column, why are all the fname values returned, not just Washington's (row 3…)?
The more complex table I was working with earlier, both on my laptop and on a
cluster, contains information about network traffic. I was attempting to return
only the keys for all hosts from a given country. I added a column to restrict
the amount of data coming back, since I only expected to see the row key we
defined during the puts. Maybe I'm actually getting back exactly what I'm
supposed to; I'm not sure. I could still be thinking too "yesSQL".
I also noticed that with the more complex table I could actually see the values
in the returned results, while here I'm getting vlen=X. When I run the same scan
from the shell, I get back the actual value, e.g., George.
Scott
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.ResultScanner
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.FilterList
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
import org.apache.hadoop.hbase.filter.KeyOnlyFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
conf = HBaseConfiguration.create
table = HTable.new(conf, 'sample_names')
scan = Scan.new
# A FilterList defaults to MUST_PASS_ALL; its filters are applied in the order they are added.
filterList = FilterList.new
filterList.addFilter(SingleColumnValueFilter.new(Bytes.toBytes('info'), Bytes.toBytes('lname'), CompareFilter::CompareOp::EQUAL, SubstringComparator.new('Washington')))
# KeyOnlyFilter strips cell values and returns only the keys (hence vlen=0 below).
filterList.addFilter(KeyOnlyFilter.new)
#scan.addColumn(Bytes.toBytes('info'), Bytes.toBytes('lname'))
scan.addColumn(Bytes.toBytes('info'), Bytes.toBytes('fname'))
scan.setFilter(filterList)
# getScanner returns the ResultScanner, so there is no need to construct one first.
result_scanner = table.getScanner(scan)
begin
  result_scanner.each do |res|
    # Result#toString prints each cell as row/family:qualifier/timestamp/type/vlen=N, not the value.
    puts(res)
  end
ensure
  result_scanner.close
end
# scan of table from shell:
# ROW COLUMN+CELL
# 1 column=info:fname, timestamp=1326979816243, value=John
# 1 column=info:lname, timestamp=1326979823380, value=Smith
# 2 column=info:fname, timestamp=1326979829610, value=Jane
# 2 column=info:lname, timestamp=1326979834954, value=Doe
# 3 column=info:fname, timestamp=1326979841429, value=George
# 3 column=info:lname, timestamp=1326979849407, value=Washington
# 4 column=info:fname, timestamp=1326979856746, value=Ben
# 4 column=info:lname, timestamp=1326979862339, value=Franklin
#
# with empty filter list:
# keyvalues={1/info:fname/1326979816243/Put/vlen=4, 1/info:lname/1326979823380/Put/vlen=5}
# keyvalues={2/info:fname/1326979829610/Put/vlen=4, 2/info:lname/1326979834954/Put/vlen=3}
# keyvalues={3/info:fname/1326979841429/Put/vlen=6, 3/info:lname/1326979849407/Put/vlen=10}
# keyvalues={4/info:fname/1326979856746/Put/vlen=3, 4/info:lname/1326979862339/Put/vlen=8}
#
# with only KeyOnlyFilter in list:
# keyvalues={1/info:fname/1326979816243/Put/vlen=0, 1/info:lname/1326979823380/Put/vlen=0}
# keyvalues={2/info:fname/1326979829610/Put/vlen=0, 2/info:lname/1326979834954/Put/vlen=0}
# keyvalues={3/info:fname/1326979841429/Put/vlen=0, 3/info:lname/1326979849407/Put/vlen=0}
# keyvalues={4/info:fname/1326979856746/Put/vlen=0, 4/info:lname/1326979862339/Put/vlen=0}
#
# with info:lname column and empty filter list:
# keyvalues={1/info:lname/1326979823380/Put/vlen=5}
# keyvalues={2/info:lname/1326979834954/Put/vlen=3}
# keyvalues={3/info:lname/1326979849407/Put/vlen=10}
# keyvalues={4/info:lname/1326979862339/Put/vlen=8}
#
# with info:lname column and KeyOnlyFilter:
# keyvalues={1/info:lname/1326979823380/Put/vlen=0}
# keyvalues={2/info:lname/1326979834954/Put/vlen=0}
# keyvalues={3/info:lname/1326979849407/Put/vlen=0}
# keyvalues={4/info:lname/1326979862339/Put/vlen=0}
#
# as above, but adding column after setting filter:
# keyvalues={1/info:lname/1326979823380/Put/vlen=0}
# keyvalues={2/info:lname/1326979834954/Put/vlen=0}
# keyvalues={3/info:lname/1326979849407/Put/vlen=0}
# keyvalues={4/info:lname/1326979862339/Put/vlen=0}
#
# with only SingleColumnValueFilter in list:
# keyvalues={3/info:fname/1326979841429/Put/vlen=6, 3/info:lname/1326979849407/Put/vlen=10}
#
# with KeyOnlyFilter then SingleColumnValueFilter in list:
# <returns nothing>
#
# with SingleColumnValueFilter then KeyOnlyFilter in list:
# keyvalues={3/info:fname/1326979841429/Put/vlen=0, 3/info:lname/1326979849407/Put/vlen=0}
#
# with SingleColumnValueFilter then KeyOnlyFilter in list, set filter, add info:fname column:
# keyvalues={1/info:fname/1326979816243/Put/vlen=0}
# keyvalues={2/info:fname/1326979829610/Put/vlen=0}
# keyvalues={3/info:fname/1326979841429/Put/vlen=0}
# keyvalues={4/info:fname/1326979856746/Put/vlen=0}
#
# with SingleColumnValueFilter then KeyOnlyFilter in list, add info:fname column, set filter:
# keyvalues={1/info:fname/1326979816243/Put/vlen=0}
# keyvalues={2/info:fname/1326979829610/Put/vlen=0}
# keyvalues={3/info:fname/1326979841429/Put/vlen=0}
# keyvalues={4/info:fname/1326979856746/Put/vlen=0}
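The last two results look consistent with a documented SingleColumnValueFilter rule. The filter can only test a column that the scan actually selects, so if only info:fname is added to the Scan, info:lname is treated as missing. Because setFilterIfMissing defaults to false, a row whose tested column is missing is emitted rather than dropped. Below is a simplified plain-Ruby model of that rule. It is not HBase code: the TABLE hash, the `scan` method, and its keyword arguments are hypothetical, and the case-insensitive match is an assumption about SubstringComparator.

```ruby
# Hypothetical in-memory copy of sample_names (from the shell scan above):
# row key => { "family:qualifier" => value }.
TABLE = {
  "1" => { "info:fname" => "John",   "info:lname" => "Smith" },
  "2" => { "info:fname" => "Jane",   "info:lname" => "Doe" },
  "3" => { "info:fname" => "George", "info:lname" => "Washington" },
  "4" => { "info:fname" => "Ben",    "info:lname" => "Franklin" }
}.freeze

# Simplified model of a Scan with selected columns plus an
# EQUAL/SubstringComparator SingleColumnValueFilter on test_column.
# The filter only sees the selected columns; if test_column is not among
# them, the row is kept unless filter_if_missing is true (the HBase
# default for setFilterIfMissing is false).
def scan(table, columns:, test_column:, substring:, filter_if_missing: false)
  table.each_with_object({}) do |(row, cells), out|
    # Cells the scan would actually return for this row.
    visible = cells.select { |col, _| columns.include?(col) }
    if visible.key?(test_column)
      # Assumed case-insensitive substring match.
      next unless visible[test_column].downcase.include?(substring.downcase)
    elsif filter_if_missing
      next
    end
    out[row] = visible unless visible.empty?
  end
end

# Only info:fname selected: the filter never sees lname, so every row passes.
p scan(TABLE, columns: ["info:fname"], test_column: "info:lname", substring: "Washington").keys
# => ["1", "2", "3", "4"]
# Both columns selected: only row 3 matches.
p scan(TABLE, columns: ["info:fname", "info:lname"], test_column: "info:lname", substring: "Washington").keys
# => ["3"]
# filter_if_missing alone, without selecting lname, drops every row.
p scan(TABLE, columns: ["info:fname"], test_column: "info:lname", substring: "Washington", filter_if_missing: true).keys
# => []
```

Under this model, the first and third scans suggest that setFilterIfMissing(true) alone is not enough. The column being tested also has to be part of the scan, for example by uncommenting the addColumn for info:lname and then reading info:fname from each Result.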
Scott Brunza 860.326.3637
