Thanks Anil,

Any idea why the 'select count(*)' query would return '0' when I can query 
other ways and get data?

________________________________
From: Anil Gupta [anilgupt...@gmail.com]
Sent: Saturday, July 25, 2015 1:36 PM
To: user@phoenix.apache.org
Subject: Re: Exception from RowCounter

Hey Zack,

As per the RowCounter job error, it seems like your RowCounter job does not 
have the hbase-site.xml file on its classpath, and because of that it's not 
connecting to the correct cluster. Check which ZooKeeper cluster that job is 
connecting to.
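
One quick way to check is to look at the quorum in the hbase-site.xml the job 
actually resolves. The sketch below is self-contained for illustration: the 
sample file at /tmp/hbase-site.xml is made up; on a real node you'd inspect the 
config directory that `hbase classpath` reports. The property name follows 
standard HBase conventions.

```shell
# Write a sample hbase-site.xml so this sketch is self-contained; on a real
# node, point at the file on the job's classpath instead.
cat > /tmp/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com</value>
  </property>
</configuration>
EOF

# Extract the ZooKeeper quorum the job would connect to:
grep -A1 'hbase.zookeeper.quorum' /tmp/hbase-site.xml \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
# prints: zk1.example.com,zk2.example.com
```

If that quorum isn't the cluster holding your table, the classpath is the problem.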

Sent from my iPhone

On Jul 25, 2015, at 1:23 PM, James Taylor 
<jamestay...@apache.org<mailto:jamestay...@apache.org>> wrote:

Hi Zack,
If you find an issue, the best way to help us is to come up with the smallest 
set of data that demonstrates the issue and a small script that consistently 
reproduces it. It's just not feasible for us to work from a billion-row data 
set.

RowCounter is not a Phoenix API, so you'll need to go to the HBase mailing 
lists for help on that.

From a schema design perspective, what you're doing seems a bit unwieldy. 
Why not just let Phoenix salt the data table instead of salting it manually 
and managing that yourself?
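
For reference, automatic salting is a table option rather than part of the 
key. A minimal sketch follows; the table and column names are illustrative, 
not Zack's actual schema, and the SALT_BUCKETS option is standard Phoenix DDL:

```sql
-- Hypothetical table: Phoenix prepends a salt byte to the row key itself,
-- so no hashed_key column or hand-written SPLIT ON list is needed.
CREATE TABLE fma.er_keyed_salted (
    endpoint_id  integer not null,
    sample_point integer not null,
    meter_key    varchar,
    CONSTRAINT pk PRIMARY KEY (endpoint_id, sample_point)
) COMPRESSION='GZ', SALT_BUCKETS = 64;  -- Phoenix allows 1-256 buckets
```

Phoenix then handles distributing writes and merging scans across the salt 
buckets transparently.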

Thanks,
James

On Sat, Jul 25, 2015 at 4:04 AM, Riesland, Zack 
<zack.riesl...@sensus.com<mailto:zack.riesl...@sensus.com>> wrote:
I decided to start from scratch with my table schema in an attempt to get a 
better distribution across my regions/region servers.

So, I created a table like this:

CREATE TABLE fma.er_keyed_gz_hashed_indexed_meterkey_immutable (
    hashed_key varchar not null,
    meter_key varchar ,
    …
    endpoint_id integer,
    sample_point  integer not null,
   …
    CONSTRAINT pk_fma_er_keyed_gz_hashed_indexed_meterkey_immutable PRIMARY KEY 
(hashed_key, sample_point)
    )
    COMPRESSION='GZ'
    SPLIT ON ('0-', '1-', '2-', '3-', '4-', '5-', '6-', '7-', '8-', '9-', 
'10-', '11-', '12-', '13-', '14-', '15-', '16-', '17-', '18-', '19-', '20-', 
'21-', '22-', '23-', '24-', '25-', '26-', '27-', '28-', '29-', '30-', '31-', 
'32-', '33-', '34-', '35-', '36-', '37-', '38-', '39-', '40-', '41-', '42-', 
'43-', '44-', '45-', '46-', '47-', '48-', '49-', '50-', '51-', '52-', '53-', 
'54-', '55-', '56-', '57-', '58-', '59-', '60-', '61-', '62-', '63-', '64-', 
'65-', '66-', '67-', '68-', '69-', '70-', '71-', '72-', '73-', '74-', '75-', 
'76-', '77-', '78-', '79-', '80-', '81-', '82-', '83-', '84-', '85-', '86-', 
'87-', '88-', '89-', '90-', '91-', '92-', '93-', '94-', '95-', '96-', '97-', 
'98-', '99-', '100-', '101-', '102-', '103-', '104-', '105-', '106-', '107-', 
'108-', '109-', '110-', '111-', '112-', '113-', '114-', '115-', '116-', '117-', 
'118-', '119-', '120-', '121-', '122-', '123-', '124-', '125-', '126-', '127-', 
'128-', '129-', '130-', '131-', '132-', '133-', '134-', '135-', '136-', '137-', 
'138-', '139-', '140-', '141-', '142-', '143-', '144-', '145-', '146-', '147-', 
'148-', '149-', '150-', '151-', '152-', '153-', '154-', '155-', '156-', '157-', 
'158-', '159-', '160-', '161-', '162-', '163-', '164-', '165-', '166-', '167-', 
'168-', '169-', '170-', '171-', '172-', '173-', '174-', '175-', '176-', '177-', 
'178-', '179-', '180-', '181-', '182-', '183-', '184-', '185-', '186-', '187-', 
'188-', '189-', '190-', '191-', '192-', '193-', '194-', '195-', '196-', '197-', 
'198-', '199-', '200-', '201-', '202-', '203-', '204-', '205-', '206-', '207-', 
'208-', '209-', '210-', '211-', '212-', '213-', '214-', '215-', '216-', '217-', 
'218-', '219-', '220-', '221-', '222-', '223-', '224-', '225-', '226-', '227-', 
'228-', '229-', '230-', '231-', '232-', '233-', '234-', '235-', '236-', '237-', 
'238-', '239-', '240-', '241-', '242-', '243-', '244-', '245-', '246-', '247-', 
'248-', '249-', '250-', '251-', '252-', '253-', '254-', '255-', '256-', '257-', 
'258-', '259-', '260-', '261-', '262-', '263-', '264-', '265-', '266-', '267-', 
'268-', '269-', '270-', '271-', '272-', '273-', '274-', '275-', '276-', '277-', 
'278-', '279-', '280-', '281-', '282-', '283-', '284-', '285-', '286-', '287-', 
'288-', '289-', '290-', '291-', '292-', '293-', '294-', '295-', '296-', '297-', 
'298-', '299-', '300-', '301-', '302-', '303-', '304-', '305-', '306-', '307-', 
'308-', '309-', '310-', '311-', '312-', '313-', '314-', '315-', '316-', '317-', 
'318-', '319-', '320-', '321-', '322-', '323-', '324-', '325-', '326-', '327-', 
'328-', '329-', '330-', '331-', '332-', '333-', '334-', '335-', '336-', '337-', 
'338-', '339-', '340-', '341-', '342-', '343-', '344-', '345-', '346-', '347-', 
'348-', '349-', '350-', '351-', '352-', '353-', '354-', '355-', '356-', '357-', 
'358-', '359-', '360-', '361-', '362-', '363-', '364-', '365-', '366-', '367-', 
'368-', '369-', '370-', '371-', '372-', '373-', '374-', '375-', '376-', '377-', 
'378-', '379-', '380-', '381-', '382-', '383-', '384-', '385-', '386-', '387-', 
'388-', '389-', '390-', '391-', '392-', '393-', '394-', '395-', '396-', '397-', 
'398-', '399-', '400-', '401-', '402-', '403-', '404-', '405-', '406-', '407-', 
'408-', '409-', '410-', '411-', '412-', '413-', '414-', '415-', '416-', '417-', 
'418-', '419-', '420-', '421-', '422-', '423-', '424-', '425-', '426-', '427-', 
'428-', '429-', '430-', '431-', '432-', '433-', '434-', '435-', '436-', '437-', 
'438-', '439-', '440-', '441-', '442-', '443-', '444-', '445-', '446-', '447-', 
'448-', '449-', '450-', '451-', '452-', '453-', '454-', '455-', '456-', '457-', 
'458-', '459-', '460-', '461-', '462-', '463-', '464-', '465-', '466-', '467-', 
'468-', '469-', '470-', '471-', '472-', '473-', '474-', '475-', '476-', '477-', 
'478-', '479-', '480-', '481-', '482-', '483-', '484-', '485-', '486-', '487-', 
'488-', '489-', '490-', '491-', '492-', '493-', '494-', '495-', '496-', '497-', 
'498-', '499-', '500-', '501-', '502-', '503-', '504-', '505-', '506-', '507-', 
'508-', '509-', '510-', '511-')

The hashed key is the endpoint_id % 511, which is why the splitting is done 
this way.
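
Concretely, the manual salting above amounts to the following (a sketch with 
a made-up endpoint_id):

```shell
# The row key prefix is endpoint_id modulo 511, followed by '-'.
endpoint_id=123456
echo "$((endpoint_id % 511))-"
# prints: 305-
```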

I also added 2 secondary indexes, because I need to be able to query based on 
meter_key or endpoint_id + sample_point:

CREATE INDEX fma_er_keyed_gz_hashed_indexed_endpoint_include_sample_point on 
fma.er_keyed_gz_hashed_indexed_meterkey_immutable (endpoint_id) include 
(sample_point) SALT_BUCKETS = 256; --256 is max

CREATE INDEX fma_er_keyed_gz_hashed_indexed_meterkey on 
fma.er_keyed_gz_hashed_indexed_meterkey_immutable (meter_key) SALT_BUCKETS = 
256; --256 is max

This all seemed to work well.

Then I used the bulk CSV import tool and imported about 1 billion rows.

This also seemed to work.

I can query by hashed_key and get immediate results.

If I query by endpoint_id or meter_key, it is MUCH slower (clue 1 that there’s 
a problem), but it gives me results eventually.

However, when I try to get a count, it returns 0:


> select count(*) from fma.er_keyed_gz_hashed_indexed_meterkey_immutable;
+------------+
|  COUNT(1)  |
+------------+
| 0          |
+------------+
1 row selected (0.075 seconds)

And when I try to do a RowCounter job, it fails completely (see below).

Can anyone help me diagnose what is going on here?

This is a cluster with six large region servers, each with about 600 regions 
(from this and other tables).

This table has about 12 columns and 1 billion rows currently.

Thanks!

#  hbase org.apache.hadoop.hbase.mapreduce.RowCounter 
fma.er_keyed_gz_hashed_indexed_meterkey_immutable;


2015-07-25 06:53:36,425 DEBUG [main] util.RegionSizeCalculator: Region sizes 
calculated
2015-07-25 06:53:36,427 WARN  [main] hbase.HBaseConfiguration: Config option 
"hbase.regionserver.lease.period" is deprecated. Instead, use 
"hbase.client.scanner.timeout.period"
2015-07-25 06:53:36,468 WARN  [main] 
client.ConnectionManager$HConnectionImplementation: Encountered problems when 
prefetch hbase:meta table:
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in hbase:meta 
for table: fma.er_keyed_gz_hashed_indexed_meterkey_immutable, 
row=fma.er_keyed_gz_hashed_indexed_meterkey_immutable,,99999999999999
        at 
org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:164)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.prefetchRegionCache(ConnectionManager.java:1222)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1286)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1135)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1118)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1075)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:909)
        at 
org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:528)
        at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:165)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at 
org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:191)
2015-07-25 06:53:36,471 INFO  [main] mapreduce.JobSubmitter: Cleaning up the 
staging area /user/root/.staging/job_1437395072897_1775
Exception in thread "main" org.apache.hadoop.hbase.TableNotFoundException: 
fma.er_keyed_gz_hashed_indexed_meterkey_immutable
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1319)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1135)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1118)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1075)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:909)
        at 
org.apache.hadoop.hbase.client.HTable.getRegionLocation(HTable.java:528)
        at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:165)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
        at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
        at 
org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:191)
