The regionserver.out is empty. The regionserver.log contains only the following for the relevant time period:

Thu Jan 6 12:19:57 EST 2011 Starting regionserver on istevens.syncapse.local
ulimit -n 256
2011-01-06 12:19:59,588 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Not starting a distinct region server because hbase.cluster.distributed is false

Ian.

On 2011-01-06, at 1:32 PM, Dmitriy Ryaboy wrote:

> Do you happen to have the region server logs as well?
> The .out as well as .log
>
> D
>
> On Thu, Jan 6, 2011 at 9:49 AM, Ian Stevens <[email protected]> wrote:
>
>> On 2011-01-05, at 5:23 PM, Dmitriy Ryaboy wrote:
>>
>>> That certainly sounds like a bug. I wonder if there is anything interesting
>>> in the HBase logs when you run the job that gets the wrong result?
>>
>> Hi Dmitriy. I've posted the corresponding master.log and zookeeper.log from
>> about the time of the failed query. I restarted HBase before making the
>> query, so there might be noise in the log associated with a restart.
>>
>> master.log: http://pastebin.com/VwiXZ9BB
>> zookeeper.log: http://pastebin.com/CnFVyFT2
>>
>> I believe logging level is set to DEBUG for both logs.
>>
>> Let me know if you need further logging.
>>
>> thanks,
>> Ian.
>>
>>
>>> On Wed, Jan 5, 2011 at 1:14 PM, Ian Stevens <[email protected]> wrote:
>>>
>>>> Hi everyone. In considering Pig for our HBase querying needs, I've run into
>>>> a discrepancy between the size of Pig's result set and the size of the table
>>>> being queried. I hope this is due to a misunderstanding of HBase and Pig on
>>>> my part. The test case which generates the discrepancy is fairly simple,
>>>> however.
>>>>
>>>> The link below contains a Jython script which populates an HBase table with
>>>> data in two column families. A corresponding Pig query retrieves data for one
>>>> column and saves it to a CSV:
>>>>
>>>> https://gist.github.com/766929
>>>>
>>>> The Jython script has the following usage:
>>>>
>>>>> jython hbase_test.py [table] [column count] [row count] [batch count]
>>>>
>>>> This will populate a table named [table] with two column families. The
>>>> first contains static data. The second contains the given number of columns,
>>>> populated with data.
>>>>
>>>> The Pig query will return an inaccurate number of results for certain table
>>>> sizes and configurations, most notably with tables exceeding 1.8 million
>>>> rows in length and with more than 2 columns in the queried column family,
>>>> eg.
>>>>
>>>>> jython hbase_test.py test 3 1800000 100000
>>>>
>>>> For instance, if I execute the above command and the corresponding Pig
>>>> query, the results number 905914. Note that if the table is re-populated and
>>>> queried a second time, a different number results. If I run the query again
>>>> without re-populating the table, I get the same number of results. The HBase
>>>> shell returns an accurate row count.
>>>>
>>>> Some notes on reproducing this issue (or not):
>>>>
>>>> * If the Jython script doesn't populate the meta column family, the issue
>>>> goes away with the same query.
>>>> * If the Jython script populates 2 columns instead of 3, the issue goes
>>>> away with the same query.
>>>> * The size of the column key or its value may influence whether the issue
>>>> occurs.
>>>> For instance, if I change the script to store 'value_%d' instead of
>>>> 'value_%d_%d', retaining the random int, the issue goes away with the same
>>>> query.
>>>>
>>>> I am using Pig 0.8.0 and HBase 0.20.6 on a MacBook running Snow Leopard
>>>> using the stock Java that came with the OS. Attached is a log of the Pig
>>>> console output. The error logs contain nothing of import.
>>>>
>>>> Am I doing anything incorrectly? Is there a way I can work around this
>>>> issue without compromising the column family being queried?
>>>>
>>>> This appears to be a fairly simple case of Pig/HBase usage. Can anyone else
>>>> reproduce the issue?
>>>>
>>>> thanks,
>>>> Ian.
