Hello Mehmet, If ProjectIds are sequential, then it is definitely not a feasible approach. Division is just to make sure that all the regions are evenly loaded. You can create pre-splitted tables to avoid the region hotspotting. Alternatively hash your rowkeys so that all the regionservers receive equal load.
Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Feb 18, 2013 at 4:48 AM, Michael Segel <[email protected]>wrote: > I'm not sure how a SQL interface above HBase will solve some of the issues > with regional hot spotting when using time as the key. Or the problem with > always adding data to the right of the last row. > > The same would apply with the project id, assuming that it too is a number > that grows incrementally with each project. > On Feb 17, 2013, at 4:50 PM, James Taylor <[email protected]> wrote: > > > Hello, > > Have you considered using Phoenix ( > https://github.com/forcedotcom/phoenix) for this use case? Phoenix is a > SQL layer on top of HBase. For this use case, you'd connect to your cluster > like this: > > > > Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver"); // register > driver > > Connection conn = > DriverManager..getConnection("jdbc:phoenix:localhost"); // connect to local > HBase > > > > Create a table like this (adding additional columns that you want to > measure, like txn_count below): > > > > conn.createStatement().execute( > > "CREATE TABLE event_log (\n" + > > " project_id INTEGER NOT NULL, \n" + > > " time DATE NOT NULL,\n" + > > "txn_count LONG\n" + > > "CONSTRAINT pk PRIMARY KEY (project_id, time))"); > > > > Then to insert data you'd do this: > > > > PreparedStatement preparedStmt = conn.prepareStatement( > > "UPSERT INTO event_log VALUES(?,?,0)"); > > > > and you'd bind the values in JDBC like this: > > > > preparedStmt.setInt(1, projectId); > > preparedStmt.setDate(2, time); > > preparedStmt.execute(); > > > > conn.commit(); // If upserting many values, you'd want to commit after > upserting maybe 1000-10000 rows > > > > Then at query data time, assuming you want to report on this data by > grouping into different "time buckets", you could do as show below. Phoenix > stores your date values at the millisecond granularity and you can decide a > query time how you'd like to roll it up: > > > > // Query with time bucket at the hour granularity > > conn.createStatement().execute( > > "SELECT\n" + > > " project_id, TRUNC(time,'HOUR') as time_bucket, \n" + > > " MIN(txnCount), MAX(txnCount), AVG(txnCount) FROM event_log\n" + > > "GROUP BY project_id, TRUNC(time,'HOUR')"); > > > > // Query with time bucket at the day granularity > > conn.createStatement().execute( > > "SELECT\n" + > > " project_id, TRUNC(time,'DAY') as time_bucket,\n" + > > " MIN(txnCount), MAX(txnCount), AVG(txnCount) FROM event_log\n" + > > "GROUP BY project_id, TRUNC(time,'DAY')"); > > > > You could, of course include a WHERE clause in the query to filter based > on the range of dates, particular projectIds, etc. like this: > > > > conn.prepareStatement( > > "SELECT\n" + > > " project_id, TRUNC(time,'DAY') as time_bucket,\n" + > > " MIN(txnCount), MAX(txnCount), AVG(txnCount) FROM event_log\n" + > > "WHERE project_id IN (?, ?, ?) AND date >= ? AND date < ?\n" + > > "GROUP BY project_id, TRUNC(time,'DAY')"); > > preparedStmt.setInt(1, projectId1); > > preparedStmt.setInt(2, projectId2); > > preparedStmt.setInt(3, projectId3); > > preparedStmt.setDate(4, beginDate); > > preparedStmt.setDate(5, endDate); > > preparedStmt.execute(); > > > > > > HTH. > > > > Regards, > > > > James > > > > On 02/17/2013 11:33 AM, Mehmet Simsek wrote: > >> Hi, > >> > >> I want to hold event log data in hbase but I couldn't decide row key. I > must hold project id and time,I will use project ld and time combination > while searching. > >> > >> Row key can be below > >> > >> ProjectId+timeInMs > >> > >> In similiar application(open source TSDB) time is divided 1000 to round > in this project.I can use this strategy but I don't know how we decide what > divider must be? 1000 or 10000. > >> > >> Why time is divided 1000 in this application? why didn't be hold > without division? > >> > >> Can you explain this strategy? > >> > > > > > > Michael Segel | (m) 312.755.9623 > > Segel and Associates > > >
