Thanks. Is there an ETA on the 3.0 release?

On Mon, Feb 3, 2014 at 9:52 AM, James Taylor <[email protected]> wrote:

> There will be an upgrade step required to go from 2.x to 3.0, as the
> system table has changed (and will probably change a bit more before we
> release).
>
> For now, you can do the following if you want to test out 3.0.0-snapshot:
> - Remove com.salesforce.* coprocessors on existing tables. If you haven't
> added any of your own, it's probably easiest to just remove all
> coprocessors.
> - Re-issue your DDL commands. If you have existing data against that
> table, it'd be best to open a connection at a timestamp earlier than any of
> your data using the CURRENT_SCN connection property (a rough sketch in code
> follows below). If you don't care about doing point-in-time queries at an
> earlier timestamp (or flash-back queries), then you don't need to worry
> about this, and you can just re-issue the DDL.
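>
> For reference, here's a minimal sketch in Java of what that connection
> might look like. Treat it as a sketch, not the definitive procedure: the
> ZooKeeper quorum, the timestamp, and the DDL below are placeholders, and
> "CurrentSCN" is the JDBC property name for CURRENT_SCN:
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.util.Properties;
>
> public class ReissueDdlSketch {
>     public static void main(String[] args) throws Exception {
>         Properties props = new Properties();
>         // Any timestamp earlier than your oldest row; this value is a placeholder.
>         props.setProperty("CurrentSCN", Long.toString(1000000000000L));
>         // "your-zk-quorum" stands in for your real ZooKeeper quorum.
>         try (Connection conn =
>                  DriverManager.getConnection("jdbc:phoenix:your-zk-quorum", props)) {
>             // Re-issue the original DDL; this statement is illustrative only.
>             conn.createStatement().execute(
>                 "CREATE TABLE IF NOT EXISTS SEO.KEYWORDIDEAS ("
>                 + "\"pk\" VARCHAR PRIMARY KEY, "
>                 + "\"keyword\".\"keywordText\" VARCHAR) IMMUTABLE_ROWS=true");
>         }
>     }
> }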
>
> Thanks,
> James
>
>
> On Mon, Feb 3, 2014 at 8:40 AM, Justin Workman <[email protected]> wrote:
>
>> We updated to the 3.0.0-SNAPSHOT in an effort to also test the Flume
>> component, and we are now unable to query any of our existing tables
>> through sqlline or a Java JDBC connection. However, the Flume component
>> works fine writing to a new table. Here is the error we get when running
>> select count(1) from keywords;
>>
>> Error: org.apache.hadoop.hbase.DoNotRetryIOException: keywords: at index 4
>>   at com.salesforce.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:83)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1034)
>>   at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.exec(HRegion.java:5482)
>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.execCoprocessor(HRegionServer.java:3720)
>>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:308)
>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
>> Caused by: java.lang.NullPointerException: at index 4
>>   at com.google.common.collect.ImmutableList.checkElementNotNull(ImmutableList.java:305)
>>   at com.google.common.collect.ImmutableList.construct(ImmutableList.java:296)
>>   at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:272)
>>   at com.salesforce.phoenix.schema.PTableImpl.init(PTableImpl.java:290)
>>   at com.salesforce.phoenix.schema.PTableImpl.<init>(PTableImpl.java:219)
>>   at com.salesforce.phoenix.schema.PTableImpl.makePTable(PTableImpl.java:212)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:436)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:254)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1082)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.addIndexToTable(MetaDataEndpointImpl.java:279)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:430)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:254)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:1082)
>>   at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:1028)
>>   ... 10 more (state=08000,code=101)
>>
>>
>> On Thu, Jan 30, 2014 at 4:01 PM, Justin Workman <[email protected]> wrote:
>>
>>> I will test with the latest master build. When this table goes live we
>>> will shorten the CF name; that was a mistake. Thanks for all the info. I
>>> do think that going forward we will be creating these tables via Phoenix.
>>> We are still testing the Flume sink and Pig handlers before completely
>>> committing.
>>>
>>> I'll update the list once I've had a chance to test with the latest
>>> build and file a Jira if the problem persists.
>>>
>>> Thanks!
>>> Justin
>>>
>>> Sent from my iPhone
>>>
>>> On Jan 30, 2014, at 1:25 PM, James Taylor <[email protected]>
>>> wrote:
>>>
>>> Thanks for all the detail, Justin. Based on this, it looks like a bug
>>> related to using case-sensitive column names. Maryann checked in a fix
>>> for this, so it might already be fixed in the latest on master.
>>>
>>> If it's not fixed, would you mind filing a JIRA?
>>>
>>> FWIW, you may want to consider a shorter column family name, like "k" or
>>> "kw" as that'll make your table smaller. Also, did you know you can provide
>>> your HBase table and column family config parameters in your CREATE TABLE
>>> statement and it'll create the HBase table and the column families, like
>>> below?
>>>
>>> CREATE TABLE SEO.KEYWORDIDEAS (
>>>     "pk" VARCHAR PRIMARY KEY,
>>>     "keyword"."jobId" VARCHAR,
>>>     "keyword"."jobName" VARCHAR,
>>>     "keyword"."jobType" VARCHAR,
>>>     "keyword"."keywordText" VARCHAR,
>>>     "keyword"."parentKeywordText" VARCHAR,
>>>     "keyword"."refinementName" VARCHAR,
>>>     "keyword"."refinementValue" VARCHAR,
>>>     "keyword"."relatedKeywordRank" VARCHAR
>>>     ) IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY' ;
>>>
>>>
>>>
>>>
>>> On Thu, Jan 30, 2014 at 8:50 AM, Justin Workman <
>>> [email protected]> wrote:
>>>
>>>> I don't think that is the issue we are hitting. Details are below. The
>>>> HBase table does have more columns than we define in the Phoenix table;
>>>> we were hoping to use the dynamic column feature if/when we need to
>>>> access data in other columns of the underlying table (a quick sketch of
>>>> what we had in mind follows below). As you can see from the output of
>>>> the explain statement below, a simple query does not use the index.
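>>>>
>>>> Here is that sketch (Java over JDBC). The column "extraCol" and the
>>>> quorum address are hypothetical, and it assumes Phoenix's inline
>>>> dynamic-column syntax in the FROM clause:
>>>>
>>>> import java.sql.Connection;
>>>> import java.sql.DriverManager;
>>>> import java.sql.ResultSet;
>>>> import java.sql.Statement;
>>>>
>>>> public class DynamicColumnSketch {
>>>>     public static void main(String[] args) throws Exception {
>>>>         try (Connection conn =
>>>>                  DriverManager.getConnection("jdbc:phoenix:your-zk-quorum");
>>>>              Statement stmt = conn.createStatement();
>>>>              // The undeclared column is typed inline in the FROM clause,
>>>>              // then selected like any declared column.
>>>>              ResultSet rs = stmt.executeQuery(
>>>>                  "SELECT \"keywordText\", \"extraCol\" "
>>>>                  + "FROM SEO.KEYWORDIDEAS (\"keyword\".\"extraCol\" VARCHAR) "
>>>>                  + "LIMIT 10")) {
>>>>             while (rs.next()) {
>>>>                 System.out.println(rs.getString(1) + " | " + rs.getString(2));
>>>>             }
>>>>         }
>>>>     }
>>>> }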
>>>>
>>>> However, if I create another identical table using Phoenix, upsert into
>>>> that new table from the table below, and create the same index on that
>>>> table, then the same select query does use the index on that table.
>>>>
>>>> So I am still very confused as to why the index is not invoked when the
>>>> table is created on top of an existing HBase table.
>>>>
>>>> HBase Create Table
>>>> > create 'SEO.KEYWORDIDEAS', { NAME=>'keyword', COMPRESSION=>'SNAPPY' }
>>>>
>>>> Phoenix Create Table
>>>> CREATE TABLE SEO.KEYWORDIDEAS (
>>>>     "pk" VARCHAR PRIMARY KEY,
>>>>     "keyword"."jobId" VARCHAR,
>>>>     "keyword"."jobName" VARCHAR,
>>>>     "keyword"."jobType" VARCHAR,
>>>>     "keyword"."keywordText" VARCHAR,
>>>>     "keyword"."parentKeywordText" VARCHAR,
>>>>     "keyword"."refinementName" VARCHAR,
>>>>     "keyword"."refinementValue" VARCHAR,
>>>>     "keyword"."relatedKeywordRank" VARCHAR
>>>>     ) IMMUTABLE_ROWS=true;
>>>>
>>>> Create Index
>>>> CREATE INDEX KWDIDX ON SEO.KEYWORDIDEAS ("parentKeywordText");
>>>>
>>>> Show and count indexes
>>>>
>>>> +-----------+-------------+--------------+------------+-----------------+------------+------+------------------+---------------------------+-------------+-------------+-------+------------------+-----------+-----------+
>>>> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME   | NON_UNIQUE | INDEX_QUALIFIER | INDEX_NAME | TYPE | ORDINAL_POSITION | COLUMN_NAME               | ASC_OR_DESC | CARDINALITY | PAGES | FILTER_CONDITION | DATA_TYPE | TYPE_NAME |
>>>> +-----------+-------------+--------------+------------+-----------------+------------+------+------------------+---------------------------+-------------+-------------+-------+------------------+-----------+-----------+
>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | KWDIDX     | 3    | 1                | keyword:parentKeywordText | A           | null        | null  | null             |           |           |
>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | KWDIDX     | 3    | 2                | :pk                       | A           | null        | null  | null             | 12        | V         |
>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | RA_TEST_ID | 3    | 1                | keyword:jobId             | A           | null        | null  | null             | 12        |           |
>>>> | null      | SEO         | KEYWORDIDEAS | true       | null            | RA_TEST_ID | 3    | 2                | :pk                       | A           | null        | null  | null             | 12        | V         |
>>>> +-----------+-------------+--------------+------------+-----------------+------------+------+------------------+---------------------------+-------------+-------------+-------+------------------+-----------+-----------+
>>>>
>>>> > select count(1) from seo.keywordideas;
>>>> +----------+
>>>> | COUNT(1) |
>>>> +----------+
>>>> | 423229   |
>>>> +----------+
>>>> > select count(1) from seo.kwdidx;
>>>> +----------+
>>>> | COUNT(1) |
>>>> +----------+
>>>> | 423229   |
>>>> +----------+
>>>>
>>>> > explain select count(1) from seo.keywords where "parentKeywordText" =
>>>> 'table';
>>>> +-----------------------------------------------------------+
>>>> |                           PLAN                            |
>>>> +-----------------------------------------------------------+
>>>> | CLIENT PARALLEL 18-WAY FULL SCAN OVER SEO.KEYWORDIDEAS    |
>>>> |     SERVER FILTER BY keyword.parentKeywordText = 'sheets' |
>>>> |     SERVER AGGREGATE INTO SINGLE ROW                      |
>>>> +-----------------------------------------------------------+
>>>>
>>>> Now here is where I can get the index to be invoked.
>>>> > CREATE TABLE SEO.NEW_KEYWORDIDEAS (
>>>>     PK VARCHAR PRIMARY KEY,
>>>>     JOB_ID VARCHAR,
>>>>     JOB_NAME VARCHAR,
>>>>     JOB_TYPE VARCHAR,
>>>>     KEYWORD_TEXT VARCHAR,
>>>>     PARENT_KEYWORD_TEXT VARCHAR,
>>>>     REFINEMENT_NAME VARCHAR,
>>>>     REFINEMENT_VALUE VARCHAR,
>>>>     RELATED_KEYWORD_RANK VARCHAR
>>>>     ) IMMUTABLE_ROWS=true;
>>>>
>>>> > UPSERT INTO SEO.NEW_KEYWORDIDEAS SELECT * FROM SEO.KEYWORDIDEAS;
>>>>
>>>> > CREATE INDEX NEW_KWD_IDX ON SEO.NEW_KEYWORDIDEAS
>>>> (PARENT_KEYWORD_TEXT);
>>>>
>>>> > explain select count(1) from seo.new_keywordideas where
>>>> parent_keyword_text = 'table';
>>>>
>>>> +-----------------------------------------------------------------+
>>>> |                              PLAN                               |
>>>> +-----------------------------------------------------------------+
>>>> | CLIENT PARALLEL 1-WAY RANGE SCAN OVER SEO.NEW_KWD_IDX ['table'] |
>>>> |     SERVER AGGREGATE INTO SINGLE ROW                            |
>>>> +-----------------------------------------------------------------+
>>>>
>>>> On Wed, Jan 29, 2014 at 5:21 PM, James Taylor <[email protected]> wrote:
>>>>
>>>>> Hi Justin,
>>>>> Please take a look at this FAQ:
>>>>> http://phoenix.incubator.apache.org/faq.html#/Why_isn't_my_secondary_index_being_used
>>>>>
>>>>> If that's not the case for you, can you include your CREATE TABLE,
>>>>> CREATE INDEX, SELECT statement, and EXPLAIN plan?
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> On Wed, Jan 29, 2014 at 4:13 PM, Justin Workman <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I am seeing some odd behavior with indexes and want some
>>>>>> clarification on how they are used.
>>>>>>
>>>>>> When I create a table in Phoenix on top of an existing HBase table and
>>>>>> then create an index on this table, I can see the index get built and
>>>>>> populated properly; however, no queries show that they are using this
>>>>>> index when I run an explain on the query.
>>>>>>
>>>>>> However, if I create a separate table in Phoenix, do an upsert from my
>>>>>> HBase table into the new table that I created, and create the same
>>>>>> index as on the previous table, then my queries show that they would
>>>>>> use the index when run through the explain plan.
>>>>>>
>>>>>> Are we not able to create or use an index on a table we create over an
>>>>>> existing HBase table, or am I doing something wrong?
>>>>>>
>>>>>> Thanks in advance for any help.
>>>>>> Justin
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
