Re: Tuning MutationState size

2017-11-13 Thread Marcin Januszkiewicz
We're using version 4.12 (on CDH 5.12). I issued the CREATE INDEX statement
with phoenix-sqlline with all default parameters.
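
A hedged note for anyone hitting the same limit: the client-side limits Sergey lists
below (phoenix.mutate.maxSize and phoenix.mutate.maxSizeBytes) can be raised in the
client's hbase-site.xml, or the index can be built asynchronously so that the rows are
written by the IndexTool MapReduce job instead of accumulating in the client's
MutationState. The index name and included columns below are placeholders only:

-- Sketch, not a confirmed fix: ASYNC makes CREATE INDEX return immediately and
-- defers population to the IndexTool MapReduce job, so no large MutationState
-- is built up on the client.
CREATE INDEX ix_number ON traces (UPPER(number)) INCLUDE (time) ASYNC;

The index is then populated by running the IndexTool job described on the Phoenix
secondary indexing page.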

On Fri, Nov 10, 2017 at 2:21 AM, Sergey Soldatov <sergeysolda...@gmail.com>
wrote:

> Could you provide the version you are using? Do you have autocommit turned
> on and have you changed the following properties:
> phoenix.mutate.batchSize
> phoenix.mutate.maxSize
> phoenix.mutate.maxSizeBytes
>
> Thanks,
> Sergey
>
> If you are using a more recent version, then you may consider
> On Thu, Nov 9, 2017 at 5:41 AM, Marcin Januszkiewicz <
> januszkiewicz.mar...@gmail.com> wrote:
>
>> I was trying to create a global index but it failed with:
>>
>> Error: ERROR 730 (LIM02): MutationState size is bigger than maximum allowed number of bytes (state=LIM02,code=730)
>> java.sql.SQLException: ERROR 730 (LIM02): MutationState size is bigger than maximum allowed number of bytes
>> at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:489)
>> at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>> at org.apache.phoenix.execute.MutationState.throwIfTooBig(MutationState.java:359)
>> at org.apache.phoenix.execute.MutationState.join(MutationState.java:447)
>> at org.apache.phoenix.compile.MutatingParallelIteratorFactory$1.close(MutatingParallelIteratorFactory.java:98)
>> at org.apache.phoenix.iterate.RoundRobinResultIterator$RoundRobinIterator.close(RoundRobinResultIterator.java:298)
>> at org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:105)
>> at org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:821)
>> at org.apache.phoenix.compile.DelegateMutationPlan.execute(DelegateMutationPlan.java:31)
>> at org.apache.phoenix.compile.PostIndexDDLCompiler$1.execute(PostIndexDDLCompiler.java:117)
>> at org.apache.phoenix.query.ConnectionQueryServicesImpl.updateData(ConnectionQueryServicesImpl.java:3360)
>> at org.apache.phoenix.schema.MetaDataClient.buildIndex(MetaDataClient.java:1283)
>> at org.apache.phoenix.schema.MetaDataClient.createIndex(MetaDataClient.java:1595)
>> at org.apache.phoenix.compile.CreateIndexCompiler$1.execute(CreateIndexCompiler.java:85)
>> at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:394)
>> at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:377)
>> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>> at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:376)
>> at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:364)
>> at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1738)
>> at sqlline.Commands.execute(Commands.java:822)
>> at sqlline.Commands.sql(Commands.java:732)
>> at sqlline.SqlLine.dispatch(SqlLine.java:813)
>> at sqlline.SqlLine.begin(SqlLine.java:686)
>> at sqlline.SqlLine.start(SqlLine.java:398)
>> at sqlline.SqlLine.main(SqlLine.java:291)
>>
>> Is there a way to predict what max size will be sufficient, or which
>> other knobs to turn?
>>
>>
>> --
>> Pozdrawiam,
>> Marcin Januszkiewicz
>>
>
>


Tuning MutationState size

2017-11-09 Thread Marcin Januszkiewicz
I was trying to create a global index but it failed with:

Error: ERROR 730 (LIM02): MutationState size is bigger than maximum allowed number of bytes (state=LIM02,code=730)
java.sql.SQLException: ERROR 730 (LIM02): MutationState size is bigger than maximum allowed number of bytes
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:489)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at org.apache.phoenix.execute.MutationState.throwIfTooBig(MutationState.java:359)
at org.apache.phoenix.execute.MutationState.join(MutationState.java:447)
at org.apache.phoenix.compile.MutatingParallelIteratorFactory$1.close(MutatingParallelIteratorFactory.java:98)
at org.apache.phoenix.iterate.RoundRobinResultIterator$RoundRobinIterator.close(RoundRobinResultIterator.java:298)
at org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:105)
at org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:821)
at org.apache.phoenix.compile.DelegateMutationPlan.execute(DelegateMutationPlan.java:31)
at org.apache.phoenix.compile.PostIndexDDLCompiler$1.execute(PostIndexDDLCompiler.java:117)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.updateData(ConnectionQueryServicesImpl.java:3360)
at org.apache.phoenix.schema.MetaDataClient.buildIndex(MetaDataClient.java:1283)
at org.apache.phoenix.schema.MetaDataClient.createIndex(MetaDataClient.java:1595)
at org.apache.phoenix.compile.CreateIndexCompiler$1.execute(CreateIndexCompiler.java:85)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:394)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:377)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:376)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:364)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1738)
at sqlline.Commands.execute(Commands.java:822)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:813)
at sqlline.SqlLine.begin(SqlLine.java:686)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:291)

Is there a way to predict what max size will be sufficient, or which other
knobs to turn?


-- 
Pozdrawiam,
Marcin Januszkiewicz


Re: SELECT + ORDER BY vs self-join

2017-10-31 Thread Marcin Januszkiewicz
Sorry, I forgot to include the DDL for the index:
CREATE LOCAL INDEX ix_0 ON traces (UPPER(number)) INCLUDE (time, +
some other columns used for filtering).
I'm going to file a JIRA with the details, but local indexes are being
used in both cases.
My suspicion is that the first query will attempt to load all 30M rows
of data from HBase and then sort them, while the second will only sort 30M
(rowkey, time) pairs.
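
For comparison, a hinted form of the simple query that James suggests below might
look like this (using the ix_0 name above; just a sketch, and whether it actually
changes the plan would need to be confirmed with EXPLAIN):

SELECT /*+ INDEX(traces ix_0) */ *all columns*
  FROM traces
  WHERE (UPPER(number) LIKE 'PO %')
  ORDER BY time DESC, rowkey
  LIMIT 101;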

On Tue, Oct 31, 2017 at 8:02 AM, Sergey Soldatov
<sergeysolda...@gmail.com> wrote:
> I agree with James that this happens because the index is not used, since it
> doesn't cover all of the selected columns. I believe that in the second case the
> RHT uses the index to build a list of rowkeys, which are then used for
> point lookups via a skip scan.
>
> bq. When is using the self-join a worse choice than the simple select?
>
> Hash joins have their own limitations:
> 1. The RHT is supposed to be small, so it's better to keep the LIMIT small (far
> less than 30 million).
> 2. The client is always involved: it collects the data for the RHT, builds the
> hash join cache, and sends it to all region servers.
>
>
> bq. Is there a better way to construct this query?
>
> Using a local index may help in this case.
>
> Thanks,
> Sergey
>
>
> On Mon, Oct 30, 2017 at 11:26 PM, James Taylor <jamestay...@apache.org>
> wrote:
>>
>> Please file a JIRA and include the explain plan for each of the queries. I
>> suspect your index is not being used in the first query due to the selection
>> of all the columns. You can try hinting the query to force your index to be
>> used. See https://phoenix.apache.org/secondary_indexing.html#Index_Usage
>>
>> Thanks,
>> James
>>
>> On Mon, Oct 30, 2017 at 7:02 AM, Marcin Januszkiewicz
>> <januszkiewicz.mar...@gmail.com> wrote:
>>>
>>> We have a wide table with 100M records created with the following DDL:
>>>
>>> CREATE TABLE traces (
>>>   rowkey VARCHAR PRIMARY KEY,
>>>   time VARCHAR,
>>>   number VARCHAR,
>>>   +40 more columns)
>>>
>>> We want to select a large (~30M records) subset of this data with the
>>> query:
>>>
>>> SELECT *all columns*
>>>   FROM traces
>>>   WHERE (UPPER(number) LIKE 'PO %')
>>>   ORDER BY time DESC, ROWKEY
>>>   LIMIT 101;
>>>
>>> This times out after 15 minutes and puts a huge load on our cluster.
>>> We have an alternate way of selecting this data:
>>>
>>> SELECT t.rowkey, *all columns*
>>> FROM TRACES t
>>> JOIN (
>>>   SELECT rowkey
>>>   FROM TRACES
>>>   WHERE (UPPER(number) LIKE 'PO %')
>>>   ORDER BY time DESC, ROWKEY
>>>   LIMIT 101
>>> ) ix
>>> ON t.ROWKEY = ix.ROWKEY
>>> order by t.ROWKEY;
>>>
>>> Which completes in just under a minute.
>>> Is there a better way to construct this query?
>>> When is using the self-join a worse choice than the simple select?
>>> Given that we have a functional index on UPPER(number), could this
>>> potentially be a statistics-based optimizer decision?
>>>
>>> --
>>> Pozdrawiam,
>>> Marcin Januszkiewicz
>>
>>
>
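
A hedged side note on Sergey's first point above: if the subquery ever returns more
rows than comfortably fit in the hash-join cache, Phoenix can be asked to use a
sort-merge join instead. A sketch only, with the same placeholder column list as above:

SELECT /*+ USE_SORT_MERGE_JOIN */ t.rowkey, *all columns*
FROM traces t
JOIN (
  SELECT rowkey
  FROM traces
  WHERE (UPPER(number) LIKE 'PO %')
  ORDER BY time DESC, rowkey
  LIMIT 101
) ix
ON t.rowkey = ix.rowkey
ORDER BY t.rowkey;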


Re: Indexes not used when ordering by primary key.

2017-10-30 Thread Marcin Januszkiewicz
On a possibly related note, how does Phoenix choose which index to use
when multiple indexes are applicable? Right now it seems that the
one with the lowest sequence number is used. If that is the case,
a similar optimization could be made to choose the index that will scan
over the smallest dataset.

On Sat, Oct 14, 2017 at 8:26 AM, James Taylor <jamestay...@apache.org> wrote:
> A couple of follow-up comments:
> - if you use c1 = 'X0', the index should be used without a hint, because the
> result is still ordered by the PK when the index is used.
> - this wouldn't necessarily be the case for c1 LIKE 'X0%'.
>
>
> On Fri, Oct 13, 2017 at 8:33 PM James Taylor <jamestay...@apache.org> wrote:
>>
>> Yes, this is expected behavior. Phoenix can either optimize based on the
>> filter in the WHERE clause or the ORDER BY. Since it's not cost based,
>> Phoenix always chooses to optimize out the ORDER BY (because in general it's
>> more expensive and it doesn't know how much data will be filtered out by the
>> WHERE clause). By using the data table, we know that rows are already
>> returned in PK order, so there's no reordering required. The hint is
>> available to override this decision.
>>
>> It wouldn't be difficult to introduce some simple cost-based decisions if
>> statistics collection is enabled. In that case, we can get an estimate at
>> compile-time on how much data would be scanned when the index is used. If
>> the amount is low enough, the optimizer could choose to use the index and
>> reorder the results.
>>
>> Please file a JIRA and we can discuss further.
>>
>> Thanks,
>> James
>>
>> On Fri, Oct 13, 2017 at 6:47 AM, Marcin Januszkiewicz
>> <katamaran...@gmail.com> wrote:
>>>
>>> Small correction: the index is local:
>>> CREATE LOCAL INDEX t_c1_ix ON t (c1);
>>>
>>> On Fri, Oct 13, 2017 at 3:43 PM, Marcin Januszkiewicz
>>> <katamaran...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > we have some data in a phoenix table that we always want to fetch in
>>> > the order determined by the primary key:
>>> >
>>> > CREATE TABLE t (
>>> >   rowkey VARCHAR PRIMARY KEY,
>>> >   c1 VARCHAR,
>>> >   c2 VARCHAR,
>>> > )
>>> >
>>> > SELECT rowkey, c1, c2 FROM t where c1 LIKE 'X0%' ORDER BY rowkey;
>>> >
>>> > We wanted to speed up searches using an index on c1:
>>> >
>>> > CREATE INDEX t_c1_ix ON t (c1);
>>> >
>>> > However, this index is only used if we drop the ORDER BY clause or use
>>> > a query hint. If we sort by any other field, such as c2, the index
>>> > will be used.
>>> >
>>> > Is this expected behavior? Is there any way of influencing Phoenix to
>>> > use the indexes without an index hint? The actual table has more
>>> > columns & indexes, and the queries are created programmatically. Adding
>>> > code that would decide which hint to generate would be a little
>>> > problematic.
>>
>>
>
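
A pointer on the statistics collection James mentions above: guideposts are gathered
automatically during major compactions, and their granularity is controlled by
phoenix.stats.guidepost.width, but they can also be refreshed by hand. A sketch using
the table from this thread:

UPDATE STATISTICS t ALL;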


Querying table with index fails in some configurations.

2017-10-30 Thread Marcin Januszkiewicz
We have a phoenix table created with:

CREATE TABLE traces (
rowkey VARCHAR PRIMARY KEY,
time VARCHAR,
number VARCHAR,
security VARCHAR);

CREATE LOCAL INDEX t0
ON traces (UPPER(number))
INCLUDE (security,  time);

UPSERT INTO traces values (
  'BXRFW10BK-$d2affbaf',
  '2017-09-26T07:43:01.152Z',
  'PO X',
  'test');

The query:
SELECT rowkey
FROM traces
WHERE (UPPER(number) LIKE 'PO %')
  AND (security = 'test')
ORDER BY time;

causes an exception:

Error: org.apache.phoenix.exception.PhoenixIOException:
org.apache.hadoop.hbase.DoNotRetryIOException:
TRACES,,1509360217824.b44920f7875ef688c3224ce1681d1035.: null
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
at org.apache.phoenix.iterate.RegionScannerFactory$1.nextRaw(RegionScannerFactory.java:212)
at org.apache.phoenix.iterate.RegionScannerResultIterator.next(RegionScannerResultIterator.java:61)
at org.apache.phoenix.iterate.OrderedResultIterator.getResultIterator(OrderedResultIterator.java:255)
at org.apache.phoenix.iterate.OrderedResultIterator.next(OrderedResultIterator.java:199)
at org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.getTopNScanner(NonAggregateRegionScannerFactory.java:320)
at org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.getRegionScanner(NonAggregateRegionScannerFactory.java:162)
at org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:72)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:284)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2541)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2183)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
Caused by: java.lang.NullPointerException (state=08000,code=101)

However, the following queries succeed:

SELECT rowkey, security
FROM traces
WHERE (UPPER(number) LIKE 'PO %')
  AND (security = 'test')
ORDER BY time;

SELECT rowkey
FROM traces
WHERE (UPPER(number) LIKE 'PO %')
ORDER BY time;

SELECT rowkey
FROM traces
WHERE (UPPER(number) LIKE 'PO %')
  AND (security = 'test');


We're using Phoenix 4.12 adapted for CDH 5.12, so we would appreciate it
if someone could verify whether this is actually an upstream bug before we
file a JIRA ticket.

-- 
Pozdrawiam,
Marcin Januszkiewicz


Re: Indexes not used when ordering by primary key.

2017-10-13 Thread Marcin Januszkiewicz
Small correction: the index is local:
CREATE LOCAL INDEX t_c1_ix ON t (c1);

On Fri, Oct 13, 2017 at 3:43 PM, Marcin Januszkiewicz
<katamaran...@gmail.com> wrote:
> Hi,
>
> we have some data in a phoenix table that we always want to fetch in
> the order determined by the primary key:
>
> CREATE TABLE t (
>   rowkey VARCHAR PRIMARY KEY,
>   c1 VARCHAR,
>   c2 VARCHAR,
> )
>
> SELECT rowkey, c1, c2 FROM t where c1 LIKE 'X0%' ORDER BY rowkey;
>
> We wanted to speed up searches using an index on c1:
>
> CREATE INDEX t_c1_ix ON t (c1);
>
> However, this index is only used if we drop the ORDER BY clause or use
> a query hint. If we sort by any other field, such as c2, the index
> will be used.
>
> Is this expected behavior? Is there any way of influencing Phoenix to
> use the indexes without an index hint? The actual table has more
> columns & indexes, and the queries are created programmatically. Adding
> code that would decide which hint to generate would be a little
> problematic.


Indexes not used when ordering by primary key.

2017-10-13 Thread Marcin Januszkiewicz
Hi,

we have some data in a phoenix table that we always want to fetch in
the order determined by the primary key:

CREATE TABLE t (
  rowkey VARCHAR PRIMARY KEY,
  c1 VARCHAR,
  c2 VARCHAR,
)

SELECT rowkey, c1, c2 FROM t where c1 LIKE 'X0%' ORDER BY rowkey;

We wanted to speed up searches using an index on c1:

CREATE INDEX t_c1_ix ON t (c1);

However, this index is only used if we drop the ORDER BY clause or use
a query hint. If we sort by any other field, such as c2, the index
will be used.

Is this expected behavior? Is there any way of influencing Phoenix to
use the indexes without an index hint? The actual table has more
columns & indexes, and the queries are created programmatically. Adding
code that would decide which hint to generate would be a little
problematic.
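
For reference, the hint-based workaround mentioned above would look roughly like
this, using the local index name from the follow-up message (a sketch only, since
the whole point here is to avoid generating such hints):

SELECT /*+ INDEX(t t_c1_ix) */ rowkey, c1, c2
FROM t
WHERE c1 LIKE 'X0%'
ORDER BY rowkey;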


ROW_TIMESTAMP doesn't work with TIMESTAMP datatype.

2017-05-09 Thread Marcin Januszkiewicz
I'm encountering the following issue with Phoenix:

create table rtt (ts TIMESTAMP PRIMARY KEY ROW_TIMESTAMP);
select * from rtt where ts < now();
java.lang.NullPointerException
at org.apache.phoenix.compile.ScanRanges.getAscTimeRange(ScanRanges.java:680)
at org.apache.phoenix.compile.ScanRanges.getRowTimestampColumnRange(ScanRanges.java:658)
at org.apache.phoenix.compile.ScanRanges.create(ScanRanges.java:84)
at org.apache.phoenix.compile.WhereOptimizer.pushKeyExpressionsToScan(WhereOptimizer.java:305)
at org.apache.phoenix.compile.WhereCompiler.compile(WhereCompiler.java:149)
at org.apache.phoenix.compile.WhereCompiler.compile(WhereCompiler.java:100)
at org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:556)
at org.apache.phoenix.compile.QueryCompiler.compileSingleQuery(QueryCompiler.java:507)
at org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:202)
at org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:157)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:420)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:394)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:280)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:270)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:269)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1515)
at sqlline.Commands.execute(Commands.java:822)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:813)
at sqlline.SqlLine.begin(SqlLine.java:686)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:291)

There is no problem when I use the TIME, DATE, and UNSIGNED_LONG datatypes.
I am using phoenix-4.9 from the phoenix-for-cloudera branch, so I'm
not sure whether this is a bug in the main branch or just in mine.
Could anyone check whether they also encounter this behavior?
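
For comparison, an equivalent definition using DATE, which did not trigger the
error here (the table name is just an illustration):

create table rtt_date (ts DATE PRIMARY KEY ROW_TIMESTAMP);
select * from rtt_date where ts < now();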


Limiting resource use during maintenance operations.

2017-03-10 Thread Marcin Januszkiewicz
Hi,

I have a 5-node HBase cluster with around 12GB of RAM available to
RegionServers. I've been evaluating Phoenix with a ~20M row HBase
table with 16 regions, and have had some problems. Deleting an index
from a view will quickly cause an out-of-memory error on the
RegionServers. Creating the index works reasonably well. When I try to
create a table from the HBase table instead of a view, I have the same
problem: the RegionServers eventually crash from the load.

Is there a way to control the resources that will be available to
Phoenix to execute these actions? Is there a recommended minimum
hardware setup? Will there be tools available similar to the async
table indexer?

Regards,
Marcin


Re: Phoenix ignoring index and index hint with some queries over mapped hbase tables.

2017-03-06 Thread Marcin Januszkiewicz
On Fri, Mar 3, 2017 at 7:52 PM, James Taylor <jamestay...@apache.org> wrote:
> Hi Marcin,
> There's a few things going on here:
> 1) An index created on a view over HBase data won't be maintained by
> Phoenix. You'd need to maintain it yourself through some other external
> means. If you create a table that maps to your HBase data, then it will be
> maintained.

UPSERTING data into the view will also work, right?

> 2) An index would only be used if you match against a constant on the
> right-hand side (while you're matching against the "number" column). For
> example, the following query would use the index and limit the scan to only
> rows in which "number" starts with 'Queen':
>
> 0: jdbc:phoenix:> explain select * from "traces" where regexp_substr("number", 'Q.*') = 'Queen';
>
> CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces [1,'Queen']
> SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1) = 'Queen'
>
> Note that with local indexes, interpreting when the index is used is a bit
> subtle, but you'll see a [1, ...] after the RANGE SCAN as an indication. The
> index may still be used even if the query is a full table scan (since the data
> contained in the index table may be smaller than that in the data table), but
> that won't buy you very much.
> 3) The index would only be used if your REGEXP_SUBSTR has a constant string
> before any wildcard matches in the second argument. You could also
> potentially use a functional index [1], but it'd only use the index if the
> REGEXP_SUBSTR makes the exact same call (i.e. the same pattern argument) as was
> used when the functional index was created.
>
> HTH.  Thanks,
>
> James
>
> [1] https://phoenix.apache.org/secondary_indexing.html#Functional_Indexes


So, if I understand correctly, my best bet would be to use some
heuristics to hopefully extract a coarse prefix range from the regex?
That way I can do a preliminary range scan on the index and refine the
filtering on the returned data. I assume I could implement this as a
UDF with an appropriate newKeyPart method?
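
A rough SQL-only sketch of that idea, before going the UDF route; the 'Q' prefix
here is extracted from the pattern by hand and is only an illustration:

SELECT * FROM "traces"
WHERE "number" LIKE 'Q%'                          -- coarse prefix range the index can use
  AND REGEXP_SUBSTR("number", 'Q.*') = "number";  -- exact regex filter on the narrowed rows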



>
> On Fri, Mar 3, 2017 at 12:59 AM, Marcin Januszkiewicz
> <katamaran...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a table in hbase and created a view of it in phoenix, along
>> with a local index:
>>
>> create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR,
>> "cf"."number" VARCHAR, "cf"."class" VARCHAR);
>> create local index idxf on "dmstab_vehicle_traces" ("cf"."number",
>> "cf"."class", rowkey);
>>
>> I need to filter rows based on a regex condition on the "number"
>> column, so I use queries like these:
>>
>> explain select * from "traces" where regexp_substr("number", 'Q.*') =
>> "number";
>>
>> CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2]
>> SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*',
>> 1) = "number"
>>
>>
>> It's a little ugly and less efficient than using LIKE, but the
>> performance is still relatively acceptable thanks to the index.
>>
>> However, if I specify a range of rowkeys to include, Phoenix stops using
>> the index, which slows down the query significantly:
>>
>> explain select * from "traces" where regexp_substr("number", 'Q.*') =
>> "number" and rowkey < 'BY';
>>
>> CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN
>> RANGE SCAN OVER traces [*] - ['BY']
>> SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number"
>>
>> Using an index hint doesn't change anything. Is there a way to make
>> this work, and is this a bug?
>
>


Phoenix ignoring index and index hint with some queries over mapped hbase tables.

2017-03-03 Thread Marcin Januszkiewicz
Hi,

I have a table in hbase and created a view of it in phoenix, along
with a local index:

create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR,
"cf"."number" VARCHAR, "cf"."class" VARCHAR);
create local index idxf on "dmstab_vehicle_traces" ("cf"."number",
"cf"."class", rowkey);

I need to filter rows based on a regex condition on the "number"
column, so I use queries like these:

explain select * from "traces" where regexp_substr("number", 'Q.*') = "number";

CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2]
SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*',
1) = "number"


It's a little ugly and less efficient than using LIKE, but the
performance is still relatively acceptable thanks to the index.

However, if I specify a range of rowkeys to include, Phoenix stops using
the index, which slows down the query significantly:

explain select * from "traces" where regexp_substr("number", 'Q.*') =
"number" and rowkey < 'BY';

CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN
RANGE SCAN OVER traces [*] - ['BY']
SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number"

Using an index hint doesn't change anything. Is there a way to make
this work, and is this a bug?
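
As a hedged follow-up to the functional-index suggestion in James's reply elsewhere
in this archive: the statements below are only a sketch, and such an index would only
be used when the query repeats the exact same REGEXP_SUBSTR call and compares it to a
constant.

-- Sketch: a functional index on the regex expression itself; usable only when
-- the query uses the identical expression and a constant right-hand side.
CREATE LOCAL INDEX idx_re ON "traces" (REGEXP_SUBSTR("cf"."number", 'Q.*'));
SELECT * FROM "traces" WHERE REGEXP_SUBSTR("number", 'Q.*') = 'Queen';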


Re:

2017-02-16 Thread Marcin Januszkiewicz
Are you perhaps using an HBase distribution from a vendor? I had similar errors
when trying to use Phoenix 4.9 on the Cloudera build of HBase.

On 2017-02-16 10:46 (+0100), "Nimrod Oren" 
wrote: 
> I didn't mean to sound rude.
> 
> I created another table using phoenix-sqlline according to the documentation:
> 
> CREATE TABLE IF NOT EXISTS TEST_SAVE_NEW (
>   IMEI VARCHAR NOT NULL PRIMARY KEY,
>   END_LATITUDE VARCHAR);
> 
> Still getting the same issue.
>