Re: Query optimization

2019-06-19 Thread Alexander Batyrshin
Is it possible to avoid a full scan of table1 for 'table1.col = ?' and instead
run this check only on the subset table1.pk IN (…)?

> On 19 Jun 2019, at 23:31, Vincent Poon  wrote:
> 
> 'table1.col = ?' will be a full table scan of table1 unless you have a 
> secondary index on table1.col.
> Check the explain plan to see if it's working as expected.
> 
> On Wed, Jun 19, 2019 at 7:43 AM Alexander Batyrshin <0x62...@gmail.com> wrote:
>  Hello,
> We have 2 tables:
> 
> Table1 - big one (2000M+ rows):
> 
> CREATE TABLE table1 (
> pk varchar PRIMARY KEY,
> col varchar
> );
> 
> Table2 - small one (300K rows):
> 
> CREATE TABLE table2 (
> pk varchar PRIMARY KEY,
> other varchar
> );
> 
> A query like this works fast (~30 sec):
> SELECT table1.pk, table1.col
> FROM table1
> WHERE table1.pk IN ( SELECT table2.pk FROM table2 )
> 
> But a query like this works quite slowly (>10 min):
> SELECT table1.pk
> FROM table1
> WHERE table1.col = ? AND table1.pk IN ( SELECT table2.pk FROM table2 )
> 
> The query below is also slow:
> SELECT *
> FROM (
> SELECT table1.pk, table1.col
> FROM table1
> WHERE table1.pk IN ( SELECT table2.pk FROM table2 )
> ) AS s
> WHERE s.col = ?
> 
> Is there any HINT that can optimize this query?
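A sketch of the two standard Phoenix options for this shape of query (the index
name and the hint below are illustrative assumptions; whether the skip-scan hint
is honored for an IN-subquery depends on the plan, so verify with EXPLAIN):

```sql
-- Option 1: a global secondary index on col lets the filter
-- "table1.col = ?" be answered from the index instead of a
-- full scan of table1. (Index name idx_table1_col is illustrative.)
CREATE INDEX idx_table1_col ON table1 (col);

-- Option 2: ask for a skip scan so that, ideally, only the ~300K
-- primary keys coming from table2 are probed rather than all
-- 2000M+ rows of table1.
SELECT /*+ SKIP_SCAN */ table1.pk
FROM table1
WHERE table1.col = 'some_value'
  AND table1.pk IN ( SELECT table2.pk FROM table2 );
```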



java.io.IOException: Added a key not lexically larger than previous

2019-06-19 Thread Alexander Batyrshin
 Hello,
Are there any ideas where this problem comes from and how to fix it?

Jun 18 21:38:05 prod022 hbase[148581]: 2019-06-18 21:38:05,348 WARN  
[MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying 
num=9
Jun 18 21:38:05 prod022 hbase[148581]: java.io.IOException: Added a key not 
lexically larger than previous. Current cell = 
\x0D100395583733fW+,WQ/d:p/1560882798036/DeleteColumn/vlen=0/seqid=30023231, 
lastCell = \x0D100395583733fW+,WQ/d:p/1560882798036/Put/vlen=29/seqid=30023591
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1053)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:969)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2484)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
Jun 18 21:38:05 prod022 hbase[148581]: at 
java.lang.Thread.run(Thread.java:748)
Jun 18 21:38:05 prod022 hbase[148581]: 2019-06-18 21:38:05,373 FATAL 
[MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 
prod022,60020,1560521871613: Replay of WAL required. Forcing server shutdown
Jun 18 21:38:05 prod022 hbase[148581]: 
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
TBL_C,\x0D04606203096428+jaVbx.,1558885224779.b4633aee06956663b05e8322ce34b0a3.
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2675)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
Jun 18 21:38:05 prod022 hbase[148581]: at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
Jun 18 21:38:05 prod022 hbase[148581]: at 
java.lang.Thread.run(Thread.java:748)
Jun 18 21:38:05 prod022 hbase[148581]: Caused by: java.io.IOException: Added a 
key not lexically larger than previous. Current cell = 
\x0D100395583733fW+,WQ/d:p/1560882798036/DeleteColumn/vlen=0/seqid=30023231, 
lastCell = \x0D100395583733fW+,WQ/d:p/1560882798036/Put/vlen=29/seqid=30023591
Jun 18 21:38:05 prod022 hbase[148581]: at 

Re: Query optimization

2019-06-19 Thread Vincent Poon
'table1.col = ?' will be a full table scan of table1 unless you have a
secondary index on table1.col.
Check the explain plan to see if it's working as expected.

On Wed, Jun 19, 2019 at 7:43 AM Alexander Batyrshin <0x62...@gmail.com>
wrote:

>  Hello,
> We have 2 tables:
>
> Table1 - big one (2000M+ rows):
>
> CREATE TABLE table1 (
> pk varchar PRIMARY KEY,
> col varchar
> );
>
> Table2 - small one (300K rows):
>
> CREATE TABLE table2 (
> pk varchar PRIMARY KEY,
> other varchar
> );
>
> A query like this works fast (~30 sec):
> SELECT table1.pk,  table1.col
> FROM table1
> WHERE table1.pk IN ( SELECT table2.pk FROM table2 )
>
> But a query like this works quite slowly (>10 min):
> SELECT table1.pk
> FROM table1
> WHERE table1.col = ? AND table1.pk IN ( SELECT table2.pk FROM table2 )
>
> The query below is also slow:
> SELECT *
> FROM (
> SELECT table1.pk,  table1.col
> FROM table1
> WHERE table1.pk IN ( SELECT table2.pk FROM table2 )
> ) AS s
> WHERE s.col = ?
>
> Is there any HINT that can optimize this query?
>
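A minimal sketch of the plan check suggested above (the exact output text varies
by Phoenix version, and the index name is an illustrative assumption):

```sql
-- Before adding an index, expect a line like:
--   CLIENT ... FULL SCAN OVER TABLE1
EXPLAIN SELECT table1.pk FROM table1 WHERE table1.col = 'some_value';

-- After CREATE INDEX idx_table1_col ON table1 (col), the plan should
-- instead report a RANGE SCAN over IDX_TABLE1_COL, confirming the
-- index is being used.
EXPLAIN SELECT table1.pk FROM table1 WHERE table1.col = 'some_value';
```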


Re: A strange question about Phoenix

2019-06-19 Thread jesse
1) hbase clone-snapshot into my_table
2) sqlline.py zk:port console to create my_table.

Very straightforward.

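The two steps above, sketched as commands (the snapshot name `my_table_snapshot`
and ZooKeeper quorum `zk:2181` are illustrative assumptions; the CREATE TABLE
DDL must match the original table's schema exactly):

```shell
# 1) Restore the HBase-level data from the snapshot into a new table.
echo "clone_snapshot 'my_table_snapshot', 'MY_TABLE'" | hbase shell -n

# 2) Recreate the table's metadata in Phoenix so it can be queried.
sqlline.py zk:2181
# then, at the sqlline prompt, run DDL matching the original schema:
#   CREATE TABLE MY_TABLE (pk VARCHAR PRIMARY KEY, col VARCHAR);
```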


On Wed, Jun 19, 2019, 11:40 AM anil gupta  wrote:

> Sounds strange.
> What steps did you follow to restore the snapshot of the Phoenix table?
>
> On Tue, Jun 18, 2019 at 9:34 PM jesse  wrote:
>
>> hi:
>>
>>  When my table is restored via hbase clone-snapshot,
>>
>> 1) sqlline.py console shows the proper number of records:  select
>> count(*) from my_table.
>> 2) select my_column from my_table limit 1 works fine.
>>
>>  However, select * from my_table limit 1; returns no row.
>>
>>  Do I need to perform some extra operations?
>>
>>  thanks
>
> --
> Thanks & Regards,
> Anil Gupta
>


Re: A strange question about Phoenix

2019-06-19 Thread anil gupta
Sounds strange.
What steps did you follow to restore the snapshot of the Phoenix table?

On Tue, Jun 18, 2019 at 9:34 PM jesse  wrote:

> hi:
>
>  When my table is restored via hbase clone-snapshot,
>
> 1) sqlline.py console shows the proper number of records:  select count(*)
> from my_table.
> 2) select my_column from my_table limit 1 works fine.
>
>  However, select * from my_table limit 1; returns no row.
>
>  Do I need to perform some extra operations?
>
>  thanks
>

-- 
Thanks & Regards,
Anil Gupta


Query optimization

2019-06-19 Thread Alexander Batyrshin
 Hello,
We have 2 tables:

Table1 - big one (2000M+ rows):

CREATE TABLE table1 (
pk varchar PRIMARY KEY,
col varchar
);

Table2 - small one (300K rows):

CREATE TABLE table2 (
pk varchar PRIMARY KEY,
other varchar
);

A query like this works fast (~30 sec):
SELECT table1.pk,  table1.col
FROM table1
WHERE table1.pk IN ( SELECT table2.pk FROM table2 )

But a query like this works quite slowly (>10 min):
SELECT table1.pk
FROM table1
WHERE table1.col = ? AND table1.pk IN ( SELECT table2.pk FROM table2 )

The query below is also slow:
SELECT *
FROM (
SELECT table1.pk,  table1.col
FROM table1
WHERE table1.pk IN ( SELECT table2.pk FROM table2 )
) AS s
WHERE s.col = ?

Is there any HINT that can optimize this query?
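On the HINT question: Phoenix does support optimizer hints, though whether any
of them helps here depends on the chosen plan. Two relevant ones, sketched
(the index name is an illustrative assumption, and 'some_value' stands in for
the bind parameter):

```sql
-- Force use of a specific secondary index, if one exists on col:
SELECT /*+ INDEX(table1 idx_table1_col) */ table1.pk
FROM table1
WHERE table1.col = 'some_value'
  AND table1.pk IN ( SELECT table2.pk FROM table2 );

-- Switch the IN-subquery join from the default broadcast hash join
-- to a sort-merge join, which can help when the hash-cache build
-- side is too large for region server memory:
SELECT /*+ USE_SORT_MERGE_JOIN */ table1.pk
FROM table1
WHERE table1.col = 'some_value'
  AND table1.pk IN ( SELECT table2.pk FROM table2 );
```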