Problems building Phoenix 4.2.2 with BigTop

2014-12-15 Thread Chabot, Jerry
I have an Apache BigTop VM set up. I verified that I could build Apache Phoenix
4.2.1, but the build fails after I change the version number to 4.2.2. I changed
/usr/src/bigtop/bigtop.mk in two places.

# Phoenix
PHOENIX_NAME=phoenix
PHOENIX_RELNOTES_NAME=Phoenix: A SQL skin over HBase
PHOENIX_PKG_NAME=phoenix
PHOENIX_BASE_VERSION=4.2.2
PHOENIX_PKG_VERSION=4.2.2
PHOENIX_RELEASE_VERSION=1
PHOENIX_TARBALL_DST=phoenix-$(PHOENIX_BASE_VERSION)-src.tar.gz
PHOENIX_TARBALL_SRC=$(PHOENIX_TARBALL_DST)
PHOENIX_DOWNLOAD_PATH=/phoenix/phoenix-$(PHOENIX_BASE_VERSION)/src/
PHOENIX_SITE=$(APACHE_MIRROR)$(PHOENIX_DOWNLOAD_PATH)
PHOENIX_ARCHIVE=$(APACHE_ARCHIVE)$(PHOENIX_DOWNLOAD_PATH)
$(eval $(call PACKAGE,phoenix,PHOENIX))

When I build using the command line I get this error:

[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Phoenix 4.2.2
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ phoenix ---
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ phoenix ---
[INFO]
[INFO] --- maven-source-plugin:2.2.1:jar-no-fork (attach-sources) @ phoenix ---
[INFO]
[INFO] --- maven-jar-plugin:2.4:test-jar (default) @ phoenix ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: 
/usr/src/bigtop/build/phoenix/rpm/BUILD/phoenix-4.2.2-src/target/phoenix-4.2.2-tests.jar
[INFO]
[INFO] --- maven-site-plugin:3.2:attach-descriptor (attach-descriptor) @ 
phoenix ---
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Phoenix Core 4.2.2
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for org.iq80.snappy:snappy:jar:1.0.4.1 is missing, no 
dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Phoenix  SUCCESS [4.173s]
[INFO] Phoenix Core .. FAILURE [2.799s]
[INFO] Phoenix - Flume ... SKIPPED
[INFO] Phoenix - Pig . SKIPPED
[INFO] Phoenix Assembly .. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7.834s
[INFO] Finished at: Mon Dec 15 14:13:38 UTC 2014
[INFO] Final Memory: 20M/289M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project phoenix-core: Could not resolve 
dependencies for project org.apache.phoenix:phoenix-core:jar:4.2.2: Failure to 
find org.iq80.snappy:snappy:jar:1.0.4.1 in 
https://repository.apache.org/content/repositories/releases/ was cached in the 
local repository, resolution will not be reattempted until the update interval 
of apache release has elapsed or updates are forced - [Help 1]
[ERROR]

I am new to using BigTop. Is there another setting I need to configure?

-Jerry



queries getting rejected

2014-12-15 Thread Abe Weinograd
Hi,

We resized (reduced) our cluster. When I run any query that aggregates over
our datasets, I get the error below. After the first query, the error returns
immediately, even for an EXPLAIN. I saw a couple of other posts about this,
and the recommendation was to raise the thread pool and queue configs. Is
that right?

Abe

Task org.apache.phoenix.job.JobManager$JobFutureTask@513c74aa rejected from
org.apache.phoenix.job.JobManager$1@6312e0f8[Running, pool size = 128,
active threads = 128, queued tasks = 500, completed tasks = 2183]


Re: Problems building Phoenix 4.2.2 with BigTop

2014-12-15 Thread Andrew Purtell
I think I know what is going on. The Phoenix POM defines the property
snappy.version and sets it to 0.3; this is the version of
org.iq80.snappy we want. However, I bet the Bigtop build scaffold launches
the Phoenix build with snappy.version set to 1.0.4.1. We should
probably fix that with a Bigtop JIRA.
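If that is the cause, one way to confirm locally is to relaunch the Maven build with the property pinned back to the value Phoenix's POM expects. The flag below is a hedged sketch: the property name comes from the Phoenix POM, but whether your Bigtop scaffold forwards extra Maven flags this way is an assumption, so check do-component-build for the real hook.

```shell
# Pin org.iq80.snappy back to the version Phoenix's POM declares (0.3).
# Hypothetical invocation; adapt to however your Bigtop scaffold calls Maven.
PHOENIX_BUILD="mvn clean install -DskipTests -Dsnappy.version=0.3"
echo "$PHOENIX_BUILD"
```

Note that since the bad resolution was already "cached in the local repository" (per the error), adding Maven's `-U` flag forces the cached failure to be retried.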


On Mon, Dec 15, 2014 at 6:16 AM, Chabot, Jerry jerry.p.cha...@hp.com
wrote:


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: queries getting rejected

2014-12-15 Thread James Taylor
What version are you using, Abe? The important parameter to bump up if
you get a rejected exception is the queue depth
(phoenix.query.queueSize). When you upgrade to 4.2.x, the default for
this went up to 5000, the reason being that we chunk up work into
smaller pieces. If you're already overriding this, make sure you
remove that override (or increase it substantially).

Thanks,
James
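For reference, the setting James describes is client-side. A minimal hbase-site.xml fragment on the client might look like the following; the property names are Phoenix's, and the values here are only illustrative:

```xml
<!-- Client-side hbase-site.xml fragment (illustrative values) -->
<property>
  <name>phoenix.query.queueSize</name>
  <value>5000</value>
</property>
<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>128</value>
</property>
```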

On Mon, Dec 15, 2014 at 10:08 AM, Abe Weinograd a...@flonet.com wrote:


Re: Query performance question

2014-12-15 Thread Perko, Ralph J
My apologies for the delay.  I had to switch clusters and use a smaller 
dataset.  The discrepancy still exists but the numbers are a little different:

I ran the same queries as in the original email (below)

Total records: 581M

Simple query based on secondary index value used in the subselect:
1747 recs – 0.256 sec

Subselect query:  177s

Join query: 179s

The answers to your questions are below.

1. What is the Primary Key definition of your BULK_TABLE?

CONSTRAINT pkey PRIMARY KEY (file_id,recnum)

2. How many (approximately) distinct file_id values are there in the 
BULK_TABLE? (If you don't know for sure, you can just run a query to find out).

"select count(distinct(file_id)) from BULK_TABLE" - 7902 - 92s

"select distinct(file_id) from BULK_TABLE" - returns in 25min

3. How long does it take to run a full-scan query on BULK_TABLE, like "select *
from BULK_TABLE"?

Results began returning after about 25min

4. How long does it take to run a full-scan join query on BULK_TABLE, like
"select * from BULK_TABLE join (select file_id, recnum from BULK_TABLE) as SS
on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum = SS.recnum"?

The full-scan join fails with a MaxServerCacheSizeExceededException (server
cache set to 1G).
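As an aside, the cap this exception refers to is governed by the client-side property phoenix.query.maxServerCacheBytes. If raising it is an option, a fragment like the following would set a 2 GB limit (the value is only illustrative):

```xml
<!-- Client-side hbase-site.xml fragment (illustrative value: 2 GB) -->
<property>
  <name>phoenix.query.maxServerCacheBytes</name>
  <value>2147483648</value>
</property>
```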

Custom hbase/phoenix settings are attached.

Thanks,
Ralph


From: Maryann Xue maryann@gmail.com
Reply-To: user@phoenix.apache.org
Date: Friday, December 12, 2014 at 8:07 AM
To: user@phoenix.apache.org
Subject: Re: Query performance question

Hi Ralph,

Thanks for the question!
According to the explain result you got, the optimization worked exactly as 
expected with this query:

"DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID)" means a skip-scan instead of
a full-scan over BULK_TABLE will be executed at runtime, based on the values of
file_id obtained from the inner query.

So I need to know a few more things:
1. What is the Primary Key definition of your BULK_TABLE?
2. How many (approximately) distinct file_id values are there in the 
BULK_TABLE? (If you don't know for sure, you can just run a query to find out).
3. How long does it take to run a full-scan query on BULK_TABLE, like "select *
from BULK_TABLE"?
4. How long does it take to run a full-scan join query on BULK_TABLE, like
"select * from BULK_TABLE join (select file_id, recnum from BULK_TABLE) as SS
on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum = SS.recnum"?


Thanks,
Maryann


On Thu, Dec 11, 2014 at 6:28 PM, Perko, Ralph J
ralph.pe...@pnnl.gov wrote:
Hi,

Thanks for all your help thus far with Phoenix.

I am trying to understand the best way to construct a query that returns all
the fields from a table while still taking advantage of a single-field
secondary index. I have a table with upwards of 50 fields; I do not wish to
index them all, but the use case requires returning them all.

My general approach is to first select the records I want using an indexed
field, then use the returned primary-key values to fetch the entire records
via a subselect or a join.

The initial select executes very fast (sub-second), returning close to 3000
records. When used as a subselect or join, the entire query takes very long
(over 15 min) or does not return. Based on the query plans, it appears there
is a lot more going on than a simple look-up of the values returned by the
subselect. Is there a way to do this using Phoenix SQL syntax?

Any suggestions are appreciated.

Initial indexed query (very fast):

SELECT file_id, recnum
FROM BULK_TABLE
WHERE saddr IN ('ip1', 'ip2', 'ip3')

file_id and recnum make up the primary key

Plan:
| CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX 
[0,1,000,004,076] - [9,1,000,142,114] |
| CLIENT MERGE SORT |

Used as a subselect (times out):

SELECT * FROM BULK_TABLE
WHERE (file_id, recnum) IN (SELECT file_id, recnum
                            FROM BULK_TABLE
                            WHERE saddr IN ('ip1', 'ip2', 'ip3'));
Plan:
| CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |
| CLIENT MERGE SORT |
| PARALLEL SEMI-JOIN TABLE 0 (SKIP MERGE) |
| CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER BULK_TABLE_SADDR_IDX 
[0,1,000,004,076] - [9,1,000,142,114] |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [RECNUM, FILE_ID] |
| CLIENT MERGE SORT |
| DYNAMIC SERVER FILTER BY FILE_ID IN ($1.$3) |

Another approach, using a join instead:

SELECT *
FROM BULK_TABLE
JOIN
  (SELECT file_id, recnum
   FROM BULK_TABLE
   WHERE saddr IN ('ip1', 'ip2', 'ip3')) AS SS
ON BULK_TABLE.file_id = SS.file_id AND BULK_TABLE.recnum = SS.recnum;

Runs faster but still can take about 15min

Plan:

| CLIENT PARALLEL 10-WAY FULL SCAN OVER BULK_TABLE |
| CLIENT MERGE SORT |
| PARALLEL INNER-JOIN TABLE 0 |
| CLIENT PARALLEL 10-WAY SKIP SCAN ON 30 KEYS OVER 

Re: Query performance question

2014-12-15 Thread Maryann Xue
Hi Ralph,

Thank you very much for the information! The answers are very helpful.
The numbers look reasonable given the query plan. The one problem is that
both file_id and recnum should be used in the DYNAMIC SERVER FILTER
(which turns into a skip-scan), but the plan you got used file_id alone.
Could you please attach the DDL of your table and index so that I can
try to reproduce the problem?


Thanks,
Maryann


On Mon, Dec 15, 2014 at 2:46 PM, Perko, Ralph J ralph.pe...@pnnl.gov
wrote:


Re: Query performance question

2014-12-15 Thread Maryann Xue
And one more thing: the version of Phoenix you are running.

On Mon, Dec 15, 2014 at 3:21 PM, Maryann Xue maryann@gmail.com wrote:


Re: Query performance question

2014-12-15 Thread Perko, Ralph J
DDL is attached – thanks!

Ralph



Re: Query performance question

2014-12-15 Thread Perko, Ralph J
I am using version 4.2.2


From: Perko, Ralph Perko ralph.pe...@pnnl.govmailto:ralph.pe...@pnnl.gov
Reply-To: user@phoenix.apache.orgmailto:user@phoenix.apache.org 
user@phoenix.apache.orgmailto:user@phoenix.apache.org
Date: Monday, December 15, 2014 at 12:37 PM
To: user@phoenix.apache.orgmailto:user@phoenix.apache.org 
user@phoenix.apache.orgmailto:user@phoenix.apache.org
Subject: Re: Query performance question

DDL is attached – thanks!

Ralph


From: Maryann Xue maryann@gmail.commailto:maryann@gmail.com
Reply-To: user@phoenix.apache.orgmailto:user@phoenix.apache.org 
user@phoenix.apache.orgmailto:user@phoenix.apache.org
Date: Monday, December 15, 2014 at 12:21 PM
To: user@phoenix.apache.orgmailto:user@phoenix.apache.org 
user@phoenix.apache.orgmailto:user@phoenix.apache.org
Subject: Re: Query performance question

Hi Ralph,

Thank you very much for the information! Very helpful for your questions.
The numbers look reasonable as opposed to the query plan. But the only problem 
is both file_id and recnum should be used in DYNAMIC SERVER FILTER (which 
will turn into a skip-scan), but the plan you got there only used file_id alone.
So could you please attach the DDL of your table and index so that I can try 
reproducing the problem?


Thanks,
Maryann


On Mon, Dec 15, 2014 at 2:46 PM, Perko, Ralph J 
ralph.pe...@pnnl.govmailto:ralph.pe...@pnnl.gov wrote:
My apologies for the delay.  I had to switch clusters and use a smaller 
dataset.  The discrepancy still exists but the numbers are a little different:

I ran the same queries as in the original email (below)

Total records: 581M

Simple query based on secondary index value used in the subselect:
1747 recs – 0.256 sec

Subselect query:  177s

Join query: 179s

The answers to your questions are below.

1. What is the Primary Key definition of your BULK_TABLE?

CONSTRAINT pkey PRIMARY KEY (file_id,recnum)

2. How many (approximately) distinct file_id values are there in the 
BULK_TABLE? (If you don't know for sure, you can just run a query to find out).

select count(distinct(file_id)) from BULK_TABLE” - 7902 - 92s

select distinct(file_id) from BULK_TABLE”  - returns in 25m

3. How long does it take to run a full-scan query on BULK_TABLE, like select * 
from BULK_TABLE”?

Results began returning after about 25min

4. How long does it take to run a full-scan join query on BULK_TABLE, like 
select * from BULK_TABLE join (select file_id, recnum from BULK_TABLE) as SS 
on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum = SS.recnum”?

The full-scan join fails with a MaxServerCacheSizeExceededException  - server 
cache set to 1G.

Custom hbase/phoenix settings are attached.

Thanks,
Ralph


From: Maryann Xue maryann@gmail.commailto:maryann@gmail.com
Reply-To: user@phoenix.apache.orgmailto:user@phoenix.apache.org 
user@phoenix.apache.orgmailto:user@phoenix.apache.org
Date: Friday, December 12, 2014 at 8:07 AM
To: user@phoenix.apache.orgmailto:user@phoenix.apache.org 
user@phoenix.apache.orgmailto:user@phoenix.apache.org
Subject: Re: Query performance question

Hi Ralph,

Thanks for the question!
According to the explain result you got, the optimization worked exactly as 
expected with this query:

DYNAMIC SERVER FILTER BY FILE_ID IN (SS.FILE_ID) means a skip-scan instead of 
a full-scan over BULK_TABLE will be executed at runtime based on the values of 
file_id it got from the inner query.

So I need to know a few more things:
1. What is the Primary Key definition of your BULK_TABLE?
2. How many (approximately) distinct file_id values are there in the 
BULK_TABLE? (If you don't know for sure, you can just run a query to find out).
3. How long does it take to run a full-scan query on BULK_TABLE, like select * 
from BULK_TABLE?
4. How long does it take to run a full-scan join query on BULK_TABLE, like 
select * from BULK_TABLE join (select file_id, recnum from BULK_TABLE) as SS 
on BULK_TABLE.file_id = SS.file_id and BULK_TABLE.recnum = SS.recnum?


Thanks,
Maryann


On Thu, Dec 11, 2014 at 6:28 PM, Perko, Ralph J 
ralph.pe...@pnnl.govmailto:ralph.pe...@pnnl.gov wrote:
Hi,

Thanks for all your help thus far with Phoenix.

I am trying to understand the best way to construct a query that returns all 
the fields from a table but still takes advantage of a single field secondary 
index.   I have a table with upwards of 50 fields and do not wish to index them 
all but the use case exists to return them all.

My general approach is to first select the records I want using an indexed
field, then use the returned PK values to get the entire record via a
subselect or join.

The initial select executes very fast, sub-second, returning close to 3000
records.  When used as a subselect or join, the entire query takes very long
(over 15 min) or does not return.  Based on the processing plans it appears
there is a lot more going on than a simple look-up of the values returned
in the subselect.  Is there a way to do this using Phoenix?
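
The approach described above (select the matching keys via the indexed field,
then join back to fetch the full rows) can be sketched as follows. INDEXED_COL
is a hypothetical placeholder for the single indexed field, and the primary key
is assumed to be (file_id, recnum), matching the join keys used elsewhere in
this thread:

```sql
-- Sketch of the index-then-join pattern (assumptions: PK is
-- (file_id, recnum); INDEXED_COL is a hypothetical indexed field).
SELECT t.*
FROM BULK_TABLE t
JOIN (SELECT file_id, recnum
        FROM BULK_TABLE
       WHERE INDEXED_COL = 'some-value') s
  ON t.file_id = s.file_id
 AND t.recnum = s.recnum;
```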

Re: queries getting rejected

2014-12-15 Thread Abe Weinograd
Hi James,

4.1 right now. We are currently using defaults.  I can try jacking that
up if it makes sense.

Right now, this version of DBVis doesn't have our stock hbase-site.xml in
its classpath.  Do we have to add that, or can we get away with setting it
somewhere in the query tool?  What is the best way to do that?

Thanks!
Abe

On Mon, Dec 15, 2014 at 2:15 PM, James Taylor <jamestay...@apache.org> wrote:

 What version are you using, Abe? The important parameter to bump up if
 you get a rejected exception is the queue depth
 (phoenix.query.queueSize). When you upgrade to 4.2.x, the default for
 this went up to 5000, the reason being that we chunk up work into
 smaller pieces. If you're already overriding this, make sure you
 remove that override (or increase it substantially).

 Thanks,
 James

 On Mon, Dec 15, 2014 at 10:08 AM, Abe Weinograd <a...@flonet.com> wrote:
  Hi,
 
  We resized (reduced) our cluster size.  Running any queries that
  aggregate on our datasets, I am getting the following errors.  After the
  first query, that error returns immediately AND even for an EXPLAIN.  I
  saw a couple of other posts about this and the recommendation is to jack
  up the threadPool and queue configs.  Is that right?
 
  Abe
 
  Task org.apache.phoenix.job.JobManager$JobFutureTask@513c74aa rejected
  from org.apache.phoenix.job.JobManager$1@6312e0f8[Running, pool size =
  128, active threads = 128, queued tasks = 500, completed tasks = 2183]
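

For reference, the parameter James names above is a client-side setting; a
minimal hbase-site.xml fragment (the value 5000 matches the 4.2.x default he
mentions) might look like:

```xml
<!-- Client-side hbase-site.xml fragment raising the Phoenix job
     queue depth; 5000 is the 4.2.x default cited above. -->
<property>
  <name>phoenix.query.queueSize</name>
  <value>5000</value>
</property>
```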



hbase 0.96.1.1-cdh5.0.1 for phoenix compatibility

2014-12-15 Thread su...@certusnet.com.cn
Hi,all
Just want to confirm the appropriate version of Phoenix for compatibility with
HBase 0.96.1.1-cdh5.0.1.
Is the latest 4.2.2 release working well with this HBase version?

Thanks,
Sun.





CertusNet 



Re: Re: hbase 0.96.1.1-cdh5.0.1 for phoenix compatibility

2014-12-15 Thread su...@certusnet.com.cn
Hi, James
Thanks for your quick response. Just want to make sure that we cannot build
Phoenix for HBase 0.96.
However, is there any possibility that Phoenix built for HBase 0.98.x could
work with the 0.96 version? We would rather not upgrade our current CDH cluster.

Thanks,
Sun.





CertusNet 

 
From: James Taylor
Date: 2014-12-16 14:34
To: user
CC: dev
Subject: Re: hbase 0.96.1.1-cdh5.0.1 for phoenix compatibility
HBase 0.96 is not supported by Phoenix - only HBase 0.98.1 and above.
The CDH 5.1 releases package HBase 0.98, so these are ok.
 
On Mon, Dec 15, 2014 at 10:31 PM, su...@certusnet.com.cn wrote:
 Hi,all
 Just want to confirm the appropriate versions of phoenix for compatibility
 to hbase 0.96.1.1-cdh5.0.1.
 Are the latest 4.2.2 release working well for the hbase version?

 Thanks,
 Sun.

 
 

 CertusNet


 
 
 


Re: Re: hbase 0.96.1.1-cdh5.0.1 for phoenix compatibility

2014-12-15 Thread James Taylor
See PHOENIX-848. It's not going to work to use Phoenix against
cdh5.0.1. I'd recommend upgrading - HBase 0.98 has made great strides
forward.

On Mon, Dec 15, 2014 at 10:48 PM, su...@certusnet.com.cn wrote:
 Hi, James
 Thanks for your quick response. Just want to make sure that we cannot build
 phoenix for hbase 0.96.
 However, is there any possibility that we can use phoenix built for hbase
 0.98.x and work well for the 0.96
 version? Cause that we would not like to upgrade our current CDH cluster.

 Thanks,
 Sun.

 
 

 CertusNet



 From: James Taylor
 Date: 2014-12-16 14:34
 To: user
 CC: dev
 Subject: Re: hbase 0.96.1.1-cdh5.0.1 for phoenix compatibility
 HBase 0.96 is not supported by Phoenix - only HBase 0.98.1 and above.
 The CDH 5.1 releases package HBase 0.98, so these are ok.
