[jira] [Updated] (PHOENIX-5145) GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

2019-02-19 Thread MariaCarrie (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MariaCarrie updated PHOENIX-5145:
-
Attachment: application_1548138380177_1787.txt

> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt) 
> ---
>
> Key: PHOENIX-5145
> URL: https://issues.apache.org/jira/browse/PHOENIX-5145
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.0.0
> Environment: HDP 3.0.0
> Phoenix 5.0.0
> HBase 2.0.0
> Spark 2.3.1
> Hadoop 3.0.1
>Reporter: MariaCarrie
>Priority: Major
> Attachments: application_1548138380177_1772.txt, 
> application_1548138380177_1787.txt
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I can successfully read the data using the local mode. Here is my code:
> val sqlContext: SQLContext = missionSession.app.ss.sqlContext
> System.setProperty("sun.security.krb5.debug", "true")
> System.setProperty("sun.security.spnego.debug", "true")
> UserGroupInformation.loginUserFromKeytab("d...@devdip.org", "devdmp.keytab")
> // Load as a DataFrame directly using a Configuration object
> val df: DataFrame = sqlContext.phoenixTableAsDataFrame(missionSession.config.tableName, Seq("ID"), zkUrl = Some(missionSession.config.zkUrl))
> df.show(5)
> But when I submit this to YARN for execution, an exception will be thrown. 
> The following is the exception information:
> Tue Feb 19 13:07:53 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:53 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:53 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:54 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:54 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:55 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:57 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:08:01 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Tue Feb 19 13:08:11 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Tue Feb 19 13:08:21 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Tue Feb 19 13:08:31 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, 
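
A common way to avoid this class of failure on YARN is to make the keytab available to every executor and perform the UGI login there before any HBase/Phoenix call (for example, ship it with spark-submit --files), or to let spark-submit handle ticket distribution and renewal via --principal/--keytab. The following is only a minimal sketch of the executor-side login, assuming the keytab has been shipped to the container working directory; the class name is illustrative and the principal is kept exactly as reported above:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public final class ExecutorKerberosLogin {
    /** Log in from a keytab shipped to the YARN container (e.g. via --files). */
    public static void login() throws IOException {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Principal and keytab name are taken from the report above; the keytab
        // must exist under this relative path on every executor node.
        UserGroupInformation.loginUserFromKeytab("d...@devdip.org", "devdmp.keytab");
    }
}
{code}

Alternatively, submitting the job with spark-submit --principal/--keytab lets Spark distribute the keytab and keep the ticket renewed, so no login is needed in application code at all.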

[jira] [Assigned] (PHOENIX-2787) support IF EXISTS for ALTER TABLE SET options

2019-02-19 Thread Xinyi Yan (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yan reassigned PHOENIX-2787:
--

Assignee: (was: Xinyi Yan)

> support IF EXISTS for ALTER TABLE SET options
> -
>
> Key: PHOENIX-2787
> URL: https://issues.apache.org/jira/browse/PHOENIX-2787
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Vincent Poon
>Priority: Trivial
>
> A nice-to-have improvement to the grammar:
> ALTER TABLE my_table IF EXISTS SET options
> Currently, 'IF EXISTS' only works for dropping/adding a column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PHOENIX-2787) support IF EXISTS for ALTER TABLE SET options

2019-02-19 Thread Xinyi Yan (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yan reassigned PHOENIX-2787:
--

Assignee: Xinyi Yan

> support IF EXISTS for ALTER TABLE SET options
> -
>
> Key: PHOENIX-2787
> URL: https://issues.apache.org/jira/browse/PHOENIX-2787
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Vincent Poon
>Assignee: Xinyi Yan
>Priority: Trivial
>
> A nice-to-have improvement to the grammar:
> ALTER TABLE my_table IF EXISTS SET options
> Currently, 'IF EXISTS' only works for dropping/adding a column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-5145) GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

2019-02-19 Thread MariaCarrie (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MariaCarrie updated PHOENIX-5145:
-
Attachment: application_1548138380177_1772.txt

> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt) 
> ---
>
> Key: PHOENIX-5145
> URL: https://issues.apache.org/jira/browse/PHOENIX-5145
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.0.0
> Environment: HDP 3.0.0
> Phoenix 5.0.0
> HBase 2.0.0
> Spark 2.3.1
> Hadoop 3.0.1
>Reporter: MariaCarrie
>Priority: Major
> Attachments: application_1548138380177_1772.txt
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I can successfully read the data using the local mode. Here is my code:
> val sqlContext: SQLContext = missionSession.app.ss.sqlContext
> System.setProperty("sun.security.krb5.debug", "true")
> System.setProperty("sun.security.spnego.debug", "true")
> UserGroupInformation.loginUserFromKeytab("d...@devdip.org", "devdmp.keytab")
> // Load as a DataFrame directly using a Configuration object
> val df: DataFrame = sqlContext.phoenixTableAsDataFrame(missionSession.config.tableName, Seq("ID"), zkUrl = Some(missionSession.config.zkUrl))
> df.show(5)
> But when I submit this to YARN for execution, an exception will be thrown. 
> The following is the exception information:
> Tue Feb 19 13:07:53 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:53 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:53 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:54 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:54 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:55 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:07:57 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
> Tue Feb 19 13:08:01 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Tue Feb 19 13:08:11 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Tue Feb 19 13:08:21 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, java.io.IOException: Call to test-dmp5.fengdai.org/10.200.162.26:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Tue Feb 19 13:08:31 CST 2019, RpcRetryingCaller{globalStartTime=1550552873361, pause=100, maxAttempts=36}, 
> 

[jira] [Updated] (PHOENIX-5147) Add an option to disable spooling ( SORT MERGE strategy in QueryCompiler )

2019-02-19 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated PHOENIX-5147:
-
Attachment: PHOENIX-5147.4.x-HBase-1.3.002.patch

> Add an option to disable spooling ( SORT MERGE strategy in QueryCompiler )
> --
>
> Key: PHOENIX-5147
> URL: https://issues.apache.org/jira/browse/PHOENIX-5147
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.15.0
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
> Attachments: PHOENIX-5147.4.x-HBase-1.3.001.patch, 
> PHOENIX-5147.4.x-HBase-1.3.002.patch
>
>
> We should add an option to allow database admins to disable spooling from the 
> server side, especially until PHOENIX-5135 is fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PHOENIX-5147) Add an option to disable spooling ( SORT MERGE strategy in QueryCompiler )

2019-02-19 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang reassigned PHOENIX-5147:


Assignee: Xu Cang

> Add an option to disable spooling ( SORT MERGE strategy in QueryCompiler )
> --
>
> Key: PHOENIX-5147
> URL: https://issues.apache.org/jira/browse/PHOENIX-5147
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.15.0
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>
> We should add an option to allow database admins to disable spooling from the 
> server side, especially until PHOENIX-5135 is fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-5147) Add an option to disable spooling ( SORT MERGE strategy in QueryCompiler )

2019-02-19 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated PHOENIX-5147:
-
Summary: Add an option to disable spooling ( SORT MERGE strategy in 
QueryCompiler )  (was: Add an option to disable spooling)

> Add an option to disable spooling ( SORT MERGE strategy in QueryCompiler )
> --
>
> Key: PHOENIX-5147
> URL: https://issues.apache.org/jira/browse/PHOENIX-5147
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.15.0
>Reporter: Xu Cang
>Priority: Major
>
> We should add an option to allow database admins to disable spooling from the 
> server side, especially until PHOENIX-5135 is fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-5147) Add an option to disable spooling

2019-02-19 Thread Xu Cang (JIRA)
Xu Cang created PHOENIX-5147:


 Summary: Add an option to disable spooling
 Key: PHOENIX-5147
 URL: https://issues.apache.org/jira/browse/PHOENIX-5147
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.15.0
Reporter: Xu Cang


We should add an option to allow database admins to disable spooling from the 
server side, especially until PHOENIX-5135 is fixed.
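
A rough sketch of what such a server-side switch could look like; the property key and default below are hypothetical placeholders, since the ticket does not name the option:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class SpoolingConfig {
    // Hypothetical property key and default; the actual name would be chosen in the patch.
    static final String SPOOLING_ENABLED_ATTRIB = "phoenix.query.spooling.enabled";
    static final boolean DEFAULT_SPOOLING_ENABLED = true;

    /** Read the switch from the server-side configuration (e.g. hbase-site.xml). */
    static boolean isSpoolingEnabled(Configuration conf) {
        return conf.getBoolean(SPOOLING_ENABLED_ATTRIB, DEFAULT_SPOOLING_ENABLED);
    }

    public static void main(String[] args) {
        System.out.println("spooling enabled: " + isSpoolingEnabled(HBaseConfiguration.create()));
    }
}
{code}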



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (PHOENIX-5144) C++ JDBC Driver

2019-02-19 Thread yinghua_zh (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yinghua_zh closed PHOENIX-5144.
---

> C++ JDBC Driver
> ---
>
> Key: PHOENIX-5144
> URL: https://issues.apache.org/jira/browse/PHOENIX-5144
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.14.1
>Reporter: yinghua_zh
>Priority: Major
>
> Can you provide a C++ version of the JDBC driver? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-5146) Phoenix missing class definition: java.lang.NoClassDefFoundError: org/apache/phoenix/shaded/org/apache/http/Consts

2019-02-19 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PHOENIX-5146:

Description: 
While running a SparkCompatibility check for Phoenix, we hit this issue:
{noformat}
2019-02-15 09:03:38,470|INFO|MainThread|machine.py:169 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|RUNNING: echo "
 import org.apache.spark.graphx._;
 import org.apache.phoenix.spark._;
 val rdd = sc.phoenixTableAsRDD(\"EMAIL_ENRON\", Seq(\"MAIL_FROM\", 
\"MAIL_TO\"), 
zkUrl=Some(\"huaycloud012.l42scl.hortonworks.com:2181:/hbase-secure\"));
 val rawEdges = rdd.map

{ e => (e(\"MAIL_FROM\").asInstanceOf[VertexId], 
e(\"MAIL_TO\").asInstanceOf[VertexId])}

;
 val graph = Graph.fromEdgeTuples(rawEdges, 1.0);
 val pr = graph.pageRank(0.001);
 pr.vertices.saveToPhoenix(\"EMAIL_ENRON_PAGERANK\", Seq(\"ID\", \"RANK\"), 
zkUrl = Some(\"huaycloud012.l42scl.hortonworks.com:2181:/hbase-secure\"));
 " | spark-shell --master yarn --jars 
/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.3.1.0.0-75.jar 
--properties-file 
/grid/0/log/cluster/run_phoenix_secure_ha_all_1/artifacts/spark_defaults.conf 
2>&1 | tee 
/grid/0/log/cluster/run_phoenix_secure_ha_all_1/artifacts/Spark_clientLogs/phoenix-spark.txt
 2019-02-15 09:03:38,488|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SPARK_MAJOR_VERSION is set to 
2, using Spark2
 2019-02-15 09:03:39,901|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: Class path contains 
multiple SLF4J bindings.
 2019-02-15 09:03:39,902|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-75/phoenix/phoenix-5.0.0.3.1.0.0-75-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 2019-02-15 09:03:39,902|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-75/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 2019-02-15 09:03:39,902|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: See 
[http://www.slf4j.org/codes.html#multiple_bindings] for an explanation.
 2019-02-15 09:03:41,400|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|Setting default log level to 
"WARN".
 2019-02-15 09:03:41,400|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|To adjust logging level use 
sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
 2019-02-15 09:03:54,837|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|java.lang.NoClassDefFoundError: org/apache/phoenix/shaded/org/apache/http/Consts
 2019-02-15 09:03:54,838|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.phoenix.shaded.org.apache.http.client.utils.URIBuilder.digestURI(URIBuilder.java:181)
 2019-02-15 09:03:54,839|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.phoenix.shaded.org.apache.http.client.utils.URIBuilder.<init>(URIBuilder.java:82)
 2019-02-15 09:03:54,839|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.createURL(KMSClientProvider.java:468)
 2019-02-15 09:03:54,839|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1023)
 2019-02-15 09:03:54,840|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:252)
 2019-02-15 09:03:54,840|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:249)
 2019-02-15 09:03:54,840|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:172)
 2019-02-15 09:03:54,841|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.getDelegationToken(LoadBalancingKMSClientProvider.java:249)
 2019-02-15 09:03:54,841|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
 2019-02-15 09:03:54,841|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 

Re: Integration Testing Requirements

2019-02-19 Thread Ankit Singhal
You can check whether HADOOP_HOME and JAVA_HOME are properly set in the
environment.
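
For instance, a trivial check one could run on the Jenkins node (plain Java, nothing Phoenix-specific; just prints the variables the build environment is expected to provide):

{code:java}
public final class CheckEnv {
    public static void main(String[] args) {
        for (String name : new String[] {"HADOOP_HOME", "JAVA_HOME"}) {
            // Prints null if the variable is not set in the build environment.
            System.out.println(name + "=" + System.getenv(name));
        }
    }
}
{code}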

On Tue, Feb 19, 2019 at 11:23 AM William Shen 
wrote:

> Hi everyone,
>
> I'm trying to set up the Jenkins job at work to build Phoenix and run the
> integration tests. However, I repeatedly encounter issues with the hive
> module when I run mvn verify. Do the hive integration tests require any
> special setup for them to pass? The other modules passed integration
> testing successfully.
>
> Attached below is a sample failure trace.
>
> Thanks!
>
> - Will
>
> [ERROR] Tests run: 6, Failures: 6, Errors: 0, Skipped: 0, Time
> elapsed: 48.302 s <<< FAILURE! - in
> org.apache.phoenix.hive.HiveTezIT[ERROR]
> simpleColumnMapTest(org.apache.phoenix.hive.HiveTezIT)  Time elapsed:
> 6.727 s  <<< FAILURE!junit.framework.AssertionFailedError:
> Unexpected exception java.lang.RuntimeException:
> org.apache.tez.dag.api.SessionNotRunning: TezSession has already
> shutdown. Application application_1550371508120_0001 failed 2 times
> due to AM Container for appattempt_1550371508120_0001_02 exited
> with  exitCode: 127
> For more detailed output, check application tracking
> page:
> http://40a4dd0e8959:38843/cluster/app/application_1550371508120_0001Then,
> click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_1550371508120_0001_02_01
> Exit code: 127
> Stack trace: ExitCodeException exitCode=127:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> at org.apache.hadoop.util.Shell.run(Shell.java:456)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
>
>
> Container exited with a non-zero exit code 127
> Failing this attempt. Failing the application.
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:535)
> at
> org.apache.phoenix.hive.HiveTestUtil.cliInit(HiveTestUtil.java:637)
> at
> org.apache.phoenix.hive.HiveTestUtil.cliInit(HiveTestUtil.java:590)
> at
> org.apache.phoenix.hive.BaseHivePhoenixStoreIT.runTest(BaseHivePhoenixStoreIT.java:117)
> at
> org.apache.phoenix.hive.HivePhoenixStoreIT.simpleColumnMapTest(HivePhoenixStoreIT.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runners.Suite.runChild(Suite.java:128)
> at org.junit.runners.Suite.runChild(Suite.java:27)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at 

Integration Testing Requirements

2019-02-19 Thread William Shen
Hi everyone,

I'm trying to set up the Jenkins job at work to build Phoenix and run the
integration tests. However, I repeatedly encounter issues with the hive
module when I run mvn verify. Do the hive integration tests require any
special setup for them to pass? The other modules passed integration
testing successfully.

Attached below is a sample failure trace.

Thanks!

- Will

[ERROR] Tests run: 6, Failures: 6, Errors: 0, Skipped: 0, Time
elapsed: 48.302 s <<< FAILURE! - in
org.apache.phoenix.hive.HiveTezIT[ERROR]
simpleColumnMapTest(org.apache.phoenix.hive.HiveTezIT)  Time elapsed:
6.727 s  <<< FAILURE!junit.framework.AssertionFailedError:
Unexpected exception java.lang.RuntimeException:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already
shutdown. Application application_1550371508120_0001 failed 2 times
due to AM Container for appattempt_1550371508120_0001_02 exited
with  exitCode: 127
For more detailed output, check application tracking
page:http://40a4dd0e8959:38843/cluster/app/application_1550371508120_0001Then,
click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1550371508120_0001_02_01
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 127
Failing this attempt. Failing the application.
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:535)
at org.apache.phoenix.hive.HiveTestUtil.cliInit(HiveTestUtil.java:637)
at org.apache.phoenix.hive.HiveTestUtil.cliInit(HiveTestUtil.java:590)
at 
org.apache.phoenix.hive.BaseHivePhoenixStoreIT.runTest(BaseHivePhoenixStoreIT.java:117)
at 
org.apache.phoenix.hive.HivePhoenixStoreIT.simpleColumnMapTest(HivePhoenixStoreIT.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
at 

[jira] [Updated] (PHOENIX-5018) Index mutations created by UPSERT SELECT will have wrong timestamps

2019-02-19 Thread Kadir OZDEMIR (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir OZDEMIR updated PHOENIX-5018:
---
Attachment: PHOENIX-5018.master.004.patch

> Index mutations created by UPSERT SELECT will have wrong timestamps
> ---
>
> Key: PHOENIX-5018
> URL: https://issues.apache.org/jira/browse/PHOENIX-5018
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.14.0, 5.0.0
>Reporter: Geoffrey Jacoby
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5018.4.x-HBase-1.3.001.patch, 
> PHOENIX-5018.4.x-HBase-1.3.002.patch, PHOENIX-5018.4.x-HBase-1.4.001.patch, 
> PHOENIX-5018.4.x-HBase-1.4.002.patch, PHOENIX-5018.master.001.patch, 
> PHOENIX-5018.master.002.patch, PHOENIX-5018.master.003.patch, 
> PHOENIX-5018.master.004.patch
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When doing a full rebuild (or initial async build) of a local or global index 
> using IndexTool and PhoenixIndexImportDirectMapper, or doing a synchronous 
> initial build of a global index using the index create DDL, we generate the 
> index mutations by using an UPSERT SELECT query from the base table to the 
> index.
> The timestamps of the mutations use the default HBase behavior, which is to 
> use the current wall-clock time. However, the timestamp of an index KeyValue 
> should use the timestamp of the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of 
> weird side effects, such as if the base table has data with an expired TTL 
> that isn't expired in the index yet. Also, inserting old mutations with new 
> timestamps may overwrite data that has been newly written by the regular 
> data path during the index build, which would lead to data loss and 
> inconsistency issues.
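
For illustration only (this is not the actual Phoenix rebuild code, and the family/qualifier names are placeholders), a minimal sketch of the behavior the fix should produce: carry the base-table cell's timestamp onto the index mutation instead of letting HBase stamp it with the current wall-clock time.

{code:java}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public final class IndexTimestampSketch {
    /** Build an index Put whose cell reuses the data-table cell's timestamp. */
    static Put indexPutFor(byte[] indexRowKey, Cell baseCell) {
        Put put = new Put(indexRowKey);
        // Reusing baseCell.getTimestamp() keeps index and base table in sync,
        // so TTL expiry and newer writes are ordered consistently on both sides.
        put.addColumn(Bytes.toBytes("0"), Bytes.toBytes("_0"),
                baseCell.getTimestamp(), Bytes.toBytes(true));
        return put;
    }
}
{code}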



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-5018) Index mutations created by UPSERT SELECT will have wrong timestamps

2019-02-19 Thread Kadir OZDEMIR (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir OZDEMIR updated PHOENIX-5018:
---
Attachment: PHOENIX-5018.4.x-HBase-1.4.002.patch

> Index mutations created by UPSERT SELECT will have wrong timestamps
> ---
>
> Key: PHOENIX-5018
> URL: https://issues.apache.org/jira/browse/PHOENIX-5018
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.14.0, 5.0.0
>Reporter: Geoffrey Jacoby
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5018.4.x-HBase-1.3.001.patch, 
> PHOENIX-5018.4.x-HBase-1.3.002.patch, PHOENIX-5018.4.x-HBase-1.4.001.patch, 
> PHOENIX-5018.4.x-HBase-1.4.002.patch, PHOENIX-5018.master.001.patch, 
> PHOENIX-5018.master.002.patch, PHOENIX-5018.master.003.patch, 
> PHOENIX-5018.master.004.patch
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When doing a full rebuild (or initial async build) of a local or global index 
> using IndexTool and PhoenixIndexImportDirectMapper, or doing a synchronous 
> initial build of a global index using the index create DDL, we generate the 
> index mutations by using an UPSERT SELECT query from the base table to the 
> index.
> The timestamps of the mutations use the default HBase behavior, which is to 
> use the current wall-clock time. However, the timestamp of an index KeyValue 
> should use the timestamp of the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of 
> weird side effects, such as if the base table has data with an expired TTL 
> that isn't expired in the index yet. Also, inserting old mutations with new 
> timestamps may overwrite data that has been newly written by the regular 
> data path during the index build, which would lead to data loss and 
> inconsistency issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-5018) Index mutations created by UPSERT SELECT will have wrong timestamps

2019-02-19 Thread Kadir OZDEMIR (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir OZDEMIR updated PHOENIX-5018:
---
Attachment: PHOENIX-5018.4.x-HBase-1.3.002.patch

> Index mutations created by UPSERT SELECT will have wrong timestamps
> ---
>
> Key: PHOENIX-5018
> URL: https://issues.apache.org/jira/browse/PHOENIX-5018
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.14.0, 5.0.0
>Reporter: Geoffrey Jacoby
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5018.4.x-HBase-1.3.001.patch, 
> PHOENIX-5018.4.x-HBase-1.3.002.patch, PHOENIX-5018.4.x-HBase-1.4.001.patch, 
> PHOENIX-5018.4.x-HBase-1.4.002.patch, PHOENIX-5018.master.001.patch, 
> PHOENIX-5018.master.002.patch, PHOENIX-5018.master.003.patch, 
> PHOENIX-5018.master.004.patch
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When doing a full rebuild (or initial async build) of a local or global index 
> using IndexTool and PhoenixIndexImportDirectMapper, or doing a synchronous 
> initial build of a global index using the index create DDL, we generate the 
> index mutations by using an UPSERT SELECT query from the base table to the 
> index.
> The timestamps of the mutations use the default HBase behavior, which is to 
> use the current wall-clock time. However, the timestamp of an index KeyValue 
> should use the timestamp of the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of 
> weird side effects, such as if the base table has data with an expired TTL 
> that isn't expired in the index yet. Also, inserting old mutations with new 
> timestamps may overwrite data that has been newly written by the regular 
> data path during the index build, which would lead to data loss and 
> inconsistency issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PHOENIX-5068) Autocommit off is not working as expected might be a bug!?

2019-02-19 Thread Xinyi Yan (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yan resolved PHOENIX-5068.

Resolution: Duplicate

> Autocommit off is not working as expected might be a bug!?
> --
>
> Key: PHOENIX-5068
> URL: https://issues.apache.org/jira/browse/PHOENIX-5068
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Amarnath Ramamoorthi
>Assignee: Xinyi Yan
>Priority: Minor
> Attachments: test_foo_data.sql
>
>
> Autocommit off is behaving strangely; it might be a bug.
> Here is what we found when using autocommit off.
> The table has only 2 int columns, both part of the primary key, and contains 
> 100 rows in total.
> With *autocommit off*, when we try to upsert values into the same table, it 
> says 200 rows affected.
> It works fine when we run the same UPSERT command with fewer than 100 rows 
> using a WHERE clause, as you can see below.
> There is something wrong with autocommit off for upserts of >= 100 rows.
> {code:java}
> 0: jdbc:phoenix:XXYYZZ> select count(*) from "FOO".DEMO;
> +---+
> | COUNT(1)  |
> +---+
> | 100   |
> +---+
> 1 row selected (0.025 seconds)
> 0: jdbc:phoenix:XXYYZZ> SELECT * FROM "FOO".DEMO WHERE "id_x"=9741;
> ++---+
> | id_x  |   id_y   |
> ++---+
> | 9741   | 63423770  |
> ++---+
> 1 row selected (0.04 seconds)
> 0: jdbc:phoenix:XXYYZZ> !autocommit off
> Autocommit status: false
> 0: jdbc:phoenix:XXYYZZ> UPSERT INTO "FOO".DEMO SELECT * FROM "FOO".DEMO;
> 200 rows affected (0.023 seconds)
> 0: jdbc:phoenix:XXYYZZ> 
> 0: jdbc:phoenix:XXYYZZ> UPSERT INTO "FOO".DEMO SELECT * FROM "FOO".DEMO WHERE 
> "id_x"=9741;
> 1 row affected (0.014 seconds)
> 0: jdbc:phoenix:XXYYZZ> UPSERT INTO "FOO".DEMO SELECT * FROM "FOO".DEMO WHERE 
> "id_x"!=9741;
> 99 rows affected (0.045 seconds)
> 0: jdbc:phoenix:XXYYZZ>
> 0: jdbc:phoenix:XXYYZZ> !autocommit on
> Autocommit status: true
> 0: jdbc:phoenix:XXYYZZ> UPSERT INTO "FOO".DEMO SELECT * FROM "FOO".DEMO;
> 100 rows affected (0.065 seconds)
> {code}
> Tested once again, but now selecting from a different table:
> {code:java}
> 0: jdbc:phoenix:XXYYZZ> !autocommit off
> Autocommit status: false
> 0: jdbc:phoenix:XXYYZZ> UPSERT INTO "FOO".DEMO SELECT * FROM "FOO".TEST limit 
> 100;
> 200 rows affected (0.052 seconds)
> 0: jdbc:phoenix:XXYYZZ> UPSERT INTO "FOO".DEMO SELECT * FROM "FOO".TEST limit 
> 99;
> 99 rows affected (0.029 seconds)
> 0: jdbc:phoenix:XXYYZZ> UPSERT INTO "FOO".DEMO SELECT * FROM "FOO".TEST limit 
> 500;
> 1,000 rows affected (0.041 seconds)
> {code}
> Still the same: it shows 1,000 rows affected even though we limited it to 500. 
> It keeps doubling up.
> It would be really helpful if someone could look into this, please.
>  
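
For anyone trying to reproduce this outside sqlline, a minimal JDBC sketch of the same scenario; the quorum string "zkhost" is a placeholder and the table is the one from the report:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public final class AutoCommitRepro {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost")) {
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                // executeUpdate returns the reported row count; per the report it
                // comes back doubled (200) for a 100-row source when autocommit is off.
                int affected = stmt.executeUpdate(
                        "UPSERT INTO \"FOO\".DEMO SELECT * FROM \"FOO\".DEMO");
                System.out.println(affected + " rows affected");
            }
            conn.commit();
        }
    }
}
{code}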



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4925) Use Segment tree to organize Guide Post Info

2019-02-19 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4925:
-
Description: 
As reported, query compilation (for the sample queries shown below), 
especially deriving estimation and generating parallel scans from guide posts, 
becomes much slower after we introduced Phoenix Stats. 
 a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
 b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = ‘x’ ORDER BY Pk1__c
 c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = ‘x’ ORDER BY 
pk1__c,pk2__c
 d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = ‘x’ AND nonpk1__c 
ORDER BY pk1__c,pk2__c
 e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' 
OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column to 
make the primary key.
  
 By using prefix encoding for guide post info, we have to decode and traverse 
guide posts sequentially, which causes time complexity in 
BaseResultIterators.getParallelScan(...) to be O(n), where n is the total 
count of guide posts.

According to PHOENIX-2417, to reduce footprint in client cache and over 
transmission, the prefix encoding is used as the in-memory and over-the-wire 
encoding for guide post info.

We can use Segment Tree to address both memory and performance concerns. The 
guide posts are partitioned to k chunks (k=1024?), each chunk is encoded by 
prefix encoding and the encoded data is a leaf node of the tree. The inner node 
contains summary info (the count of rows, the data size) of the sub tree rooted 
at the inner node.

With this tree-like data structure, compared to the current data structure, the 
increased size (mainly coming from the n/k-1 inner nodes) is negligible. The 
time complexity for queries a, b, c can be reduced to O(m) where m is the total 
count of regions; the time complexity for "EXPLAIN" queries a, b, c can be 
reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even 
be reduced to O(1). For queries d and e, the time complexity to find the start 
of target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ characteristics to support partial 
load/unload when interacting with stats client cache.

 

  was:
As reported, query compilation (for the sample queries shown below), 
especially deriving estimation and generating parallel scans from guide posts, 
becomes much slower after we introduced Phoenix Stats. 
 a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
 b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = ‘x’ ORDER BY Pk1__c
 c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = ‘x’ ORDER BY 
pk1__c,pk2__c
 d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = ‘x’ AND nonpk1__c 
ORDER BY pk1__c,pk2__c
 e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' 
OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column to 
make the primary key.
  
 By using prefix encoding for guide post info, we have to decode and traverse 
guide posts sequentially, which causes time complexity in 
BaseResultIterators.getParallelScan(...) to be O(n) , where n is the total 
count of guide posts.

According to PHOENIX-2417, to reduce footprint in client cache and over 
transmission, the prefix encoding is used as the in-memory and over-the-wire 
encoding for guide post info.

We can use Segment Tree to address both memory and performance concerns. The 
guide posts are partitioned to k chunks (k=1024?), each chunk is encoded by 
prefix encoding and the encoded data is a leaf node of the tree. The inner node 
contains summary info (the count of rows, the data size) of the sub tree rooted 
at the inner node.

With this tree-like data structure, compared to the current data structure, the 
increased size (mainly coming from the n/k-1 inner nodes) is negligible. The 
time complexity for queries a, b, c can be reduced to O(m) where m is the total 
count of regions; the time complexity for "EXPLAIN" queries a, b, c can be 
reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even 
be reduced to O(1). For queries d and e, the time complexity to find the start 
of target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ characteristics to support partial 
load/unload when interacting with stats client cache.

 


> Use Segment tree to organize Guide Post Info
> 
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> As reported, query compilation (for the sample queries shown below), 
> especially deriving estimation and generating parallel scans from guide 
> posts, 

[jira] [Updated] (PHOENIX-4925) Use Segment tree to organize Guide Post Info

2019-02-19 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4925:
-
Summary: Use Segment tree to organize Guide Post Info  (was: Use 
Segment/SUM tree to organize Guide Post Info)

> Use Segment tree to organize Guide Post Info
> 
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> As reported, query compilation (for the sample queries shown below), 
> especially deriving estimation and generating parallel scans from guide 
> posts, becomes much slower after we introduced Phoenix Stats. 
>  a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
>  b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = ‘x’ ORDER BY 
> Pk1__c
>  c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = ‘x’ ORDER BY 
> pk1__c,pk2__c
>  d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = ‘x’ AND nonpk1__c 
> ORDER BY pk1__c,pk2__c
>  e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 
> 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only 
> column to make the primary key.
>   
>  By using prefix encoding for guide post info, we have to decode and traverse 
> guide posts sequentially, which causes time complexity in 
> BaseResultIterators.getParallelScan(...) to be O(n) , where n is the total 
> count of guide posts.
> According to PHOENIX-2417, to reduce footprint in client cache and over 
> transmission, the prefix encoding is used as the in-memory and over-the-wire 
> encoding for guide post info.
> We can use something like Sum Tree (even Binary Indexed Tree) to address both 
> memory and performance concerns. The guide posts are partitioned to k chunks 
> (k=1024?), each chunk is encoded by prefix encoding and the encoded data is a 
> leaf node of the tree. The inner node contains summary info (the count of 
> rows, the data size) of the sub tree rooted at the inner node.
> With this tree-like data structure, compared to the current data structure, 
> the increased size (mainly coming from the n/k-1 inner nodes) is negligible. 
> The time complexity for queries a, b, c can be reduced to O(m) where m is the 
> total count of regions; the time complexity for "EXPLAIN" queries a, b, c can 
> be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can 
> even be reduced to O(1). For queries d and e, the time complexity to find the 
> start of target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ characteristics to support partial 
> load/unload when interacting with stats client cache.
>  
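
A rough sketch of the node structure this describes (illustrative only, not Phoenix code): leaves hold prefix-encoded guide-post chunks, and inner nodes aggregate row count and byte size so estimates can be answered without decoding every chunk.

{code:java}
import java.util.List;

final class GuidePostNode {
    final long rowCount;
    final long byteSize;
    final byte[] encodedChunk;        // non-null only for leaves (prefix-encoded chunk)
    final GuidePostNode left, right;  // null for leaves

    private GuidePostNode(long rows, long bytes, byte[] chunk,
                          GuidePostNode l, GuidePostNode r) {
        this.rowCount = rows; this.byteSize = bytes;
        this.encodedChunk = chunk; this.left = l; this.right = r;
    }

    static GuidePostNode leaf(byte[] chunk, long rows, long bytes) {
        return new GuidePostNode(rows, bytes, chunk, null, null);
    }

    static GuidePostNode inner(GuidePostNode l, GuidePostNode r) {
        // Inner nodes carry only the summaries needed for estimation.
        return new GuidePostNode(l.rowCount + r.rowCount,
                l.byteSize + r.byteSize, null, l, r);
    }

    /** Build a balanced tree bottom-up over pre-built leaf chunks in [lo, hi). */
    static GuidePostNode build(List<GuidePostNode> leaves, int lo, int hi) {
        if (hi - lo == 1) return leaves.get(lo);
        int mid = lo + (hi - lo) / 2;
        return inner(build(leaves, lo, mid), build(leaves, mid, hi));
    }
}
{code}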



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4925) Use Segment tree to organize Guide Post Info

2019-02-19 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4925:
-
Description: 
As reported, query compilation (for the sample queries shown below), 
especially deriving estimation and generating parallel scans from guide posts, 
becomes much slower after we introduced Phoenix Stats. 
 a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
 b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = ‘x’ ORDER BY Pk1__c
 c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = ‘x’ ORDER BY 
pk1__c,pk2__c
 d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = ‘x’ AND nonpk1__c 
ORDER BY pk1__c,pk2__c
 e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' 
OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column to 
make the primary key.
  
 By using prefix encoding for guide post info, we have to decode and traverse 
guide posts sequentially, which causes time complexity in 
BaseResultIterators.getParallelScan(...) to be O(n) , where n is the total 
count of guide posts.

According to PHOENIX-2417, to reduce footprint in client cache and over 
transmission, the prefix encoding is used as the in-memory and over-the-wire 
encoding for guide post info.

We can use Segment Tree to address both memory and performance concerns. The 
guide posts are partitioned to k chunks (k=1024?), each chunk is encoded by 
prefix encoding and the encoded data is a leaf node of the tree. The inner node 
contains summary info (the count of rows, the data size) of the sub tree rooted 
at the inner node.

With this tree-like data structure, compared to the current data structure, the 
increased size (mainly coming from the n/k-1 inner nodes) is negligible. The 
time complexity for queries a, b, c can be reduced to O(m) where m is the total 
count of regions; the time complexity for "EXPLAIN" queries a, b, c can be 
reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even 
be reduced to O(1). For queries d and e, the time complexity to find the start 
of target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ characteristics to support partial 
load/unload when interacting with stats client cache.

 

  was:
As reported, query compilation (for the sample queries shown below), 
especially deriving estimation and generating parallel scans from guide posts, 
becomes much slower after we introduced Phoenix Stats. 
 a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
 b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = ‘x’ ORDER BY Pk1__c
 c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = ‘x’ ORDER BY 
pk1__c,pk2__c
 d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = ‘x’ AND nonpk1__c 
ORDER BY pk1__c,pk2__c
 e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' 
OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column to 
make the primary key.
  
 By using prefix encoding for guide post info, we have to decode and traverse 
guide posts sequentially, which causes time complexity in 
BaseResultIterators.getParallelScan(...) to be O(n) , where n is the total 
count of guide posts.

According to PHOENIX-2417, to reduce footprint in client cache and over 
transmission, the prefix encoding is used as the in-memory and over-the-wire 
encoding for guide post info.

We can use something like Sum Tree (even Binary Indexed Tree) to address both 
memory and performance concerns. The guide posts are partitioned to k chunks 
(k=1024?), each chunk is encoded by prefix encoding and the encoded data is a 
leaf node of the tree. The inner node contains summary info (the count of rows, 
the data size) of the sub tree rooted at the inner node.

With this tree-like data structure, compared to the current data structure, the 
increased size (mainly coming from the n/k-1 inner nodes) is negligible. The 
time complexity for queries a, b, c can be reduced to O(m) where m is the total 
count of regions; the time complexity for "EXPLAIN" queries a, b, c can be 
reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even 
be reduced to O(1). For queries d and e, the time complexity to find the start 
of target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ characteristics to support partial 
load/unload when interacting with stats client cache.

 


> Use Segment tree to organize Guide Post Info
> 
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> As reported, query compilation (for the sample queries shown below), 
> especially deriving estimation and generating 

[jira] [Updated] (PHOENIX-5137) Index Rebuilder scan increases data table region split time

2019-02-19 Thread Kiran Kumar Maturi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Kumar Maturi updated PHOENIX-5137:

Description: 
[~lhofhansl] [~vincentpoon] [~tdsilva] please review

In order to differentiate between the index rebuilder retries  
(UngroupedAggregateRegionObserver.rebuildIndices()) and commits that happen in 
the loop of UngroupedAggregateRegionObserver.doPostScannerOpen() as part of  
PHOENIX-4600 blockingMemstoreSize was set to -1 for rebuildIndices;
{code:java}
commitBatchWithRetries(region, mutations, -1);{code}
This blocks the region split, as the check for region closing only happens when 
blockingMemstoreSize > 0:
{code:java}
for (int i = 0; blockingMemstoreSize > 0 && region.getMemstoreSize() > 
blockingMemstoreSize && i < 30; i++) {
  try{
   checkForRegionClosing();
   
{code}
The plan is to run the check for region closing at least once before committing 
the batch:
{code:java}
checkForRegionClosing();
for (int i = 0; blockingMemstoreSize > 0 && region.getMemstoreSize() > 
blockingMemstoreSize && i < 30; i++) {
  try{
   checkForRegionClosing();
   
{code}

Steps to reproduce: 
1. Create a table with one index (note the start time) 
2. Add 1-2 million rows 
3. Wait till the index is active 
4. Disable the index with the start time (noted in step 1) 
5. Once the rebuilder starts, split the data table region 

Repeat the steps again after applying the patch to check the difference.


  was:
[~lhofhansl] [~vincentpoon] [~tdsilva] please review

In order to differentiate between the index rebuilder retries  
(UngroupedAggregateRegionObserver.rebuildIndices()) and commits that happen in 
the loop of UngroupedAggregateRegionObserver.doPostScannerOpen() as part of  
PHOENIX-4600 blockingMemstoreSize was set to -1 for rebuildIndices;
{code:java}
commitBatchWithRetries(region, mutations, -1);{code}
This blocks the region split, as the check for region closing only happens when 
blockingMemstoreSize > 0:
{code:java}
for (int i = 0; blockingMemstoreSize > 0 && region.getMemstoreSize() > 
blockingMemstoreSize && i < 30; i++) {
  try{
   checkForRegionClosing();
   
{code}
The plan is to run the check for region closing at least once before committing 
the batch:
{code:java}
checkForRegionClosing();
for (int i = 0; blockingMemstoreSize > 0 && region.getMemstoreSize() > 
blockingMemstoreSize && i < 30; i++) {
  try{
   checkForRegionClosing();
   
{code}



> Index Rebuilder scan increases data table region split time
> ---
>
> Key: PHOENIX-5137
> URL: https://issues.apache.org/jira/browse/PHOENIX-5137
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.14.1
>Reporter: Kiran Kumar Maturi
>Assignee: Kiran Kumar Maturi
>Priority: Major
> Attachments: PHOENIX-5137-4.14-Hbase-1.3.01.patch, 
> PHOENIX-5137-4.14-Hbase-1.3.01.patch
>
>
> [~lhofhansl] [~vincentpoon] [~tdsilva] please review
> In order to differentiate between the index rebuilder retries  
> (UngroupedAggregateRegionObserver.rebuildIndices()) and commits that happen 
> in the loop of UngroupedAggregateRegionObserver.doPostScannerOpen() as part 
> of  PHOENIX-4600 blockingMemstoreSize was set to -1 for rebuildIndices;
> {code:java}
> commitBatchWithRetries(region, mutations, -1);{code}
> This blocks the region split, as the check for region closing only happens 
> when blockingMemstoreSize > 0:
> {code:java}
> for (int i = 0; blockingMemstoreSize > 0 && region.getMemstoreSize() > 
> blockingMemstoreSize && i < 30; i++) {
>   try{
>checkForRegionClosing();
>
> {code}
> The plan is to run the check for region closing at least once before 
> committing the batch:
> {code:java}
> checkForRegionClosing();
> for (int i = 0; blockingMemstoreSize > 0 && region.getMemstoreSize() > 
> blockingMemstoreSize && i < 30; i++) {
>   try{
>checkForRegionClosing();
>
> {code}
> Steps to reproduce: 
> 1. Create a table with one index (note the start time) 
> 2. Add 1-2 million rows 
> 3. Wait till the index is active 
> 4. Disable the index with the start time (noted in step 1) 
> 5. Once the rebuilder starts, split the data table region 
> Repeat the steps again after applying the patch to check the difference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PHOENIX-5144) C++ JDBC Driver

2019-02-19 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved PHOENIX-5144.
-
Resolution: Later

> C++ JDBC Driver
> ---
>
> Key: PHOENIX-5144
> URL: https://issues.apache.org/jira/browse/PHOENIX-5144
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.14.1
>Reporter: yinghua_zh
>Priority: Major
>
> Can you provide a C++ version of the JDBC driver? 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-5146) Phoenix missing class definition: java.lang.NoClassDefFoundError: org/apache/phoenix/shaded/org/apache/http/Consts

2019-02-19 Thread Narendra Kumar (JIRA)
Narendra Kumar created PHOENIX-5146:
---

 Summary: Phoenix missing class definition: 
java.lang.NoClassDefFoundError: org/apache/phoenix/shaded/org/apache/http/Consts
 Key: PHOENIX-5146
 URL: https://issues.apache.org/jira/browse/PHOENIX-5146
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 5.0.0
 Environment: 3 node kerberised cluster.

Hbase 2.0.2
Reporter: Narendra Kumar


While running a SparkCompatibility check for Phoenix, we hit this issue:

2019-02-15 09:03:38,470|INFO|MainThread|machine.py:169 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|RUNNING: echo "
import org.apache.spark.graphx._;
import org.apache.phoenix.spark._;
val rdd = sc.phoenixTableAsRDD(\"EMAIL_ENRON\", Seq(\"MAIL_FROM\", 
\"MAIL_TO\"), 
zkUrl=Some(\"huaycloud012.l42scl.hortonworks.com:2181:/hbase-secure\"));
val rawEdges = rdd.map

{ e => (e(\"MAIL_FROM\").asInstanceOf[VertexId], 
e(\"MAIL_TO\").asInstanceOf[VertexId])}

;
val graph = Graph.fromEdgeTuples(rawEdges, 1.0);
val pr = graph.pageRank(0.001);
pr.vertices.saveToPhoenix(\"EMAIL_ENRON_PAGERANK\", Seq(\"ID\", \"RANK\"), 
zkUrl = Some(\"huaycloud012.l42scl.hortonworks.com:2181:/hbase-secure\"));
" | spark-shell --master yarn --jars 
/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.3.1.0.0-75.jar 
--properties-file 
/grid/0/log/cluster/run_phoenix_secure_ha_all_1/artifacts/spark_defaults.conf 
2>&1 | tee 
/grid/0/log/cluster/run_phoenix_secure_ha_all_1/artifacts/Spark_clientLogs/phoenix-spark.txt
2019-02-15 09:03:38,488|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SPARK_MAJOR_VERSION is set to 
2, using Spark2
2019-02-15 09:03:39,901|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: Class path contains 
multiple SLF4J bindings.
2019-02-15 09:03:39,902|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-75/phoenix/phoenix-5.0.0.3.1.0.0-75-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2019-02-15 09:03:39,902|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-75/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2019-02-15 09:03:39,902|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|SLF4J: See 
[http://www.slf4j.org/codes.html#multiple_bindings] for an explanation.
2019-02-15 09:03:41,400|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|Setting default log level to 
"WARN".
2019-02-15 09:03:41,400|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|To adjust logging level use 
sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-02-15 09:03:54,837|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|java.lang.NoClassDefFoundError: org/apache/phoenix/shaded/org/apache/http/Consts
2019-02-15 09:03:54,838|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.phoenix.shaded.org.apache.http.client.utils.URIBuilder.digestURI(URIBuilder.java:181)
2019-02-15 09:03:54,839|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.phoenix.shaded.org.apache.http.client.utils.URIBuilder.<init>(URIBuilder.java:82)
2019-02-15 09:03:54,839|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.createURL(KMSClientProvider.java:468)
2019-02-15 09:03:54,839|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1023)
2019-02-15 09:03:54,840|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:252)
2019-02-15 09:03:54,840|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:249)
2019-02-15 09:03:54,840|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:172)
2019-02-15 09:03:54,841|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at 
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.getDelegationToken(LoadBalancingKMSClientProvider.java:249)
2019-02-15 09:03:54,841|INFO|MainThread|machine.py:184 - 
run()||GUID=1566a829-b1df-4757-8c3d-73a7fa302b84|at