[jira] [Commented] (PHOENIX-4490) Phoenix Spark Module doesn't pass in user properties to create connection

2018-02-01 Thread jifei_yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349747#comment-16349747
 ] 

jifei_yang commented on PHOENIX-4490:
-

So, it will work fine.

> Phoenix Spark Module doesn't pass in user properties to create connection
> -
>
> Key: PHOENIX-4490
> URL: https://issues.apache.org/jira/browse/PHOENIX-4490
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Karan Mehta
>Priority: Major
>
> The Phoenix Spark module doesn't work reliably in a Kerberos environment. This 
> is because whenever new {{PhoenixRDD}}s are built, they are always built with 
> new, default properties. The following piece of code in {{PhoenixRelation}} is 
> an example; this is the class used by Spark to create a {{BaseRelation}} before 
> executing a scan.
> {code}
> new PhoenixRDD(
>   sqlContext.sparkContext,
>   tableName,
>   requiredColumns,
>   Some(buildFilter(filters)),
>   Some(zkUrl),
>   new Configuration(),
>   dateAsTimestamp
> ).toDataFrame(sqlContext).rdd
> {code}
> This works fine in most cases where the Spark code runs on the same cluster as 
> HBase, since the config object will pick up properties from classpath XML files. 
> However, in an external environment we should use the user-provided properties 
> and merge them before creating any {{PhoenixRelation}} or {{PhoenixRDD}}. As per 
> my understanding, we should ideally provide the properties in the 
> {{DefaultSource#createRelation()}} method.
> An example of where this fails: Spark tries to get the splits to optimize the MR 
> performance for loading data from the table in the 
> {{PhoenixInputFormat#generateSplits()}} method. Ideally, it should get all the 
> config parameters from the {{JobContext}} being passed in, but it is defaulted to 
> {{new Configuration()}}, irrespective of what the user passes in. Thus it fails 
> to create a connection.
> [~jmahonin] [~maghamraviki...@gmail.com] 
> Any ideas or advice? Let me know if I am missing anything obvious here.
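As a hedged illustration of the fix direction described above (not the committed patch): the user-supplied options that Spark hands to {{DefaultSource#createRelation()}} as a {{Map[String, String]}} could be merged into the Configuration passed to {{PhoenixRDD}}, instead of a bare {{new Configuration()}}. Names below are illustrative only.
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration

// Sketch: build the Configuration from the classpath defaults plus whatever
// the user supplied in the DataFrame options, then hand it to PhoenixRDD.
def confFromUserOptions(parameters: Map[String, String]): Configuration = {
  val conf = HBaseConfiguration.create()                 // picks up hbase-site.xml etc. if present
  parameters.foreach { case (k, v) => conf.set(k, v) }   // user-provided properties win
  conf
}
{code}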



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4490) Phoenix Spark Module doesn't pass in user properties to create connection

2018-02-01 Thread jifei_yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349745#comment-16349745
 ] 

jifei_yang commented on PHOENIX-4490:
-

Hi [~karanmehta93], in the production environment I solved the problem this way:
1. Download the source for the corresponding Apache Phoenix version.
2. Modify the class org.apache.phoenix.spark.ConfigurationUtil (a sketch of this 
kind of change follows below).
3. In the same package, add krb5.conf, your keytab, the Hadoop XML files, 
log4j.properties, and hbase-site.xml.
4. Add the Phoenix dependencies to the cluster's /etc/spark/conf/classpath.txt, so 
that every submitted Spark task can pick up the Phoenix dependency packages.
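A rough sketch of the kind of change these steps describe (this is not the actual {{ConfigurationUtil}} code; the principal and keytab names are placeholders for whatever was packaged in step 3):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.security.UserGroupInformation

// Sketch: build the Configuration from the bundled XML resources and log in
// from the bundled keytab before any Phoenix/HBase connection is created.
def kerberosAwareConfiguration(): Configuration = {
  val conf = HBaseConfiguration.create()   // loads hbase-site.xml from the classpath
  conf.addResource("core-site.xml")        // Hadoop XML files added to the package
  conf.addResource("hdfs-site.xml")
  UserGroupInformation.setConfiguration(conf)
  UserGroupInformation.loginUserFromKeytab("you@EXAMPLE.COM", "you.keytab") // placeholders
  conf
}
{code}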




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4577) make-rc.sh fails trying to copy the inline argparse into bin/

2018-02-01 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PHOENIX-4577:

Attachment: PHOENIX-4577.patch

> make-rc.sh fails trying to copy the inline argparse into bin/
> -
>
> Key: PHOENIX-4577
> URL: https://issues.apache.org/jira/browse/PHOENIX-4577
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 5.0.0, 4.14.0
>
> Attachments: PHOENIX-4577.patch
>
>
> Silly little fix. Need to add a {{-r}} to the {{cp}} call.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-4577) make-rc.sh fails trying to copy the inline argparse into bin/

2018-02-01 Thread Josh Elser (JIRA)
Josh Elser created PHOENIX-4577:
---

 Summary: make-rc.sh fails trying to copy the inline argparse into 
bin/
 Key: PHOENIX-4577
 URL: https://issues.apache.org/jira/browse/PHOENIX-4577
 Project: Phoenix
  Issue Type: Bug
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 5.0.0, 4.14.0


Silly little fix. Need to add a {{-r}} to the {{cp}} call.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PHOENIX-4130) Avoid server retries for mutable indexes

2018-02-01 Thread Vincent Poon (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Poon resolved PHOENIX-4130.
---
Resolution: Fixed

> Avoid server retries for mutable indexes
> 
>
> Key: PHOENIX-4130
> URL: https://issues.apache.org/jira/browse/PHOENIX-4130
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Assignee: Vincent Poon
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4130.addendum.master.patch, 
> PHOENIX-4130.v1.master.patch, PHOENIX-4130.v10.master.patch, 
> PHOENIX-4130.v2.master.patch, PHOENIX-4130.v3.master.patch, 
> PHOENIX-4130.v4.master.patch, PHOENIX-4130.v5.master.patch, 
> PHOENIX-4130.v6.master.patch, PHOENIX-4130.v7.master.patch, 
> PHOENIX-4130.v8.master.patch, PHOENIX-4130.v9.master.patch
>
>
> Had some discussions with [~jamestaylor], [~samarthjain], and [~vincentpoon], 
> during which I suggested that we can possibly eliminate retry loops happening 
> at the server that cause the handler threads to be stuck potentially for 
> quite a while (at least multiple seconds to ride over common scenarios like 
> splits).
> Instead we can do the retries at the Phoenix client. So:
> # The index updates are not retried on the server (retries = 0).
> # A failed index update would set the failed index timestamp but leave the 
> index enabled.
> # The handler thread is then done; it throws an appropriate exception back to 
> the client.
> # The Phoenix client can now retry. When those retries fail, the index is 
> disabled (if the policy dictates that) and the exception is thrown back to its 
> caller.
> So no more waiting is needed on the server, and handler threads are freed 
> immediately (see the sketch below).
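For illustration only, a minimal sketch of the client-side flow described above; {{commitBatch}} and {{disableIndexes}} are hypothetical stand-ins for the real client calls, not Phoenix APIs:
{code}
// Sketch: the server no longer retries index writes (retries = 0); the client
// retries the batch and, if retries are exhausted, disables the index per policy.
def writeWithClientSideIndexRetries(commitBatch: () => Unit,
                                    disableIndexes: () => Unit,
                                    maxRetries: Int,
                                    disableIndexOnFailure: Boolean): Unit = {
  var attempt = 0
  while (true) {
    try {
      commitBatch()                 // handler thread returns immediately on failure
      return
    } catch {
      case e: Exception =>
        attempt += 1
        if (attempt > maxRetries) {
          if (disableIndexOnFailure) disableIndexes() // per the configured failure policy
          throw e                   // surface the failure to the caller
        }
    }
  }
}
{code}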



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4130) Avoid server retries for mutable indexes

2018-02-01 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349372#comment-16349372
 ] 

Vincent Poon commented on PHOENIX-4130:
---

Committed to master and 4.x branches




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4231) Support restriction of remote UDF load sources

2018-02-01 Thread Chinmay Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349314#comment-16349314
 ] 

Chinmay Kulkarni commented on PHOENIX-4231:
---

When users execute a _CREATE FUNCTION_ statement, these are the options based on 
the UDF's jar path:
 # If the user does not provide the location of the jar file with 'using jar', 
the DynamicClassLoader will automatically try to load the class from the 
hbase.dynamic.jars.dir directory.
 # As of now, if the user provides an HDFS URI path for the jar file with 
'using jar ', the DynamicClassLoader will try to load the class from this 
location, without restricting the search to within hbase.dynamic.jars.dir.
 # If the user wishes to load a local jar, they are expected to manually upload 
the jar to any HDFS filesystem reachable on the network (any such location, not 
necessarily restricted to hbase.dynamic.jars.dir) and follow step 2.
 # If the user calls 'add jar ' prior to the _CREATE FUNCTION_ statement, their 
local jar will be automatically copied to the hbase.dynamic.jars.dir directory. 
Note that this only applies to a local jar, not to a jar already on HDFS. In 
this case, the user can follow step 1 or 2.

To support this feature, we can do the following:
 # Allow the user to 'add jar ' so that the jar to be loaded ends up inside the 
hbase.dynamic.jars.dir directory. Currently, we only allow local jars to be added.
 # Load the class only from the hbase.dynamic.jars.dir directory, handling URIs 
without scheme and authority carefully (a rough sketch follows below).

Any suggestions or comments [~apurtell] [~jamestaylor]? 
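As a hedged sketch of the check described in point 2 above (assuming the allowed directory is read from hbase.dynamic.jars.dir; this is not existing Phoenix code):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Sketch: accept a UDF jar only if it resolves to a direct child of
// hbase.dynamic.jars.dir, qualifying scheme/authority-less URIs against the
// default FileSystem so relative or bare paths are handled consistently.
def isAllowedJarLocation(jarPath: String, conf: Configuration): Boolean = {
  Option(conf.get("hbase.dynamic.jars.dir")).exists { dir =>
    val allowedDir = new Path(dir)
    val fs = allowedDir.getFileSystem(conf)
    val qualifiedJar = new Path(jarPath).makeQualified(fs.getUri, fs.getWorkingDirectory)
    val qualifiedDir = allowedDir.makeQualified(fs.getUri, fs.getWorkingDirectory)
    qualifiedJar.getParent != null && qualifiedJar.getParent.equals(qualifiedDir)
  }
}
{code}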

 

> Support restriction of remote UDF load sources 
> ---
>
> Key: PHOENIX-4231
> URL: https://issues.apache.org/jira/browse/PHOENIX-4231
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Major
>
> When allowUserDefinedFunctions is true, users can load UDFs remotely via a 
> jar file from any HDFS filesystem reachable on the network. The setting 
> hbase.dynamic.jars.dir can be used to restrict locations for jar loading but 
> is only applied to jars loaded from the local filesystem.  We should 
> implement support for similar restriction via configuration for jars loaded 
> via hdfs:// URIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4575) Phoenix metadata KEEP_DELETED_CELLS and VERSIONS should be property driven

2018-02-01 Thread Mujtaba Chohan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349204#comment-16349204
 ] 

Mujtaba Chohan commented on PHOENIX-4575:
-

[~jamestaylor] FYI attached v2 patch.

> Phoenix metadata KEEP_DELETED_CELLS and VERSIONS should be property driven
> --
>
> Key: PHOENIX-4575
> URL: https://issues.apache.org/jira/browse/PHOENIX-4575
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Mujtaba Chohan
>Assignee: Mujtaba Chohan
>Priority: Major
> Attachments: PHOENIX-4575.patch, PHOENIX-4575_v2.patch
>
>
> This is to cater for circumstances where we need to alter state of 
> KEEP_DELETED_CELLS/VERSION on Phoenix meta tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4575) Phoenix metadata KEEP_DELETED_CELLS and VERSIONS should be property driven

2018-02-01 Thread Mujtaba Chohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mujtaba Chohan updated PHOENIX-4575:

Attachment: PHOENIX-4575_v2.patch




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4503) Phoenix-Spark plugin doesn't release zookeeper connections

2018-02-01 Thread Karan Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349112#comment-16349112
 ] 

Karan Mehta commented on PHOENIX-4503:
--

Thanks [~snalap...@dataken.net] for the confirmation! 

If possible, could you also confirm with the 4.x-HBase-1.3 branch?

> Phoenix-Spark plugin doesn't release zookeeper connections
> --
>
> Key: PHOENIX-4503
> URL: https://issues.apache.org/jira/browse/PHOENIX-4503
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: HBase 1.2 on Linux (Ubuntu, CentOS)
>Reporter: Suhas Nalapure
>Priority: Major
>
> *1. Phoenix-Spark plugin doesn't release zookeeper connections*
> Example: 
>   
> {code:java}
> for(int i=0; i < 50; i++){
>   Dataset df = 
> sqlContext.read().format("org.apache.phoenix.spark")
>   .option("table", 
> "\"Sales\"").option("zkUrl", "localhost:2181")
>   .load();
>   df.show(2);
>   }
>   Thread.sleep(1000*60); 
> {code}
>
> When the above snippet is executed, the number of connections to port 2181 
> keeps increasing and is not released until the main thread wakes up from sleep 
> and the program ends, as can be seen below (14 is the number of connections 
> even before the program starts to run):
> netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:52:05
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 22
> 16:52:15
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 38
> 16:52:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 68
> 16:52:23
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 100
> 16:52:27
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:38
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:52
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:00
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:24
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:34
> root@user1 ~ $
> *2. If the "jdbc" format is used instead to create the Spark DataFrame, the 
> connection count doesn't shoot up*
> Example:
>   
> {code:java}
> for(int i=0; i < 50; i++){
>   Dataset df = sqlContext.read().format("jdbc")
>   .option("url", 
> "jdbc:phoenix:localhost:2181")
>   .option("dbtable", "\"Sales\"")
>   .option("driver", 
> "org.apache.phoenix.jdbc.PhoenixDriver")
>   .load();
>   df.show(2);
>   }
>   Thread.sleep(1000*60);  
> {code}
>   
> Connection counts during program execution (14 being the count before 
> execution starts):
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:42
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:43
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:46
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:50
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:55
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:12
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:28
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:34
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:37
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:39
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:02:07



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4565) IndexScrutinyToolIT is failing

2018-02-01 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348944#comment-16348944
 ] 

Josh Elser commented on PHOENIX-4565:
-

Ignore'd the wrong IndexScrutinyIT :)

> IndexScrutinyToolIT is failing
> --
>
> Key: PHOENIX-4565
> URL: https://issues.apache.org/jira/browse/PHOENIX-4565
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Priority: Critical
> Fix For: 5.0.0
>
> Attachments: PHOENIX-4565.2.patch, PHOENIX-4565.patch
>
>
> {noformat}
> [ERROR] 
> testScrutinyWhileTakingWrites[0](org.apache.phoenix.end2end.IndexScrutinyToolIT)
>   Time elapsed: 12.494 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1000> but was:<996>
>     at 
> org.apache.phoenix.end2end.IndexScrutinyToolIT.testScrutinyWhileTakingWrites(IndexScrutinyToolIT.java:253)
> [ERROR] 
> testScrutinyWhileTakingWrites[1](org.apache.phoenix.end2end.IndexScrutinyToolIT)
>   Time elapsed: 7.437 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1000> but was:<997>
>     at 
> org.apache.phoenix.end2end.IndexScrutinyToolIT.testScrutinyWhileTakingWrites(IndexScrutinyToolIT.java:253)
> [ERROR] 
> testScrutinyWhileTakingWrites[2](org.apache.phoenix.end2end.IndexScrutinyToolIT)
>   Time elapsed: 12.195 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1000> but was:<999>
>     at 
> org.apache.phoenix.end2end.IndexScrutinyToolIT.testScrutinyWhileTakingWrites(IndexScrutinyToolIT.java:253)
> {noformat}
> Saw this on a {{mvn verify}} of 5.x. I don't know if we expect this one to be 
> broken or not -- I didn't see an open issue tracking it.
> Is this one we should get fixed before shipping an alpha/beta? My opinion 
> would be: unless it is a trivial/simple fix, we should get it for the next 
> release.
> [~sergey.soldatov], [~an...@apache.org], [~rajeshbabu].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4565) IndexScrutinyToolIT is failing

2018-02-01 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PHOENIX-4565:

Attachment: PHOENIX-4565.2.patch




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4503) Phoenix-Spark plugin doesn't release zookeeper connections

2018-02-01 Thread Suhas Nalapure (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348595#comment-16348595
 ] 

Suhas Nalapure commented on PHOENIX-4503:
-

Tested with the latest code from the 4.x-HBase-1.2 branch; the issue is fixed now. 
Thank you [~karanmehta93] and [~jamestaylor].




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4278) Implement pure client side transactional index maintenance

2018-02-01 Thread Ohad Shacham (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348158#comment-16348158
 ] 

Ohad Shacham commented on PHOENIX-4278:
---

[~jamestaylor], as per our discussion I applied the patch on top of 4.x-HBase-1.3 
and ran the tests. They terminated with the same errors I received when running 
4.x-HBase-1.3 without my patch.

These are the errors:

[*ERROR*] *Failures:* 

[*ERROR*]   *AggregateIT.testAvgGroupByOrderPreservingWithStats:432 
expected:<13> but was:<8>*

[*INFO*] 

[*ERROR*] *Tests run: 3365, Failures: 1, Errors: 0, Skipped: 7*

 

[*ERROR*] *Failures:* 

[*ERROR*] 
*org.apache.phoenix.end2end.index.MutableIndexFailureIT.testIndexWriteFailure[MutableIndexFailureIT_transactional=false,localIndex=false,isNamespaceMapped=false,disableIndexOnWriteFailure=true,failRebuildTask=false,throwIndexWriteFailure=null](org.apache.phoenix.end2end.index.MutableIndexFailureIT)*

[*ERROR*]   *Run 1: 
MutableIndexFailureIT.testIndexWriteFailure:345->checkStateAfterRebuild:389*

[*INFO*]   *Run 2: PASS*

[*INFO*]   *Run 3: PASS*

[*INFO*] 

[*ERROR*]   *PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild:221 Ran 
out of time*

[*ERROR*]   *PartialIndexRebuilderIT.testMultiVersionsAfterFailure:461*

[*ERROR*]   *PartialIndexRebuilderIT.testUpsertNullTwiceAfterFailure:521*

[*INFO*] 

[*ERROR*] *Tests run: 600, Failures: 4, Errors: 0, Skipped: 44*

 

> Implement pure client side transactional index maintenance
> --
>
> Key: PHOENIX-4278
> URL: https://issues.apache.org/jira/browse/PHOENIX-4278
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Ohad Shacham
>Priority: Major
>
> The index maintenance for transactional tables follows the same model as for 
> non-transactional tables: a coprocessor hook on data table updates that looks up 
> the previous row value to perform maintenance. This is necessary for 
> non-transactional tables to ensure the rows are locked so that a consistent view 
> may be obtained. However, for transactional tables, the time stamp oracle 
> ensures uniqueness of time stamps (via transaction IDs) and the filtering 
> handles a scan seeing the "true" last committed value for a row. Thus, 
> there's no hard dependency to perform this on the server side.
> Moving the index maintenance to the client side would prevent any RS->RS RPC 
> calls (which have proved to be troublesome for HBase). It would require 
> returning more data to the client (i.e. the prior row value), but this seems 
> like a reasonable tradeoff.
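Purely as an illustration of the idea (not a proposed patch): once the prior row value is returned to the client, the client itself can emit the index delete for the old value and the index put for the new value, so no RS->RS RPC is needed. The row-key layout and column names below are hypothetical simplifications, not Phoenix's actual index encoding.
{code}
import org.apache.hadoop.hbase.client.{Delete, Put, Table}
import org.apache.hadoop.hbase.util.Bytes

// Sketch of client-side index maintenance for a single indexed column:
// delete the index row for the prior value (if any) and write the new one,
// both at the transaction's timestamp so conflict detection still applies.
def maintainIndexOnClient(indexTable: Table,
                          priorIndexedValue: Option[Array[Byte]],
                          newIndexedValue: Array[Byte],
                          dataRowKey: Array[Byte],
                          txTimestamp: Long): Unit = {
  def indexRowKey(v: Array[Byte]): Array[Byte] = Bytes.add(v, dataRowKey) // indexed value + data PK
  priorIndexedValue.foreach { old =>
    indexTable.delete(new Delete(indexRowKey(old), txTimestamp))
  }
  val put = new Put(indexRowKey(newIndexedValue), txTimestamp)
  put.addColumn(Bytes.toBytes("0"), Bytes.toBytes("_0"), Array.emptyByteArray)
  indexTable.put(put)
}
{code}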



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)