[jira] [Resolved] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string

2017-08-02 Thread Jepson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jepson resolved PHOENIX-4056.
-
   Resolution: Fixed
Fix Version/s: 4.11.0

*Downgrading the Spark version from 2.2.0 to 2.1.1 resolves the issue.*
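
For anyone hitting this from a dependency-managed build, a minimal sbt sketch of the workaround (assuming an sbt project; the coordinates below are taken from the reported environment):
{code}
// build.sbt -- pin Spark to 2.1.1 instead of 2.2.0 (hypothetical sketch)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.1.1",
  "org.apache.phoenix" % "phoenix-spark" % "4.11.0-HBase-1.2"
)
{code}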

> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: CDH5.12
> Phoenix:4.11
> HBase:1.2
> Spark: 2.2.0
> phoenix-spark.version:4.11.0-HBase-1.2
>Reporter: Jepson
>  Labels: features, patch, test
> Fix For: 4.11.0
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> 1. Use this configuration on both server and client (Scala project):
> <property>
>   <name>phoenix.schema.isNamespaceMappingEnabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>phoenix.schema.mapSystemTablesToNamespace</name>
>   <value>true</value>
> </property>
> 2. The code:
> {code:java}
> resultDF.write
>  .format("org.apache.phoenix.spark")
>  .mode(SaveMode.Overwrite)
>  .option("table", "JYDW.ADDRESS_ORDERCOUNT")
>  .option("zkUrl","192.168.1.40,192.168.1.41,192.168.1.42:2181")
>  .save()
> {code}
> 3. It throws this error; please help fix it, thank you:
> 17/08/02 01:07:25 INFO DAGScheduler: Job 6 finished: runJob at 
> SparkHadoopMapReduceWriter.scala:88, took 7.990715 s
> 17/08/02 01:07:25 ERROR SparkHadoopMapReduceWriter: Aborting job 
> job_20170802010717_0079.
> {color:#59afe1}*java.lang.IllegalArgumentException: Can not create a Path 
> from an empty string*{color}
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
>   at org.apache.hadoop.fs.Path.(Path.java:134)
>   at org.apache.hadoop.fs.Path.(Path.java:88)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:132)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:101)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1003)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:994)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:59)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>   at 
> org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>   at 
> 

[jira] [Comment Edited] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string

2017-08-02 Thread Jepson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112202#comment-16112202
 ] 

Jepson edited comment on PHOENIX-4056 at 8/3/17 5:16 AM:
-

I downgraded the Spark version from 2.2.0 to 2.1.1, and the error is resolved.
So Phoenix 4.11 (HBase 1.2) does not work with Spark 2.2.0; the compatibility is 
poor.


was (Author: 1028344...@qq.com):
I downgraded the Spark version from 2.2.0 to 2.1.1, and the error is resolved.


[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-08-02 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112203#comment-16112203
 ] 

Sergey Soldatov commented on PHOENIX-418:
-

[~aertoria] if you have several other commits on top of yours, you can get a 
single commit with the command:
{code}
git format-patch -k -1 
{code}
Also make sure that it can be applied on top of the master branch (git am 
). If it's not, rebase it before submitting it to the JIRA. You may also 
just resolve the conflicts during am and get a new patch with format-patch.
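
For reference, a hypothetical end-to-end sequence of the above, using the 
attachment name from this issue (the -3 flag and the conflict handling are 
assumptions about your local setup):
{code}
# update master, then apply the patch with a 3-way merge
git checkout master
git pull --rebase
git am -3 PHOENIX-418-v1.patch
# if it conflicts: fix the files, git add them, then continue
git am --continue
# regenerate a single clean patch from the top commit
git format-patch -k -1
{code}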

> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: gsoc2016
> Attachments: PHOENIX-418-v1.patch
>
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that uses exact versus approximate. I 
> don't have experience implementing an approximate implementation, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less than returning all distinct 
> values and their counts).
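
For reference, the online update in Welford's method (from the article linked 
above) keeps only a running mean M_k and a sum of squared differences S_k, in 
LaTeX notation:
{code}
M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k}
S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k)
\mathrm{Var}_n = \frac{S_n}{n-1}
{code}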



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string

2017-08-02 Thread Jepson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112202#comment-16112202
 ] 

Jepson commented on PHOENIX-4056:
-

I downgraded the Spark version from 2.2.0 to 2.1.1, and the error is resolved.


[jira] [Updated] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string

2017-08-02 Thread Jepson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jepson updated PHOENIX-4056:

Environment: 
CDH5.12
Phoenix:4.11
HBase:1.2
Spark: 2.2.0

phoenix-spark.version:4.11.0-HBase-1.2

  was:
CDH5.12
Phoenix:4.11
HBase:1.2

phoenix-spark.version:4.11.0-HBase-1.2



[jira] [Commented] (PHOENIX-3769) OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le

2017-08-02 Thread Sneha Kanekar (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112191#comment-16112191
 ] 

Sneha Kanekar commented on PHOENIX-3769:


{code:borderStyle=solid}
[INFO] Running org.apache.phoenix.end2end.OnDuplicateKeyIT
[ERROR] Tests run: 45, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
264.123 s <<< FAILURE! - in org.apache.phoenix.end2end.OnDuplicateKeyIT
[ERROR] 
testNewAndMultiDifferentUpdateOnSingleColumn[0](org.apache.phoenix.end2end.OnDuplicateKeyIT)
  Time elapsed: 2.291 s  <<< ERROR!
java.lang.ArrayIndexOutOfBoundsException: 182
at 
org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:386)

[ERROR] 
testNewAndMultiDifferentUpdateOnSingleColumn[1](org.apache.phoenix.end2end.OnDuplicateKeyIT)
  Time elapsed: 8.202 s  <<< ERROR!
java.lang.ArrayIndexOutOfBoundsException: 486
at 
org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:386)

[ERROR] 
testNewAndMultiDifferentUpdateOnSingleColumn[2](org.apache.phoenix.end2end.OnDuplicateKeyIT)
  Time elapsed: 8.207 s  <<< ERROR!
java.lang.ArrayIndexOutOfBoundsException: 477
at 
org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:386)
{code}
From the array indexes for which this test is failing, I can see that the 
ArrayIndexOutOfBounds exception is thrown for the byte array onDupKeyBytes, which 
is created in 
phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java. 
The following call to combineOnDupKey (in 
phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java) 
throws this exception while accessing the array newRow.onDupKeyBytes:
{code:borderStyle=solid}
// Concatenate ON DUPLICATE KEY bytes to allow multiple
// increments of the same row in the same commit batch.
this.onDupKeyBytes = PhoenixIndexBuilder.combineOnDupKey(this.onDupKeyBytes, 
newRow.onDupKeyBytes);
{code}

[~elserj], I am also not sure why this happens only on ppc64le, but I am 
trying to debug it further. If the information above is of any help in 
understanding the root cause, please let me know.

> OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le
> --
>
> Key: PHOENIX-3769
> URL: https://issues.apache.org/jira/browse/PHOENIX-3769
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: $ uname -a
> Linux 6945c232192e 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 
> 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sneha Kanekar
>  Labels: ppc64le
> Attachments: OnDuplicateKeyIT_Standard_output.txt, 
> PHOENIX-3769.patch, TEST-org.apache.phoenix.end2end.OnDuplicateKeyIT.xml
>
>
> The testcase 
> org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn
>  fails consistently on ppc64le architecture. The error message is as follows:
> {code:borderStyle=solid}
> java.lang.ArrayIndexOutOfBoundsException: 179
>   at 
> org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:392)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-08-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112172#comment-16112172
 ] 

James Taylor commented on PHOENIX-418:
--

Patch doesn't look right - it includes commits outside of yours. Try generating 
with the following command after committing it to your local repo:
{code}
git format-patch --stdout HEAD^ > PHOENIX-{NUMBER}.patch
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true

2017-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111985#comment-16111985
 ] 

Hudson commented on PHOENIX-4059:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1726 (See 
[https://builds.apache.org/job/Phoenix-master/1726/])
PHOENIX-4059 Index maintenance incorrect when indexed column updated 
(jamestaylor: rev 0bd43f536a13609b35629308c415aebc800d6799)
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/StoreNullsIT.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/index/IndexMaintainer.java


> Index maintenance incorrect when indexed column updated from null to null 
> with STORE_NULLS=true
> ---
>
> Key: PHOENIX-4059
> URL: https://issues.apache.org/jira/browse/PHOENIX-4059
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.12.0, 4.11.1
>
> Attachments: PHOENIX-4059.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance

2017-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111984#comment-16111984
 ] 

Hudson commented on PHOENIX-4057:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1726 (See 
[https://builds.apache.org/job/Phoenix-master/1726/])
PHOENIX-4057 Do not issue index updates for out of order mutation (jamestaylor: 
rev e494fe9fa744ec2948cf75b007e92fef0b1ba829)
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/hbase/index/covered/example/EndToEndCoveredIndexingIT.java
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsIT.java
* (add) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/OutOfOrderMutationsIT.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/covered/NonTxIndexBuilder.java


> Do not issue index updates for out of order mutation during index maintenance 
> --
>
> Key: PHOENIX-4057
> URL: https://issues.apache.org/jira/browse/PHOENIX-4057
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.12.0, 4.11.1
>
> Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch
>
>
> Index maintenance is not correct when rows arrive out of order (see 
> PHOENIX-4052). In particular, out of order deletes end up with a spurious Put 
> in the index. Rather than corrupt the secondary index, we can instead just 
> ignore out-of-order mutations. The only downside is that point-in-time 
> queries against an index will not work correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (PHOENIX-4061) Require STORE_NULLS=true on tables with mutable secondary indexes

2017-08-02 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4061:
-

 Summary: Require STORE_NULLS=true on tables with mutable secondary 
indexes 
 Key: PHOENIX-4061
 URL: https://issues.apache.org/jira/browse/PHOENIX-4061
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


Until PHOENIX-4058 is fixed, we need to ensure that STORE_NULLS=true on tables 
with mutable secondary indexes. Otherwise, if a value is set to null at the 
same timestamp as it is set to a non-null value, the secondary index will get 
out of sync.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4060) Handle out of order updates during mutable secondary index maintenance

2017-08-02 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4060:
--
Description: 
To correctly handle out of order updates during mutable secondary index 
maintenance, revert PHOENIX-4057 and ensure that the tests in 
OutOfOrderMutationsIT pass. It'd be important to add tests that do 
multi-version scans and verify correctness.

This will be necessary for point-in-time queries against an index to 
always work correctly.

Barring any bugs in the bowels of mutable secondary indexing, the main area to 
focus on would be to correctly put and delete the Phoenix empty key value row. 
This is currently done in IndexMaintainer.buildDeleteMutation() and 
IndexMaintainer.buildUpdateMutation() which are called through the 
IndexCodec.getIndexDeletes() and getIndexUpserts() calls. It seems that the 
NonTxIndexBuilder.addMutationsForBatch() and addCurrentStateMutationsForBatch() 
do not always go through this abstraction. Fixing this might solve the issue, 
but understanding the mutable secondary index code is no small feat.

  was:
To correctly handle out of order updates during mutable secondary index 
maintenance, revert PHOENIX-4057 and ensure that the tests in 
OutOfOrderMutationsIT pass. It'd be important to add tests that do 
multi-version scans and verify correctness.

This will be necessary for point-in-time queries against an index to 
always work correctly.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (PHOENIX-4060) Handle out of order updates during mutable secondary index maintenance

2017-08-02 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4060:
-

 Summary: Handle out of order updates during mutable secondary 
index maintenance
 Key: PHOENIX-4060
 URL: https://issues.apache.org/jira/browse/PHOENIX-4060
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


To correctly handle out of order updates during mutable secondary index 
maintenance, revert PHOENIX-4057 and ensure that the tests in 
OutOfOrderMutationsIT pass. It'd be important to add tests that do 
multi-version scans and verify correctness.

This will be necessary for point-in-time queries against an index to 
always work correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true

2017-08-02 Thread Mujtaba Chohan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111851#comment-16111851
 ] 

Mujtaba Chohan commented on PHOENIX-4059:
-

Works perfectly in my test now.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true

2017-08-02 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111827#comment-16111827
 ] 

Thomas D'Silva commented on PHOENIX-4059:
-

+1





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance

2017-08-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111767#comment-16111767
 ] 

James Taylor commented on PHOENIX-4057:
---

Theoretically, that's true, [~vincentpoon], if you got unlucky enough to choose 
a point in time where an out-of-order mutation occurred right before that 
timestamp. I wouldn't expect this to happen very often.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance

2017-08-02 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111757#comment-16111757
 ] 

Vincent Poon commented on PHOENIX-4057:
---

[~jamestaylor] I guess this would also mean we can't run a scrutiny as of an 
older timestamp?  As that would essentially be a point-in-time query comparison 
of data vs index table, and there could potentially be data table writes 
without corresponding index updates with this change?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true

2017-08-02 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-4059:
-

Assignee: James Taylor





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true

2017-08-02 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4059:
--
Attachment: PHOENIX-4059.patch

Please review, [~tdsilva].

[~mujtabachohan] - would you mind trying with this patch?





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true

2017-08-02 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4059:
-

 Summary: Index maintenance incorrect when indexed column updated 
from null to null with STORE_NULLS=true
 Key: PHOENIX-4059
 URL: https://issues.apache.org/jira/browse/PHOENIX-4059
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4058) Generate correct index updates when DeleteColumn processed before Put with same timestamp

2017-08-02 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4058:
--
Summary: Generate correct index updates when DeleteColumn processed before 
Put with same timestamp  (was: Generate correct index updates when DeleteColumn 
processed before Put of same timestamp)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4051) Prevent out-of-order updates for mutable index updates

2017-08-02 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4051:
--
Attachment: PHOENIX-4051_v3.patch

Can you give this patch a try, [~mujtabachohan]? Let me know if you need a 
0.98 version of the patch.

> Prevent out-of-order updates for mutable index updates
> --
>
> Key: PHOENIX-4051
> URL: https://issues.apache.org/jira/browse/PHOENIX-4051
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4051_v1.patch, PHOENIX-4051_v2.patch, 
> PHOENIX-4051_v3.patch
>
>
> Out-of-order processing of data rows during index maintenance causes mutable 
> indexes to become out of sync with regard to the data table. Here's a simple 
> example to illustrate the issue:
> # Assume table T(K,V) and index X(V,K).
> # Upsert T(A, 1) at t10. Index updates: Put X(1,A) at t10.
> # Upsert T(A, 3) at t30. Index updates: Delete X(1,A) at t29, Put X(3,A) at 
> t30.
> # Upsert T(A,2) at t20. Index updates: Delete X(1,A) at t19, Put X(2,A) at 
> t20, Delete X(2,A) at t29
> Ideally, we'd want to remove the Delete X(1,A) at t29 since this isn't 
> correct in terms of timeline consistency, but we can't do that with HBase 
> without support for deleting/undoing Delete markers. 
> The above is not what is occurring. Instead, when T(A,2) comes in, the Put 
> X(2,A) will occur at t20, but the Delete won't occur. This causes more index 
> rows than data rows, essentially making it invalid.
> A quick fix is to reset the timestamp of the data table mutations to the 
> current time within the preBatchMutate call, when the row is exclusively 
> locked. This skirts the issue because then timestamps won't overlap.
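
A minimal sketch of the quick fix described in the last paragraph, assuming an 
HBase 1.x RegionObserver; the class name is hypothetical, and this illustrates 
the idea rather than the actual Phoenix patch:
{code}
import org.apache.hadoop.hbase.{Cell, CellUtil, KeyValue}
import org.apache.hadoop.hbase.client.Mutation
import org.apache.hadoop.hbase.coprocessor.{BaseRegionObserver, ObserverContext, RegionCoprocessorEnvironment}
import org.apache.hadoop.hbase.regionserver.MiniBatchOperationInProgress
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager
import scala.collection.JavaConverters._

// Hypothetical illustration: while the row locks are held in preBatchMutate,
// rewrite every cell with the server's current time so that timestamps of
// concurrent updates to the same row can never overlap or arrive "late".
class ResetTimestampObserver extends BaseRegionObserver {
  override def preBatchMutate(ctx: ObserverContext[RegionCoprocessorEnvironment],
                              miniBatchOp: MiniBatchOperationInProgress[Mutation]): Unit = {
    val now = EnvironmentEdgeManager.currentTime()
    for (i <- 0 until miniBatchOp.size()) {
      val mutation = miniBatchOp.getOperation(i)
      for (cells <- mutation.getFamilyCellMap.values.asScala) {
        val it = cells.listIterator()
        while (it.hasNext) {
          val c = it.next()
          // rebuild the cell with a server-assigned timestamp
          it.set(new KeyValue(CellUtil.cloneRow(c), CellUtil.cloneFamily(c),
            CellUtil.cloneQualifier(c), now,
            KeyValue.Type.codeToType(c.getTypeByte), CellUtil.cloneValue(c)))
        }
      }
    }
  }
}
{code}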



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (PHOENIX-4058) Generate correct index updates when DeleteColumn processed before Put of same timestamp

2017-08-02 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4058:
-

 Summary: Generate correct index updates when DeleteColumn 
processed before Put of same timestamp
 Key: PHOENIX-4058
 URL: https://issues.apache.org/jira/browse/PHOENIX-4058
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


The following scenario is not handled correctly for mutable secondary indexing:
1) indexed column set to null at ts1 (generates a DeleteColumn)
2) indexed column set to value at ts1 (i.e. different client, same ts)
3) RS processes (1) first
4) RS processes (2) next

Because deletes take precedence over puts even though the put happens after the 
delete, we still need to not generate index updates.

The quick fix is to add STORE_NULLS=true to the table, as that skirts the issue 
since no deletes are issued (i.e. the last put wins).
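
A tiny sketch of the HBase semantics behind the scenario above, assuming a 
plain HBase 1.x client and an existing table "T" with family "CF" (all names 
here are hypothetical):
{code}
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

// Demonstrates that a Delete at timestamp ts masks a Put at the same ts,
// regardless of the order in which the region server processed them.
object SameTsDemo extends App {
  val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
  val table = conn.getTable(TableName.valueOf("T"))
  val (row, cf, q, ts) = (Bytes.toBytes("A"), Bytes.toBytes("CF"), Bytes.toBytes("V"), 100L)
  table.delete(new Delete(row).addColumn(cf, q, ts))                // processed first
  table.put(new Put(row).addColumn(cf, q, ts, Bytes.toBytes("x")))  // same ts, later
  val result = table.get(new Get(row))
  println(result.isEmpty)  // true: the delete wins even though the put came later
  table.close(); conn.close()
}
{code}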




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance

2017-08-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111492#comment-16111492
 ] 

James Taylor commented on PHOENIX-4057:
---

Thanks for the review, [~samarthjain]. Yes, we'll want to release note it. It's 
not a regression, though - it wouldn't have worked correctly in the past either.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance

2017-08-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111456#comment-16111456
 ] 

Samarth Jain commented on PHOENIX-4057:
---

Patch looks good, [~jamestaylor]. It sounds like with this change point-in-time 
queries are broken. Should this change warrant a release note that point-in-time 
queries are not supported if the table has mutable indexes (global or local) 
on it?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation

2017-08-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111079#comment-16111079
 ] 

James Taylor commented on PHOENIX-4057:
---

This JIRA only applies to updates made at index maintenance time. Replication 
is at the HBase level and would not be impacted. I'll update the JIRA summary.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance

2017-08-02 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4057:
--
Summary: Do not issue index updates for out of order mutation during index 
maintenance   (was: Do not issue index updates for out of order mutation)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation

2017-08-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111046#comment-16111046
 ] 

Lars Hofhansl commented on PHOENIX-4057:


Replication updates can sometimes come out of order.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110419#comment-16110419
 ] 

Hadoop QA commented on PHOENIX-418:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12879968/PHOENIX-418-v1.patch
  against master branch at commit 5e33dc12bc088bd0008d89f0a5cd7d5c368efa25.
  ATTACHMENT ID: 12879968

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1242//console

This message is automatically generated.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-08-02 Thread Ethan Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Wang updated PHOENIX-418:
---
Attachment: PHOENIX-418-v1.patch




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)