[jira] [Resolved] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string
[ https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jepson resolved PHOENIX-4056. - Resolution: Fixed Fix Version/s: 4.11.0 *Downgrading Spark from 2.2.0 to 2.1.1 resolved the issue.*
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Environment: CDH5.12
> Phoenix: 4.11
> HBase: 1.2
> Spark: 2.2.0
> phoenix-spark.version: 4.11.0-HBase-1.2
> Reporter: Jepson
> Labels: features, patch, test
> Fix For: 4.11.0
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> 1. Use this configuration on both server and client (Scala project):
> {code:xml}
> <property>
>   <name>phoenix.schema.isNamespaceMappingEnabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>phoenix.schema.mapSystemTablesToNamespace</name>
>   <value>true</value>
> </property>
> {code}
> 2. The code:
> {code:java}
> resultDF.write
>   .format("org.apache.phoenix.spark")
>   .mode(SaveMode.Overwrite)
>   .option("table", "JYDW.ADDRESS_ORDERCOUNT")
>   .option("zkUrl", "192.168.1.40,192.168.1.41,192.168.1.42:2181")
>   .save()
> {code}
> 3. This error is thrown; please help fix it, thank you:
> 17/08/02 01:07:25 INFO DAGScheduler: Job 6 finished: runJob at SparkHadoopMapReduceWriter.scala:88, took 7.990715 s
> 17/08/02 01:07:25 ERROR SparkHadoopMapReduceWriter: Aborting job job_20170802010717_0079.
> {color:#59afe1}*java.lang.IllegalArgumentException: Can not create a Path from an empty string*{color}
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
> at org.apache.hadoop.fs.Path.<init>(Path.java:134)
> at org.apache.hadoop.fs.Path.<init>(Path.java:88)
> at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
> at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:132)
> at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:101)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1003)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:994)
> at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:59)
> at org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
> at org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
> at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472)
> at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
> ...
[jira] [Comment Edited] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string
[ https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112202#comment-16112202 ] Jepson edited comment on PHOENIX-4056 at 8/3/17 5:16 AM: - I downgraded Spark from 2.2.0 to 2.1.1, and the error is resolved. So Phoenix 4.11.0-HBase-1.2 does not work with Spark 2.2.0; the compatibility is poor. was (Author: 1028344...@qq.com): I downgraded Spark from 2.2.0 to 2.1.1, and the error is resolved.
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Reporter: Jepson
[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112203#comment-16112203 ] Sergey Soldatov commented on PHOENIX-418: - [~aertoria] if you have several other commits on top of yours, you can produce a single-commit patch with:
{code}
git format-patch -k -1
{code}
Also make sure that it can be applied on top of the master branch (git am). If it can't, rebase it before submitting it to the JIRA. You may also just resolve the conflicts during am and generate a new patch with format-patch.
> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
> Issue Type: Task
> Reporter: James Taylor
> Assignee: Ethan Wang
> Labels: gsoc2016
> Attachments: PHOENIX-418-v1.patch
>
> Support an "approximation" of count distinct to prevent having to hold on to all distinct values (since this will not scale well when the number of distinct values is huge). The Apache Drill folks have had some interesting discussions on this [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). They recommend using [Welford's method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). I'm open to having a config option that uses exact versus approximate. I don't have experience implementing an approximate implementation, so I'm not sure how much state is required to keep on the server and return to the client (other than realizing it'd be much less than returning all distinct values and their counts).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
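Sergey's workflow can be sketched end to end. The repository, file, and commit names below are made up for illustration, and the sketch runs in a throwaway directory rather than a real Phoenix checkout:

```shell
set -e

# Throwaway repo standing in for a real Phoenix clone.
work=$(mktemp -d)
cd "$work"
git init -q patchdemo && cd patchdemo
git config user.email dev@example.com
git config user.name "Dev"

echo base > file.txt
git add file.txt && git commit -qm "existing history"      # stands in for master
echo change >> file.txt
git add file.txt && git commit -qm "PHOENIX-418 my change" # your single commit

# Emit exactly one patch file for the topmost commit:
git format-patch -k -1

# Verify it applies on top of the base revision, as a committer would:
git checkout -q -b verify HEAD^
git am -q 0001-*.patch
grep -q change file.txt && echo "patch applied cleanly"
```

If `git am` stops on a conflict, resolve it, `git add` the affected files, run `git am --continue`, and regenerate the patch with `git format-patch`, as Sergey notes.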
[jira] [Commented] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string
[ https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112202#comment-16112202 ] Jepson commented on PHOENIX-4056: - I downgraded Spark from 2.2.0 to 2.1.1, and the error is resolved.
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Reporter: Jepson
[jira] [Updated] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string
[ https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jepson updated PHOENIX-4056: Environment: CDH5.12 Phoenix: 4.11 HBase: 1.2 Spark: 2.2.0 phoenix-spark.version: 4.11.0-HBase-1.2 was: CDH5.12 Phoenix: 4.11 HBase: 1.2 phoenix-spark.version: 4.11.0-HBase-1.2
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Reporter: Jepson
[jira] [Commented] (PHOENIX-3769) OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le
[ https://issues.apache.org/jira/browse/PHOENIX-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112191#comment-16112191 ] Sneha Kanekar commented on PHOENIX-3769:
{code:borderStyle=solid}
[INFO] Running org.apache.phoenix.end2end.OnDuplicateKeyIT
[ERROR] Tests run: 45, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 264.123 s <<< FAILURE! - in org.apache.phoenix.end2end.OnDuplicateKeyIT
[ERROR] testNewAndMultiDifferentUpdateOnSingleColumn[0](org.apache.phoenix.end2end.OnDuplicateKeyIT) Time elapsed: 2.291 s <<< ERROR!
java.lang.ArrayIndexOutOfBoundsException: 182
at org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:386)
[ERROR] testNewAndMultiDifferentUpdateOnSingleColumn[1](org.apache.phoenix.end2end.OnDuplicateKeyIT) Time elapsed: 8.202 s <<< ERROR!
java.lang.ArrayIndexOutOfBoundsException: 486
at org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:386)
[ERROR] testNewAndMultiDifferentUpdateOnSingleColumn[2](org.apache.phoenix.end2end.OnDuplicateKeyIT) Time elapsed: 8.207 s <<< ERROR!
java.lang.ArrayIndexOutOfBoundsException: 477
at org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:386)
{code}
From the array indexes for which this test is failing, I can see that the ArrayIndexOutOfBoundsException is thrown for a byte array onDupKeyBytes, which is created in phoenix-core/src/main/java/org/apache/phoenix/compile/UpsertCompiler.java. The following function, combineOnDupKey (in phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java), throws this exception while accessing the array newRow.onDupKeyBytes:
{code:borderStyle=solid}
// Concatenate ON DUPLICATE KEY bytes to allow multiple
// increments of the same row in the same commit batch.
this.onDupKeyBytes = PhoenixIndexBuilder.combineOnDupKey(this.onDupKeyBytes, newRow.onDupKeyBytes);
{code}
[~elserj] I am also not sure why this happens only on ppc64le, but I am trying to debug it further. If the information above helps you understand the root cause, please let me know.
> OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le
> --
>
> Key: PHOENIX-3769
> URL: https://issues.apache.org/jira/browse/PHOENIX-3769
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Environment: $ uname -a
> Linux 6945c232192e 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
> Reporter: Sneha Kanekar
> Labels: ppc64le
> Attachments: OnDuplicateKeyIT_Standard_output.txt, PHOENIX-3769.patch, TEST-org.apache.phoenix.end2end.OnDuplicateKeyIT.xml
>
> The testcase org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn fails consistently on ppc64le architecture. The error message is as follows:
> {code:borderStyle=solid}
> java.lang.ArrayIndexOutOfBoundsException: 179
> at org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:392)
> {code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
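One hypothesis for an architecture-specific failure like the one above is platform-dependent byte-order handling somewhere in the serialization of onDupKeyBytes (for example in an Unsafe-based fast path); this is speculation, not a confirmed root cause. The effect is easy to illustrate, though: a multi-byte length decoded in the wrong byte order becomes a wildly out-of-range value, which an array access then surfaces as an ArrayIndexOutOfBoundsException. A minimal sketch, unrelated to the actual Phoenix code:

```shell
# The byte sequence 00 00 00 b6 is the 4-byte big-endian encoding of 182,
# the very index the first test failure above reports. Decoding those same
# bytes in the wrong byte order yields a length far past any real buffer.
be=$(( (0x00 << 24) | (0x00 << 16) | (0x00 << 8) | 0xb6 ))  # bytes read big-endian
le=$(( (0xb6 << 24) | (0x00 << 16) | (0x00 << 8) | 0x00 ))  # same bytes, little-endian
echo "big-endian:    $be"   # 182
echo "little-endian: $le"   # 3053453312
```

Whether this mechanism is what happens inside combineOnDupKey is an open question; it only shows why a byte-order bug would present exactly as the reported exception.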
[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112172#comment-16112172 ] James Taylor commented on PHOENIX-418: -- The patch doesn't look right - it includes commits other than yours. Try generating it with the following command after committing your change to your local repo:
{code}
git format-patch --stdout HEAD^ > PHOENIX-{NUMBER}.patch
{code}
> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
> Issue Type: Task
> Reporter: James Taylor
> Assignee: Ethan Wang
> Labels: gsoc2016
> Attachments: PHOENIX-418-v1.patch
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
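The suggested command can be sketched end to end in a throwaway repository; the file names and commit messages are illustrative only:

```shell
set -e

work=$(mktemp -d)
cd "$work"
git init -q demo && cd demo
git config user.email dev@example.com
git config user.name "Dev"

echo v1 > file.txt
git add file.txt && git commit -qm "existing history"
echo v2 >> file.txt
git add file.txt && git commit -qm "PHOENIX-418 my change"

# Everything after HEAD^ -- i.e. only the latest commit -- goes into one file:
git format-patch --stdout HEAD^ > PHOENIX-418.patch

# Sanity check before attaching: the mbox should contain exactly one commit.
grep -c '^From [0-9a-f]' PHOENIX-418.patch   # prints 1
```

If the count is greater than 1, the patch includes commits that are not yours, which is exactly the problem described above; rebasing onto current master first avoids it.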
[jira] [Commented] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true
[ https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111985#comment-16111985 ] Hudson commented on PHOENIX-4059: - FAILURE: Integrated in Jenkins build Phoenix-master #1726 (See [https://builds.apache.org/job/Phoenix-master/1726/]) PHOENIX-4059 Index maintenance incorrect when indexed column updated (jamestaylor: rev 0bd43f536a13609b35629308c415aebc800d6799) * (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/StoreNullsIT.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/index/IndexMaintainer.java > Index maintenance incorrect when indexed column updated from null to null > with STORE_NULLS=true > --- > > Key: PHOENIX-4059 > URL: https://issues.apache.org/jira/browse/PHOENIX-4059 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4059.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111984#comment-16111984 ] Hudson commented on PHOENIX-4057: - FAILURE: Integrated in Jenkins build Phoenix-master #1726 (See [https://builds.apache.org/job/Phoenix-master/1726/]) PHOENIX-4057 Do not issue index updates for out of order mutation (jamestaylor: rev e494fe9fa744ec2948cf75b007e92fef0b1ba829) * (edit) phoenix-core/src/it/java/org/apache/phoenix/hbase/index/covered/example/EndToEndCoveredIndexingIT.java * (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsIT.java * (add) phoenix-core/src/it/java/org/apache/phoenix/end2end/OutOfOrderMutationsIT.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/covered/NonTxIndexBuilder.java > Do not issue index updates for out of order mutation during index maintenance > -- > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4061) Require STORE_NULLS=true on tables with mutable secondary indexes
James Taylor created PHOENIX-4061: - Summary: Require STORE_NULLS=true on tables with mutable secondary indexes Key: PHOENIX-4061 URL: https://issues.apache.org/jira/browse/PHOENIX-4061 Project: Phoenix Issue Type: Bug Reporter: James Taylor Until PHOENIX-4058 is fixed, we need to ensure that STORE_NULLS=true is set on tables with mutable secondary indexes. Otherwise, if a column is set to null at the same timestamp as it is set to a non-null value, the secondary index will get out of sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4060) Handle out of order updates during mutable secondary index maintenance
[ https://issues.apache.org/jira/browse/PHOENIX-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4060: -- Description: To correctly handle out of order updates during mutable secondary index maintenance, revert PHOENIX-4057 and ensure that the tests in OutOfOrderMutationsIT pass. It'd be important to add tests that do multi-version scans and verify correctness. This will be necessary to support point-in-time queries against an index to always work correctly. Barring any bugs in the bowels of mutable secondary indexing, the main area to focus on would be to correctly put and delete the Phoenix empty key value row. This is currently done in IndexMaintainer.buildDeleteMutation() and IndexMaintainer.buildUpdateMutation(), which are called through the IndexCodec.getIndexDeletes() and getIndexUpserts() calls. It seems that NonTxIndexBuilder.addMutationsForBatch() and addCurrentStateMutationsForBatch() do not always go through this abstraction. Fixing this might solve the issue, but understanding the mutable secondary index code is no small feat. was: To correctly handle out of order updates during mutable secondary index maintenance, revert PHOENIX-4057 and ensure that the tests in OutOfOrderMutationsIT pass. It'd be important to add tests that do multi-version scans and verify correctness. This will be necessary to support point-in-time queries against an index to always work correctly.
> Handle out of order updates during mutable secondary index maintenance
> --
>
> Key: PHOENIX-4060
> URL: https://issues.apache.org/jira/browse/PHOENIX-4060
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> To correctly handle out of order updates during mutable secondary index maintenance, revert PHOENIX-4057 and ensure that the tests in OutOfOrderMutationsIT pass. It'd be important to add tests that do multi-version scans and verify correctness.
> This will be necessary to support point-in-time queries against an index to > always work correctly. > Barring any bugs in the bowels of mutable secondary indexing, the main area > to focus on would be to correctly put and delete the Phoenix empty key value > row. This is currently done in IndexMaintainer.buildDeleteMutation() and > IndexMaintainer.buildUpdateMutation() which are called through the > IndexCodec.getIndexDeletes() and getIndexUpserts() calls. It seems that the > NonTxIndexBuilder.addMutationsForBatch() and > addCurrentStateMutationsForBatch() do not always go through this abstraction. > Fixing this might solve the issue, but understanding the mutable secondary > index code is no small feat. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4060) Handle out of order updates during mutable secondary index maintenance
James Taylor created PHOENIX-4060: - Summary: Handle out of order updates during mutable secondary index maintenance Key: PHOENIX-4060 URL: https://issues.apache.org/jira/browse/PHOENIX-4060 Project: Phoenix Issue Type: Bug Reporter: James Taylor To correctly handle out of order updates during mutable secondary index maintenance, revert PHOENIX-4057 and ensure that the tests in OutOfOrderMutationsIT pass. It'd be important to add tests that do multi-version scans and verify correctness. This will be necessary to support point-in-time queries against an index to always work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true
[ https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111851#comment-16111851 ] Mujtaba Chohan commented on PHOENIX-4059: - Works perfectly in my test now. > Index maintenance incorrect when indexed column updated from null to null > with STORE_NULLS=true > --- > > Key: PHOENIX-4059 > URL: https://issues.apache.org/jira/browse/PHOENIX-4059 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Attachments: PHOENIX-4059.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true
[ https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111827#comment-16111827 ] Thomas D'Silva commented on PHOENIX-4059: - +1 > Index maintenance incorrect when indexed column updated from null to null > with STORE_NULLS=true > --- > > Key: PHOENIX-4059 > URL: https://issues.apache.org/jira/browse/PHOENIX-4059 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Attachments: PHOENIX-4059.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111767#comment-16111767 ] James Taylor commented on PHOENIX-4057: --- Theoretically, that's true, [~vincentpoon], if you got unlucky enough to choose a point-in-time where an out of order mutation occurred right before that time stamp. I wouldn't expect this would happen very often. > Do not issue index updates for out of order mutation during index maintenance > -- > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111757#comment-16111757 ] Vincent Poon commented on PHOENIX-4057: --- [~jamestaylor] I guess this would also mean we can't run a scrutiny as of an older timestamp? As that would essentially be a point-in-time query comparison of data vs index table, and there could potentially be data table writes without corresponding index updates with this change? > Do not issue index updates for out of order mutation during index maintenance > -- > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true
[ https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor reassigned PHOENIX-4059: - Assignee: James Taylor > Index maintenance incorrect when indexed column updated from null to null > with STORE_NULLS=true > --- > > Key: PHOENIX-4059 > URL: https://issues.apache.org/jira/browse/PHOENIX-4059 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Attachments: PHOENIX-4059.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true
[ https://issues.apache.org/jira/browse/PHOENIX-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4059: -- Attachment: PHOENIX-4059.patch Please review, [~tdsilva]. [~mujtabachohan] - would you mind trying with this patch? > Index maintenance incorrect when indexed column updated from null to null > with STORE_NULLS=true > --- > > Key: PHOENIX-4059 > URL: https://issues.apache.org/jira/browse/PHOENIX-4059 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Attachments: PHOENIX-4059.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4059) Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true
James Taylor created PHOENIX-4059: - Summary: Index maintenance incorrect when indexed column updated from null to null with STORE_NULLS=true Key: PHOENIX-4059 URL: https://issues.apache.org/jira/browse/PHOENIX-4059 Project: Phoenix Issue Type: Bug Reporter: James Taylor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4058) Generate correct index updates when DeleteColumn processed before Put with same timestamp
[ https://issues.apache.org/jira/browse/PHOENIX-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4058: -- Summary: Generate correct index updates when DeleteColumn processed before Put with same timestamp (was: Generate correct index updates when DeleteColumn processed before Put of same timestamp) > Generate correct index updates when DeleteColumn processed before Put with > same timestamp > - > > Key: PHOENIX-4058 > URL: https://issues.apache.org/jira/browse/PHOENIX-4058 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > > The following scenario is not handled correctly for mutable secondary > indexing: > 1) indexed column set to null at ts1 (generates a DeleteColumn) > 2) indexed column set to value at ts1 (i.e. different client, same ts) > 3) RS processes (1) first > 4) RS processes (2) next > Because deletes take precedence over puts even though the put happens after > the delete, we still need to avoid generating index updates. > The quick fix is to add STORE_NULLS=true to the table, which skirts the issue > since no deletes are issued (i.e. last put wins). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
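The crux of the scenario above is HBase's rule that, at an identical timestamp, a DeleteColumn marker masks a Put no matter which mutation the region server processed first. A toy model (hypothetical sketch, not HBase internals) makes the symmetry visible, and shows why STORE_NULLS=true, which issues no delete markers, sidesteps the problem:

```java
import java.util.*;

// Minimal model of one column's cells: a DeleteColumn marker at timestamp
// ts masks every Put at ts or earlier, regardless of processing order.
// Hypothetical sketch only, not HBase code.
public class DeletePrecedence {
    private final TreeSet<Long> deletes = new TreeSet<>();
    private final TreeMap<Long, String> puts = new TreeMap<>();

    void put(long ts, String v) { puts.put(ts, v); }
    void deleteColumn(long ts)  { deletes.add(ts); }

    // Latest visible value: newest Put with no delete marker at or after its ts
    String get() {
        for (Map.Entry<Long, String> e : puts.descendingMap().entrySet()) {
            if (deletes.ceiling(e.getKey()) == null) return e.getValue();
        }
        return null;
    }

    public static void main(String[] args) {
        DeletePrecedence a = new DeletePrecedence();
        a.deleteColumn(1); a.put(1, "v");  // RS processes the delete first
        DeletePrecedence b = new DeletePrecedence();
        b.put(1, "v"); b.deleteColumn(1);  // RS processes the put first
        System.out.println(a.get() + " " + b.get()); // same answer either way
        DeletePrecedence c = new DeletePrecedence();
        c.put(1, "v");                     // STORE_NULLS=true: no delete issued
        System.out.println(c.get());       // last put wins
    }
}
```

Both processing orders resolve to "no visible value", so the index maintenance code cannot rely on arrival order to decide whether an index update is needed.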
[jira] [Updated] (PHOENIX-4051) Prevent out-of-order updates for mutable index updates
[ https://issues.apache.org/jira/browse/PHOENIX-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4051: -- Attachment: PHOENIX-4051_v3.patch Can you give this patch a try, [~mujtabachohan]? Let me know if you need an 0.98 version of the patch. > Prevent out-of-order updates for mutable index updates > -- > > Key: PHOENIX-4051 > URL: https://issues.apache.org/jira/browse/PHOENIX-4051 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Attachments: PHOENIX-4051_v1.patch, PHOENIX-4051_v2.patch, > PHOENIX-4051_v3.patch > > > Out-of-order processing of data rows during index maintenance causes mutable > indexes to become out of sync with regard to the data table. Here's a simple > example to illustrate the issue: > # Assume table T(K,V) and index X(V,K). > # Upsert T(A, 1) at t10. Index updates: Put X(1,A) at t10. > # Upsert T(A, 3) at t30. Index updates: Delete X(1,A) at t29, Put X(3,A) at > t30. > # Upsert T(A,2) at t20. Index updates: Delete X(1,A) at t19, Put X(2,A) at > t20, Delete X(2,A) at t29 > Ideally, we'd want to remove the Delete X(1,A) at t29 since this isn't > correct in terms of timeline consistency, but we can't do that with HBase > without support for deleting/undoing Delete markers. > The above is not what is occurring. Instead, when T(A,2) comes in, the Put > X(2,A) will occur at t20, but the Delete won't occur. This causes more index > rows than data rows, essentially making it invalid. > A quick fix is to reset the timestamp of the data table mutations to the > current time within the preBatchMutate call, when the row is exclusively > locked. This skirts the issue because then timestamps won't overlap. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
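The t10/t30/t20 walk-through in the issue above can be replayed with a toy index model (hypothetical names, not Phoenix code): apply the ideal updates for T(A,1)@t10 and T(A,3)@t30, then let the late T(A,2)@t20 issue only its Put. The missing Delete X(2,A) at t29 is the bug, and the model ends with two live index rows for a single data row:

```java
import java.util.*;

// Index modeled as rowkey -> newest Put ts and newest Delete ts. A row is
// live "now" if its newest Put is not masked by a Delete at an equal or
// later ts. Hypothetical sketch of the example in PHOENIX-4051.
public class OutOfOrderIndex {
    static final Map<String, Long> putTs = new HashMap<>();
    static final Map<String, Long> deleteTs = new HashMap<>();

    static void put(String rowKey, long ts)    { putTs.merge(rowKey, ts, Math::max); }
    static void delete(String rowKey, long ts) { deleteTs.merge(rowKey, ts, Math::max); }

    static List<String> liveRows() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : putTs.entrySet()) {
            Long d = deleteTs.get(e.getKey());
            if (d == null || d < e.getValue()) live.add(e.getKey());
        }
        Collections.sort(live);
        return live;
    }

    public static void main(String[] args) {
        put("X(1,A)", 10);                        // upsert T(A,1) at t10
        delete("X(1,A)", 29); put("X(3,A)", 30);  // upsert T(A,3) at t30
        put("X(2,A)", 20);                        // late upsert T(A,2): the Put happens...
        // ...but the buggy path never issues Delete X(2,A) at t29, so:
        System.out.println(liveRows()); // two live index rows for one data row
    }
}
```

Issuing the missing `delete("X(2,A)", 29)` restores the invariant of one live index row per data row, which is what the timestamp-reset fix described above achieves by construction (overlapping timestamps never arise).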
[jira] [Created] (PHOENIX-4058) Generate correct index updates when DeleteColumn processed before Put of same timestamp
James Taylor created PHOENIX-4058: - Summary: Generate correct index updates when DeleteColumn processed before Put of same timestamp Key: PHOENIX-4058 URL: https://issues.apache.org/jira/browse/PHOENIX-4058 Project: Phoenix Issue Type: Bug Reporter: James Taylor The following scenario is not handled correctly for mutable secondary indexing: 1) indexed column set to null at ts1 (generates a DeleteColumn) 2) indexed column set to value at ts1 (i.e. different client, same ts) 3) RS processes (1) first 4) RS processes (2) next Because deletes take precedence over puts even though the put happens after the delete, we still need to avoid generating index updates. The quick fix is to add STORE_NULLS=true to the table, which skirts the issue since no deletes are issued (i.e. last put wins). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111492#comment-16111492 ] James Taylor commented on PHOENIX-4057: --- Thanks for the review, [~samarthjain]. Yes, we'll want to release note it. It's not a regression, though - it wouldn't have worked correctly in the past either. > Do not issue index updates for out of order mutation during index maintenance > -- > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111456#comment-16111456 ] Samarth Jain commented on PHOENIX-4057: --- Patch looks good, [~jamestaylor]. It sounds like with this change point in time queries are broken. Should this change warrant a release note that point in time queries are not supported if table has mutable indexes (global or local) on it? > Do not issue index updates for out of order mutation during index maintenance > -- > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111079#comment-16111079 ] James Taylor commented on PHOENIX-4057: --- This JIRA only applies to updates made at index maintenance time. Replication is at the HBase level and would not be impacted. I'll update the JIRA summary. > Do not issue index updates for out of order mutation > > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4057) Do not issue index updates for out of order mutation during index maintenance
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4057: -- Summary: Do not issue index updates for out of order mutation during index maintenance (was: Do not issue index updates for out of order mutation) > Do not issue index updates for out of order mutation during index maintenance > -- > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4057) Do not issue index updates for out of order mutation
[ https://issues.apache.org/jira/browse/PHOENIX-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111046#comment-16111046 ] Lars Hofhansl commented on PHOENIX-4057: Replication updates can sometimes come out of order. > Do not issue index updates for out of order mutation > > > Key: PHOENIX-4057 > URL: https://issues.apache.org/jira/browse/PHOENIX-4057 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4057_v1.patch, PHOENIX-4057_wip1.patch > > > Index maintenance is not correct when rows arrive out of order (see > PHOENIX-4052). In particular, out of order deletes end up with a spurious Put > in the index. Rather than corrupt the secondary index, we can instead just > ignore out-of-order mutations. The only downside is that point-in-time > queries against an index will not work correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110419#comment-16110419 ] Hadoop QA commented on PHOENIX-418: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12879968/PHOENIX-418-v1.patch against master branch at commit 5e33dc12bc088bd0008d89f0a5cd7d5c368efa25. ATTACHMENT ID: 12879968 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1242//console This message is automatically generated. > Support approximate COUNT DISTINCT > -- > > Key: PHOENIX-418 > URL: https://issues.apache.org/jira/browse/PHOENIX-418 > Project: Phoenix > Issue Type: Task >Reporter: James Taylor >Assignee: Ethan Wang > Labels: gsoc2016 > Attachments: PHOENIX-418-v1.patch > > > Support an "approximation" of count distinct to prevent having to hold on to > all distinct values (since this will not scale well when the number of > distinct values is huge). The Apache Drill folks have had some interesting > discussions on this > [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). > They recommend using [Welford's > method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). > I'm open to having a config option that uses exact versus approximate. I > don't have experience implementing an approximate implementation, so I'm not > sure how much state is required to keep on the server and return to the > client (other than realizing it'd be much less than returning all distinct > values and their counts). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Wang updated PHOENIX-418: --- Attachment: PHOENIX-418-v1.patch > Support approximate COUNT DISTINCT > -- > > Key: PHOENIX-418 > URL: https://issues.apache.org/jira/browse/PHOENIX-418 > Project: Phoenix > Issue Type: Task >Reporter: James Taylor >Assignee: Ethan Wang > Labels: gsoc2016 > Attachments: PHOENIX-418-v1.patch > > > Support an "approximation" of count distinct to prevent having to hold on to > all distinct values (since this will not scale well when the number of > distinct values is huge). The Apache Drill folks have had some interesting > discussions on this > [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). > They recommend using [Welford's > method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). > I'm open to having a config option that uses exact versus approximate. I > don't have experience implementing an approximate implementation, so I'm not > sure how much state is required to keep on the server and return to the > client (other than realizing it'd be much less than returning all distinct > values and their counts). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
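For a concrete sense of the state-versus-exactness trade-off the description asks about, here is a linear-counting sketch, one standard approximate-distinct technique. This is purely an illustration, not the approach taken in the attached patch: it keeps a fixed m-bit bitmap instead of every distinct value, and estimates the count from the fraction of bits still zero.

```java
import java.util.BitSet;

// Linear counting: hash each value into an m-bit bitmap; with V = fraction
// of bits still zero, the distinct-count estimate is -m * ln(V).
// State is O(m) no matter how many values are added.
public class LinearCounting {
    private final int m;       // bitmap size; must be a power of two
    private final BitSet bits;

    public LinearCounting(int m) { this.m = m; this.bits = new BitSet(m); }

    public void add(long value) {
        bits.set((int) (mix(value) & (m - 1)));
    }

    public double estimate() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) return m * Math.log(m); // bitmap saturated; crude cap
        return -m * Math.log((double) zeros / m);
    }

    // SplitMix64 finalizer: a cheap, well-mixed 64-bit hash
    private static long mix(long z) {
        z += 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }
}
```

With m = 65536 (8 KB of server-side state, easily returned to the client), estimates for around 10,000 distinct values typically land within roughly 1% of the truth; accuracy degrades as the bitmap fills, which is where HyperLogLog-style sketches scale better.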