[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

2018-02-06 Thread Guizhou Feng (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355098#comment-16355098
 ] 

Guizhou Feng commented on PHOENIX-2883:
---

That's great to see that Phoenix has a specific release to support CDH, thanks a lot 
:)

> Region close during automatic disabling of index for rebuilding can lead to 
> RS abort
> 
>
> Key: PHOENIX-2883
> URL: https://issues.apache.org/jira/browse/PHOENIX-2883
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race 
> condition in secondary index updates. This user has a relatively heavy 
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing 
> (concretely, we were doing a rolling restart of the cluster without the load 
> balancer disabled in the hopes of retaining as much availability as 
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata 
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an 
> inconsistent state (e.g. {{Assertion failed while closing store  
>  flushableSize expected=0, actual= 193392. Current 
> memstoreSize=-552208. Maybe a coprocessor operation failed and left the 
> memstore in a partially updated state.}}
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found 
> anything definitively conclusive yet. Will dump findings here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


RE: 4.13.0-HBase-1.1 not released?

2018-02-06 Thread Stepan Migunov
Hi James,
I have submitted a patch on the JIRA. I've never done this before, so please
excuse me if something is wrong. Thanks,
Stepan.

-Original Message-
From: James Taylor [mailto:jamestay...@apache.org]
Sent: Monday, November 20, 2017 6:13 PM
To: dev@phoenix.apache.org
Subject: Re: 4.13.0-HBase-1.1 not released?

Hi Stepan,
Please submit a patch on the JIRA.
Thanks,
James

On Mon, Nov 20, 2017 at 1:38 AM Stepan Migunov <
stepan.migu...@firstlinesoftware.com> wrote:

> Good news, thank you.
>
> Btw, do you know if https://issues.apache.org/jira/browse/PHOENIX-4056 is
> still unresolved? That means that Phoenix is not compatible with Spark
> 2.2.
> I see saveToPhoenix contains the following line:
> phxRDD.saveAsNewAPIHadoopFile("", ...). But Spark 2.2 doesn't work if the
> path is empty.
>
> It would be great if this param were changed to something like
> phxRDD.saveAsNewAPIHadoopFile(conf.get("phoenix.tempPath"), ...), then
> we would be able to set the param "phoenix.tempPath" to some temp path as a
> workaround.
>
> Regards,
> Stepan.
>
> On 2017-11-18 23:22, James Taylor  wrote:
> > FYI, we'll do one final release for Phoenix on HBase 1.1 (look for a
> 4.13.1
> > release soon). It looks like HBase 1.1 itself is nearing
> > end-of-life, so probably good to move off of it. If someone is
> > interested in being the RM for continued Phoenix HBase 1.1 releases,
> > please volunteer.
> >
> > On Mon, Nov 13, 2017 at 10:24 AM, James R. Taylor <
> jamestay...@apache.org>
> > wrote:
> >
> > > Hi Xavier,
> > > Please see these threads for some discussion. Would be great if
> > > you
> could
> > > volunteer to be the release manager for Phoenix released on HBase 1.1.
> > >
> > > https://lists.apache.org/thread.html/8a73efa27edb70ea5cbc89b
> > > 43c312faefaf2b78751c9459834523b81@%3Cuser.phoenix.apache.org%3E
> > > https://lists.apache.org/thread.html/04de7c47724d8ef2ed7414d
> > > 5bdc51325b2a0eecd324556d9e83f3718@%3Cdev.phoenix.apache.org%3E
> > > https://lists.apache.org/thread.html/ae13def3c024603ce3cdde8
> > > 71223cbdbae0219b4efe93ed4e48f55d5@%3Cdev.phoenix.apache.org%3E
> > >
> > > Thanks,
> > > James
> > >
> > > On 2017-11-13 07:51, Xavier Jodoin  wrote:
> > > > Hi,
> > > >
> > > > I would like to know if there is a reason why phoenix wasn't
> > > > released for hbase 1.1?
> > > >
> > > > Thanks
> > > >
> > > > Xavier Jodoin
> > > >
> > > >
> > >
> >
>


[jira] [Commented] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string

2018-02-06 Thread Jepson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355096#comment-16355096
 ] 

Jepson commented on PHOENIX-4056:
-

[~stepson] Thanks for reply.

> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: CDH5.12
> Phoenix:4.11
> HBase:1.2
> Spark: 2.2.0
> phoenix-spark.version:4.11.0-HBase-1.2
>Reporter: Jepson
>Priority: Major
>  Labels: features, patch, test
> Attachments: PHOENIX-4056.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> 1. Use the following configuration on both server and client (Scala project):
> <property>
>   <name>phoenix.schema.isNamespaceMappingEnabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>phoenix.schema.mapSystemTablesToNamespace</name>
>   <value>true</value>
> </property>
> 2. The code:
> {code:java}
> resultDF.write
>  .format("org.apache.phoenix.spark")
>  .mode(SaveMode.Overwrite)
>  .option("table", "JYDW.ADDRESS_ORDERCOUNT")
>  .option("zkUrl","192.168.1.40,192.168.1.41,192.168.1.42:2181")
>  .save()
> {code}
> 3. This error is thrown; please help to fix it, thank you:
> 7/08/02 01:07:25 INFO DAGScheduler: Job 6 finished: runJob at 
> SparkHadoopMapReduceWriter.scala:88, took 7.990715 s
> 17/08/02 01:07:25 ERROR SparkHadoopMapReduceWriter: Aborting job 
> job_20170802010717_0079.
> {color:#59afe1}*java.lang.IllegalArgumentException: Can not create a Path 
> from an empty string*{color}
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:134)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:88)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:132)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:101)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1003)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:994)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:59)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>   at 
> org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>   at 
> 

[jira] [Commented] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string

2018-02-06 Thread Stepan Migunov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355090#comment-16355090
 ] 

Stepan Migunov commented on PHOENIX-4056:
-

I see the saveToPhoenix method contains the following line: 
phxRDD.saveAsNewAPIHadoopFile("", ...). But Spark 2.2 doesn't work if the path is 
empty. 
 
It would be great if this param were changed to something like 
phxRDD.saveAsNewAPIHadoopFile(conf.get("phoenix.tempPath"), ...), then we would 
be able to set the param "phoenix.tempPath" to some temp path as a workaround.

I have provided a patch to make this workaround available. The client should add 
the following option to the config:
{noformat}
phoenixConf.set("mapred.output.dir", tempPath) // tempPath is some temporary path
ds.toDF().saveToPhoenix(tableName, conf = phoenixConf, ...){noformat}
[^PHOENIX-4056.patch]

After that, it works with Spark 2.2.
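
For reference, a minimal Scala sketch of how this workaround could look from the 
Spark side once the patch is applied. The DataFrame contents, the staging path, 
and the named conf/zkUrl parameters of saveToPhoenix are illustrative assumptions; 
the table name and ZooKeeper quorum are taken from the report above.
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession
import org.apache.phoenix.spark._  // brings the implicit saveToPhoenix into scope

val spark = SparkSession.builder().appName("phoenix-save-workaround").getOrCreate()
import spark.implicits._

// Stand-in for the real result DataFrame; its columns must match the Phoenix table.
val resultDF = Seq(("SHANGHAI", 42L)).toDF("ADDRESS", "ORDERCOUNT")

val phoenixConf = new Configuration()
// Any writable temporary path works; it only keeps the Spark 2.2 output committer
// from failing on an empty path, while the rows themselves still go to Phoenix.
phoenixConf.set("mapred.output.dir", "/tmp/phoenix-staging")

resultDF.saveToPhoenix(
  "JYDW.ADDRESS_ORDERCOUNT",
  conf = phoenixConf,
  zkUrl = Some("192.168.1.40,192.168.1.41,192.168.1.42:2181"))
{code}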

> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: CDH5.12
> Phoenix:4.11
> HBase:1.2
> Spark: 2.2.0
> phoenix-spark.version:4.11.0-HBase-1.2
>Reporter: Jepson
>Priority: Major
>  Labels: features, patch, test
> Attachments: PHOENIX-4056.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> 1. Use the following configuration on both server and client (Scala project):
> <property>
>   <name>phoenix.schema.isNamespaceMappingEnabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>phoenix.schema.mapSystemTablesToNamespace</name>
>   <value>true</value>
> </property>
> 2. The code:
> {code:java}
> resultDF.write
>  .format("org.apache.phoenix.spark")
>  .mode(SaveMode.Overwrite)
>  .option("table", "JYDW.ADDRESS_ORDERCOUNT")
>  .option("zkUrl","192.168.1.40,192.168.1.41,192.168.1.42:2181")
>  .save()
> {code}
> 3. This error is thrown; please help to fix it, thank you:
> 7/08/02 01:07:25 INFO DAGScheduler: Job 6 finished: runJob at 
> SparkHadoopMapReduceWriter.scala:88, took 7.990715 s
> 17/08/02 01:07:25 ERROR SparkHadoopMapReduceWriter: Aborting job 
> job_20170802010717_0079.
> {color:#59afe1}*java.lang.IllegalArgumentException: Can not create a Path 
> from an empty string*{color}
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:134)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:88)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:132)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:101)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1003)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:994)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:59)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>   at 
> org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> 

[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

2018-02-06 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355087#comment-16355087
 ] 

James Taylor commented on PHOENIX-2883:
---

We have a recent CDH 5.11.2 compatible release for Phoenix 4.13.2. Would that 
help?

> Region close during automatic disabling of index for rebuilding can lead to 
> RS abort
> 
>
> Key: PHOENIX-2883
> URL: https://issues.apache.org/jira/browse/PHOENIX-2883
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race 
> condition in secondary index updates. This user has a relatively heavy 
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing 
> (concretely, we were doing a rolling restart of the cluster without the load 
> balancer disabled in the hopes of retaining as much availability as 
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata 
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an 
> inconsistent state (e.g. {{Assertion failed while closing store  
>  flushableSize expected=0, actual= 193392. Current 
> memstoreSize=-552208. Maybe a coprocessor operation failed and left the 
> memstore in a partially updated state.}}
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found 
> anything definitively conclusive yet. Will dump findings here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4056) java.lang.IllegalArgumentException: Can not create a Path from an empty string

2018-02-06 Thread Stepan Migunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stepan Migunov updated PHOENIX-4056:

Attachment: PHOENIX-4056.patch

> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> --
>
> Key: PHOENIX-4056
> URL: https://issues.apache.org/jira/browse/PHOENIX-4056
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: CDH5.12
> Phoenix:4.11
> HBase:1.2
> Spark: 2.2.0
> phoenix-spark.version:4.11.0-HBase-1.2
>Reporter: Jepson
>Priority: Major
>  Labels: features, patch, test
> Attachments: PHOENIX-4056.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> 1. Use the following configuration on both server and client (Scala project):
> <property>
>   <name>phoenix.schema.isNamespaceMappingEnabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>phoenix.schema.mapSystemTablesToNamespace</name>
>   <value>true</value>
> </property>
> 2. The code:
> {code:java}
> resultDF.write
>  .format("org.apache.phoenix.spark")
>  .mode(SaveMode.Overwrite)
>  .option("table", "JYDW.ADDRESS_ORDERCOUNT")
>  .option("zkUrl","192.168.1.40,192.168.1.41,192.168.1.42:2181")
>  .save()
> {code}
> 3. This error is thrown; please help to fix it, thank you:
> 7/08/02 01:07:25 INFO DAGScheduler: Job 6 finished: runJob at 
> SparkHadoopMapReduceWriter.scala:88, took 7.990715 s
> 17/08/02 01:07:25 ERROR SparkHadoopMapReduceWriter: Aborting job 
> job_20170802010717_0079.
> {color:#59afe1}*java.lang.IllegalArgumentException: Can not create a Path 
> from an empty string*{color}
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:134)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:88)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:132)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:101)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1003)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:994)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:994)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:59)
>   at 
> org.apache.phoenix.spark.DataFrameFunctions.saveToPhoenix(DataFrameFunctions.scala:28)
>   at 
> org.apache.phoenix.spark.DefaultSource.createRelation(DefaultSource.scala:47)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>   at 
> 

[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

2018-02-06 Thread Guizhou Feng (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355080#comment-16355080
 ] 

Guizhou Feng commented on PHOENIX-2883:
---

Thanks James for the information. Upgrading is hard due to the integration with 
CDH, which requires a whole-platform upgrade. I'm trying to make the index work 
with less effort if possible, maybe by creating an empty table with an index and 
then syncing data into it so that it covers both old and newly arriving data, as 
sketched below.
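
A rough Phoenix-over-JDBC sketch of that idea, purely as an illustration: all 
identifiers, the column list, and the primary key below are hypothetical, and the 
real DDL would have to mirror the existing table definition.
{code:scala}
import java.sql.DriverManager

// "zk-host" is a placeholder for the real ZooKeeper quorum.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
val stmt = conn.createStatement()

// 1. Create the new table and its index up front (synchronously this time);
//    the columns shown are illustrative and must mirror the existing table.
stmt.execute(
  """CREATE TABLE IF NOT EXISTS "prod:log_my_phx_new" (
    |  "id" VARCHAR NOT NULL PRIMARY KEY,
    |  "version" VARCHAR,
    |  "event_time" TIMESTAMP,
    |  "name" VARCHAR
    |)""".stripMargin)
stmt.execute(
  """CREATE INDEX IF NOT EXISTS "prod:log_my_phx_new_idx"
    |  ON "prod:log_my_phx_new" ("id", "version", "event_time")
    |  INCLUDE ("name")""".stripMargin)

// 2. Backfill the existing rows; new writes are pointed at the new table, so
//    both old and newly arriving data end up covered by the index.
stmt.execute(
  """UPSERT INTO "prod:log_my_phx_new" ("id", "version", "event_time", "name")
    |  SELECT "id", "version", "event_time", "name" FROM "prod:log_my_phx"
    |""".stripMargin)
conn.commit()
conn.close()
{code}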

 

> Region close during automatic disabling of index for rebuilding can lead to 
> RS abort
> 
>
> Key: PHOENIX-2883
> URL: https://issues.apache.org/jira/browse/PHOENIX-2883
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race 
> condition in secondary index updates. This user has a relatively heavy 
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing 
> (concretely, we were doing a rolling restart of the cluster without the load 
> balancer disabled in the hopes of retaining as much availability as 
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata 
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an 
> inconsistent state (e.g. {{Assertion failed while closing store  
>  flushableSize expected=0, actual= 193392. Current 
> memstoreSize=-552208. Maybe a coprocessor operation failed and left the 
> memstore in a partially updated state.}}
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found 
> anything definitively conclusive yet. Will dump findings here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

2018-02-06 Thread Guizhou Feng (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355030#comment-16355030
 ] 

Guizhou Feng edited comment on PHOENIX-2883 at 2/7/18 7:02 AM:
---

I encountered a similar case while building an index asynchronously via IndexTool.

HBase Version: 1.2.0-cdh5.10.1
Phoenix Version: phoenix-4.8.0-cdh5.8.0-server.jar

Behavior Description:

    1. Create Index: CREATE INDEX "prod:log_my_phx_3_idx"
    ON "prod:log_my_phx" ("id", "version", "event_time" )
    INCLUDE(
 "name",
 "code",
 "type",
 "decision",
 "monitoring") ASYNC;
    2. Run IndexTool mapreduce job

The MapReduce job succeeded and the index was activated, although the ALTER INDEX 
statement threw a NullPointerException as below:

ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE

18/02/06 16:26:32 INFO index.IndexToolUtil: alterQuery: ALTER INDEX IF EXISTS 
"prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE
18/02/06 16:26:32 ERROR index.IndexTool: An exception occurred while performing 
the indexing job: NullPointerException:  at:
java.lang.NullPointerException
    at org.apache.phoenix.schema.PMetaDataImpl.addTable(PMetaDataImpl.java:108)
    at 
org.apache.phoenix.jdbc.PhoenixConnection.addTable(PhoenixConnection.java:903)
    at 
org.apache.phoenix.schema.MetaDataClient.addTableToCache(MetaDataClient.java:3539)
    at 
org.apache.phoenix.schema.MetaDataClient.alterIndex(MetaDataClient.java:3504)
    at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAlterIndexStatement$1.execute(PhoenixStatement.java:993)
    at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
    at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
    at 
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1442)
    at 
org.apache.phoenix.mapreduce.index.IndexToolUtil.updateIndexState(IndexToolUtil.java:75)
    at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:245)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:384)


FATAL Errors in RegionServer:
2018-02-07 13:18:14,229 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~358.52 MB/375939816, currentsize=-101.26 
MB/-106175240 for region 
prod:log_my_phx,11_1022430660_502_V5,1517881075389.3504c1c9e68e7b0c9c1c99ea396ccb57.
 in 1849ms, sequenceid=15325637, compaction requested=true
2018-02-07 13:18:14,263 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region 
prod:log_my_phx,11_1022430660_502_V5,1517881075389.3504c1c9e68e7b0c9c1c99ea396ccb57.
2018-02-07 13:18:14,384 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
my-stage-hadoop-prod05-bp,60020,1516757389774: Assertion failed while closing 
store 
prod:log_my_phx,11_1022430660_502_V5,1517881075389.3504c1c9e68e7b0c9c1c99ea396ccb57.
 0. flushableSize expected=0, actual= 1212904. Current memstoreSize=-106175240. 
Maybe a coprocessor operation failed and left the memstore in a partially 
updated state.
2018-02-07 13:18:14,384 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded 
coprocessors are: [org.apache.phoenix.coprocessor.SequenceRegionObserver, 
org.apache.phoenix.coprocessor.ScanRegionObserver, 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, 
org.apache.phoenix.hbase.index.Indexer, 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, 
org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, 
org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, 
org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]

By the way, only one of the region servers aborted. The abort of a region server 
brings a lot of inconsistency due to regions stuck in transition and is hard to 
recover from with hbase hbck -repair; it took a whole day of running the repair a 
bunch of times.


was (Author: guizhou):
I encountered a similar case while building an index asynchronously via IndexTool.

HBase Version: 1.2.0-cdh5.10.1
Phoenix Version: phoenix-4.8.0-cdh5.8.0-server.jar

Behavior Description:

    1. Create Index: CREATE INDEX "prod:log_my_phx_3_idx"
    ON "prod:log_my_phx" ("id", "version", "event_time" )
    INCLUDE(
 "name",
 "code",
 "type",
 "decision",
 "monitoring") ASYNC;
    2. Run IndexTool mapreduce job

The MapReduce job succeeded and the index was activated, although the ALTER INDEX 
statement threw a NullPointerException as below:

ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" 

[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

2018-02-06 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355038#comment-16355038
 ] 

James Taylor commented on PHOENIX-2883:
---

Hundreds of fixes to secondary indexes between 4.8 and 4.13. Would it be 
possible for you to upgrade?

> Region close during automatic disabling of index for rebuilding can lead to 
> RS abort
> 
>
> Key: PHOENIX-2883
> URL: https://issues.apache.org/jira/browse/PHOENIX-2883
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race 
> condition in secondary index updates. This user has a relatively heavy 
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing 
> (concretely, we were doing a rolling restart of the cluster without the load 
> balancer disabled in the hopes of retaining as much availability as 
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata 
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an 
> inconsistent state (e.g. {{Assertion failed while closing store  
>  flushableSize expected=0, actual= 193392. Current 
> memstoreSize=-552208. Maybe a coprocessor operation failed and left the 
> memstore in a partially updated state.}}
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found 
> anything definitively conclusive yet. Will dump findings here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-3941) Filter regions to scan for local indexes based on data table leading pk filter conditions

2018-02-06 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355037#comment-16355037
 ] 

Thomas D'Silva commented on PHOENIX-3941:
-

+1

> Filter regions to scan for local indexes based on data table leading pk 
> filter conditions
> -
>
> Key: PHOENIX-3941
> URL: https://issues.apache.org/jira/browse/PHOENIX-3941
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
>  Labels: SFDC, localIndex
> Fix For: 4.14.0
>
> Attachments: PHOENIX-3941_v1.patch, PHOENIX-3941_v2.patch, 
> PHOENIX-3941_v3.patch
>
>
> Had a good offline conversation with [~ndimiduk] at PhoenixCon about local 
> indexes. Depending on the query, we can often times prune the regions we need 
> to scan over based on the where conditions against the data table pk. For 
> example, with a multi-tenant table, we only need to scan the regions that are 
> prefixed by the tenant ID.
> We can easily get this information from the compilation of the query against 
> the data table (which we always do), through the 
> statementContext.getScanRanges() structure. We'd just want to keep a pointer 
> to the data table QueryPlan from the local index QueryPlan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort

2018-02-06 Thread Guizhou Feng (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355030#comment-16355030
 ] 

Guizhou Feng commented on PHOENIX-2883:
---

I encountered a similar case while building an index asynchronously via IndexTool.

HBase Version: 1.2.0-cdh5.10.1
Phoenix Version: phoenix-4.8.0-cdh5.8.0-server.jar

Behavior Description:

    1. Create Index: CREATE INDEX "prod:log_my_phx_3_idx"
    ON "prod:log_my_phx" ("id", "version", "event_time" )
    INCLUDE(
 "name",
 "code",
 "type",
 "decision",
 "monitoring") ASYNC;
    2. Run IndexTool mapreduce job

The MapReduce job succeeded and the index was activated, although the ALTER INDEX 
statement threw a NullPointerException as below:

ALTER INDEX IF EXISTS "prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE

18/02/06 16:26:32 INFO index.IndexToolUtil: alterQuery: ALTER INDEX IF EXISTS 
"prod:log_my_phx_3_idx" ON "prod:log_my_phx" ACTIVE
18/02/06 16:26:32 ERROR index.IndexTool: An exception occurred while performing 
the indexing job: NullPointerException:  at:
java.lang.NullPointerException
    at org.apache.phoenix.schema.PMetaDataImpl.addTable(PMetaDataImpl.java:108)
    at 
org.apache.phoenix.jdbc.PhoenixConnection.addTable(PhoenixConnection.java:903)
    at 
org.apache.phoenix.schema.MetaDataClient.addTableToCache(MetaDataClient.java:3539)
    at 
org.apache.phoenix.schema.MetaDataClient.alterIndex(MetaDataClient.java:3504)
    at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAlterIndexStatement$1.execute(PhoenixStatement.java:993)
    at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
    at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
    at 
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1442)
    at 
org.apache.phoenix.mapreduce.index.IndexToolUtil.updateIndexState(IndexToolUtil.java:75)
    at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:245)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:384)


FATAL Errors in RegionServer:
ABORTING region server my-stage-hadoop-prod08-bp,60020,1504851662123: Assertion 
failed while closing store 
prod:log_my_phx,12_1022360799_801_V19,1517885909466.4a9e01d05c167dc6bdcab962763d7096.
 0. flushableSize expected=0, actual= 207088. Current memstoreSize=-114100080. 
Maybe a coprocessor operation failed and left the memstore in a partially 
updated state.

RegionServer abort: loaded coprocessors are: 
[org.apache.phoenix.coprocessor.MetaDataEndpointImpl, 
org.apache.phoenix.coprocessor.SequenceRegionObserver, 
org.apache.phoenix.coprocessor.ScanRegionObserver, 
org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, 
org.apache.phoenix.hbase.index.Indexer, 
org.apache.phoenix.coprocessor.MetaDataRegionObserver, 
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, 
org.apache.hadoop.hbase.regionserver.LocalIndexSplitter, 
org.apache.phoenix.coprocessor.ServerCachingEndpointImpl, 
org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, 
org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 


By the way, only one of the region servers aborted. The abort of a region server 
brings a lot of inconsistency due to regions stuck in transition and is hard to 
recover from with hbase hbck -repair; it took a whole day of running the repair a 
bunch of times.

> Region close during automatic disabling of index for rebuilding can lead to 
> RS abort
> 
>
> Key: PHOENIX-2883
> URL: https://issues.apache.org/jira/browse/PHOENIX-2883
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
>
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race 
> condition in secondary index updates. This user has a relatively heavy 
> write-based workload with a few tables that each have at least one index.
> What we have seen is that when the region distribution is changing 
> (concretely, we were doing a rolling restart of the cluster without the load 
> balancer disabled in the hopes of retaining as much availability as 
> possible), I've seen the following general outline in the logs:
> * An index update fails (due to {{ERROR 2008 (INT10)}} the index metadata 
> cache expired or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time 
> * RS 

[jira] [Commented] (PHOENIX-4278) Implement pure client side transactional index maintenance

2018-02-06 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354826#comment-16354826
 ] 

Andrew Purtell commented on PHOENIX-4278:
-

HBase 1.x supports Java 8. We do require that contributions not use Java-8-only 
language features because the code has to run on Java 7 JREs. We also build 1.x 
with Java 7 to ensure compatibility with Java 7 JREs. But Java 8 JREs are 
definitely supported with 1.x.

> Implement pure client side transactional index maintenance
> --
>
> Key: PHOENIX-4278
> URL: https://issues.apache.org/jira/browse/PHOENIX-4278
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Ohad Shacham
>Priority: Major
>
> The index maintenance for transactions follows the same model as 
> non-transactional tables: a coprocessor, triggered on data table updates, that 
> looks up the previous row value to perform maintenance. This is necessary for 
> non-transactional tables to ensure the rows are locked so that a consistent view 
> may be obtained. However, for transactional tables, the time stamp oracle 
> ensures uniqueness of time stamps (via transaction IDs) and the filtering 
> handles a scan seeing the "true" last committed value for a row. Thus, 
> there's no hard dependency to perform this on the server side.
> Moving the index maintenance to the client side would prevent any RS->RS RPC 
> calls (which have proved to be troublesome for HBase). It would require 
> returning more data to the client (i.e. the prior row value), but this seems 
> like a reasonable tradeoff.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4231) Support restriction of remote UDF load sources

2018-02-06 Thread Ethan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354825#comment-16354825
 ] 

Ethan Wang commented on PHOENIX-4231:
-

{quote}The setting hbase.dynamic.jars.dir can be used to restrict locations for 
jar loading but is only applied to jars loaded from the local filesystem.
{quote}
So as of today, what configuration does a user need to set for 
hbase.dynamic.jars.dir in order to restrict loading so that only jars from the 
local filesystem (not the network) can be loaded?

> Support restriction of remote UDF load sources 
> ---
>
> Key: PHOENIX-4231
> URL: https://issues.apache.org/jira/browse/PHOENIX-4231
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Major
>
> When allowUserDefinedFunctions is true, users can load UDFs remotely via a 
> jar file from any HDFS filesystem reachable on the network. The setting 
> hbase.dynamic.jars.dir can be used to restrict locations for jar loading but 
> is only applied to jars loaded from the local filesystem.  We should 
> implement support for similar restriction via configuration for jars loaded 
> via hdfs:// URIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4231) Support restriction of remote UDF load sources

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354806#comment-16354806
 ] 

ASF GitHub Bot commented on PHOENIX-4231:
-

Github user aertoria commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166491289
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = 
dynamicJarsDirPath.getFileSystem(conf);
 List jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new 
Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) 
jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new 
Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the 
hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs 
without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), 
jarPath, fs, dynamicJarsDirPathWithJar,
--- End diff --

@apurtell 
Since @ChinmaySKulkarni will be absent for a while, I will wrap up this 
patch for the team.
Do you know how hbase.dynamic.jars.dir is used differently from 
hbase.local.jars.dir?


> Support restriction of remote UDF load sources 
> ---
>
> Key: PHOENIX-4231
> URL: https://issues.apache.org/jira/browse/PHOENIX-4231
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Major
>
> When allowUserDefinedFunctions is true, users can load UDFs remotely via a 
> jar file from any HDFS filesystem reachable on the network. The setting 
> hbase.dynamic.jars.dir can be used to restrict locations for jar loading but 
> is only applied to jars loaded from the local filesystem.  We should 
> implement support for similar restriction via configuration for jars loaded 
> via hdfs:// URIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] phoenix pull request #292: PHOENIX-4231: Support restriction of remote UDF l...

2018-02-06 Thread aertoria
Github user aertoria commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166491289
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = 
dynamicJarsDirPath.getFileSystem(conf);
 List jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new 
Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) 
jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new 
Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the 
hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs 
without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), 
jarPath, fs, dynamicJarsDirPathWithJar,
--- End diff --

@apurtell 
Since @ChinmaySKulkarni will be absent for a while, I will wrap up this 
patch for the team.
Do you know how hbase.dynamic.jars.dir is used differently from 
hbase.local.jars.dir?


---


[jira] [Commented] (PHOENIX-4586) UPSERT SELECT doesn't take in account comparison operators for subqueries.

2018-02-06 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354794#comment-16354794
 ] 

Maryann Xue commented on PHOENIX-4586:
--

I'll take a look, [~sergey.soldatov]

> UPSERT SELECT doesn't take in account comparison operators for subqueries.
> --
>
> Key: PHOENIX-4586
> URL: https://issues.apache.org/jira/browse/PHOENIX-4586
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.14.0
>Reporter: Sergey Soldatov
>Priority: Critical
> Fix For: 4.14.0
>
>
> If an UPSERT SELECT has a WHERE condition that uses a comparison operator 
> against a subquery (including ANY/SOME/etc.), the whole WHERE clause is just 
> ignored. Table:
> {noformat}
> create table T (id integer primary key, i1 integer);
> upsert into T values (1,1);
> upsert into T values (2,2);
> {noformat}
> A query that should not upsert anything, because the WHERE clause requires I1 
> to be greater than any value we already have, as well as a non-existing ID:
> {noformat}
> 0: jdbc:phoenix:> upsert into T select id, 4 from T where id = 3 AND i1 > 
> (select i1 from T);
> 2 rows affected (0.02 seconds)
> 0: jdbc:phoenix:> select * from T;
> +-+-+
> | ID  | I1  |
> +-+-+
> | 1   | 4   |
> | 2   | 4   |
> +-+-+
> 2 rows selected (0.014 seconds)
> {noformat}
> Now with ANY. This should not upsert anything either, because ID is [1,2], 
> while the I1 values are all '4':
> {noformat}
> 0: jdbc:phoenix:> upsert into T select id, 5 from T where id = 2 AND i1 = ANY 
> (select ID from T);
> 2 rows affected (0.016 seconds)
> 0: jdbc:phoenix:> select * from T;
> +-+-+
> | ID  | I1  |
> +-+-+
> | 1   | 5   |
> | 2   | 5   |
> +-+-+
> 2 rows selected (0.013 seconds)
> {noformat}
> A similar query with IN works just fine:
> {noformat}
> 0: jdbc:phoenix:> upsert into T select id, 6 from T where id = 2 AND i1 IN 
> (select ID from T);
> No rows affected (0.094 seconds)
> 0: jdbc:phoenix:> select * from T;
> +-+-+
> | ID  | I1  |
> +-+-+
> | 1   | 5   |
> | 2   | 5   |
> +-+-+
> 2 rows selected (0.014 seconds)
> {noformat}
> The reason for this behavior is that for IN we convert the subselect to a 
> semi-join and execute the upsert on the client side. For comparisons, we don't 
> perform any transformations, so the query is considered flat and is finally 
> executed on the server side. Not sure why, but we also completely ignore the 
> second condition in the WHERE clause, and that may lead to serious data loss. 
> [~jamestaylor], [~maryannxue] any thoughts or suggestions on how to fix that 
> are really appreciated. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4231) Support restriction of remote UDF load sources

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354789#comment-16354789
 ] 

ASF GitHub Bot commented on PHOENIX-4231:
-

Github user aertoria commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166487244
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = 
dynamicJarsDirPath.getFileSystem(conf);
 List jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new 
Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) 
jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new 
Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the 
hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs 
without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), 
jarPath, fs, dynamicJarsDirPathWithJar,
+false, true, conf);
--- End diff --

@ChinmaySKulkarni 
How is Hadoop's FileUtil.copy different from FileSystem.copy?


> Support restriction of remote UDF load sources 
> ---
>
> Key: PHOENIX-4231
> URL: https://issues.apache.org/jira/browse/PHOENIX-4231
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Major
>
> When allowUserDefinedFunctions is true, users can load UDFs remotely via a 
> jar file from any HDFS filesystem reachable on the network. The setting 
> hbase.dynamic.jars.dir can be used to restrict locations for jar loading but 
> is only applied to jars loaded from the local filesystem.  We should 
> implement support for similar restriction via configuration for jars loaded 
> via hdfs:// URIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] phoenix pull request #292: PHOENIX-4231: Support restriction of remote UDF l...

2018-02-06 Thread aertoria
Github user aertoria commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166487244
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = 
dynamicJarsDirPath.getFileSystem(conf);
 List jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new 
Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) 
jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new 
Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the 
hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs 
without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), 
jarPath, fs, dynamicJarsDirPathWithJar,
+false, true, conf);
--- End diff --

@ChinmaySKulkarni 
How is Hadoop's FileUtil.copy different from FileSystem.copy?


---


CFP for Dataworks Summit, San Jose, 2018

2018-02-06 Thread Devaraj Das
All, Dataworks Summit San Jose 2018 is June 17-21. The call for abstracts is 
open through February 9th. Tracks like Datawarehousing and Operational Data 
Store might be a good fit for HBase & Phoenix talks. You can submit an abstract 
at https://dataworkssummit.com/san-jose-2018/
Thanks,
Devaraj.

[jira] [Commented] (PHOENIX-4231) Support restriction of remote UDF load sources

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354739#comment-16354739
 ] 

ASF GitHub Bot commented on PHOENIX-4231:
-

Github user apurtell commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166479944
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = 
dynamicJarsDirPath.getFileSystem(conf);
 List jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new 
Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) 
jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new 
Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the 
hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs 
without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), 
jarPath, fs, dynamicJarsDirPathWithJar,
--- End diff --

Actually, doesn't this imply the client should have write perms to 
hbase.dynamic.jars.dir? We don't want to allow arbitrary clients to write 
there. 


> Support restriction of remote UDF load sources 
> ---
>
> Key: PHOENIX-4231
> URL: https://issues.apache.org/jira/browse/PHOENIX-4231
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Major
>
> When allowUserDefinedFunctions is true, users can load UDFs remotely via a 
> jar file from any HDFS filesystem reachable on the network. The setting 
> hbase.dynamic.jars.dir can be used to restrict locations for jar loading but 
> is only applied to jars loaded from the local filesystem.  We should 
> implement support for similar restriction via configuration for jars loaded 
> via hdfs:// URIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] phoenix pull request #292: PHOENIX-4231: Support restriction of remote UDF l...

2018-02-06 Thread apurtell
Github user apurtell commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166479944
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = 
dynamicJarsDirPath.getFileSystem(conf);
 List jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new 
Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) 
jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new 
Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the 
hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs 
without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), 
jarPath, fs, dynamicJarsDirPathWithJar,
--- End diff --

Actually, doesn't this imply the client should have write perms to 
hbase.dynamic.jars.dir? We don't want to allow arbitrary clients to write 
there. 


---


[jira] [Commented] (PHOENIX-4231) Support restriction of remote UDF load sources

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354737#comment-16354737
 ] 

ASF GitHub Bot commented on PHOENIX-4231:
-

Github user apurtell commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166479267
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = 
dynamicJarsDirPath.getFileSystem(conf);
 List jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new 
Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) 
jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new 
Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the 
hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs 
without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), 
jarPath, fs, dynamicJarsDirPathWithJar,
--- End diff --

If the client does not have perms to write to hbase.dynamic.jars.dir (and I 
expect normally clients will not have write perms to this directory, only admin 
clients will have it), the copy will fail and throw an IOException. The result 
may not be user friendly, though. Did you try this? What happens? 


> Support restriction of remote UDF load sources 
> ---
>
> Key: PHOENIX-4231
> URL: https://issues.apache.org/jira/browse/PHOENIX-4231
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Chinmay Kulkarni
>Priority: Major
>
> When allowUserDefinedFunctions is true, users can load UDFs remotely via a 
> jar file from any HDFS filesystem reachable on the network. The setting 
> hbase.dynamic.jars.dir can be used to restrict locations for jar loading but 
> is only applied to jars loaded from the local filesystem.  We should 
> implement support for similar restriction via configuration for jars loaded 
> via hdfs:// URIs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] phoenix pull request #292: PHOENIX-4231: Support restriction of remote UDF l...

2018-02-06 Thread apurtell
Github user apurtell commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/292#discussion_r166479267
  
--- Diff: phoenix-core/src/main/java/org/apache/phoenix/jdbc/PhoenixStatement.java ---
@@ -907,10 +909,15 @@ public MutationState execute() throws SQLException {
 try {
 FileSystem fs = dynamicJarsDirPath.getFileSystem(conf);
 List<LiteralParseNode> jarPaths = getJarPaths();
-for (LiteralParseNode jarPath : jarPaths) {
-File f = new File((String) jarPath.getValue());
-fs.copyFromLocalFile(new Path(f.getAbsolutePath()), new Path(
-dynamicJarsDir + f.getName()));
+for (LiteralParseNode jarPathNode : jarPaths) {
+  String jarPathName = (String) jarPathNode.getValue();
+  File f = new File(jarPathName);
+  Path dynamicJarsDirPathWithJar = new Path(dynamicJarsDir + f.getName());
+  // Copy the jar (can be local or on HDFS) to the hbase.dynamic.jars.dir directory.
+  // Note that this does not support HDFS URIs without scheme and authority.
+  Path jarPath = new Path(jarPathName);
+  FileUtil.copy(jarPath.getFileSystem(conf), jarPath, fs, dynamicJarsDirPathWithJar,
--- End diff --

If the client does not have perms to write to hbase.dynamic.jars.dir (and I 
expect normally clients will not have write perms to this directory, only admin 
clients will have it), the copy will fail and throw an IOException. The result 
may not be user friendly, though. Did you try this? What happens? 


---


[jira] [Created] (PHOENIX-4587) Store child links in separate table from system catalog

2018-02-06 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4587:
-

 Summary: Store child links in separate table from system catalog
 Key: PHOENIX-4587
 URL: https://issues.apache.org/jira/browse/PHOENIX-4587
 Project: Phoenix
  Issue Type: Improvement
Reporter: James Taylor


Because there can be so many child links (in particular from a global table or 
view), we should store them in a separate table that does not use a split policy 
to keep them together.
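
Purely as a sketch of the idea (table and column names hypothetical, not a final design): a dedicated linking table keyed like the child-link rows in SYSTEM.CATALOG, but created without the split policy that confines a table's rows to a single region, e.g.:
{noformat}
CREATE TABLE SYSTEM.CHILD_LINK (
    TENANT_ID VARCHAR NULL,
    TABLE_SCHEM VARCHAR NULL,
    TABLE_NAME VARCHAR NOT NULL,
    COLUMN_NAME VARCHAR NULL,
    COLUMN_FAMILY VARCHAR NULL,
    LINK_TYPE UNSIGNED_TINYINT,
    CONSTRAINT PK PRIMARY KEY (TENANT_ID, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, COLUMN_FAMILY)
);
{noformat}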



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-3941) Filter regions to scan for local indexes based on data table leading pk filter conditions

2018-02-06 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354654#comment-16354654
 ] 

Thomas D'Silva commented on PHOENIX-3941:
-

Sure I can review it.

> Filter regions to scan for local indexes based on data table leading pk 
> filter conditions
> -
>
> Key: PHOENIX-3941
> URL: https://issues.apache.org/jira/browse/PHOENIX-3941
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
>  Labels: SFDC, localIndex
> Fix For: 4.14.0
>
> Attachments: PHOENIX-3941_v1.patch, PHOENIX-3941_v2.patch, 
> PHOENIX-3941_v3.patch
>
>
> Had a good offline conversation with [~ndimiduk] at PhoenixCon about local 
> indexes. Depending on the query, we can often times prune the regions we need 
> to scan over based on the where conditions against the data table pk. For 
> example, with a multi-tenant table, we only need to scan the regions that are 
> prefixed by the tenant ID.
> We can easily get this information from the compilation of the query against 
> the data table (which we always do), through the 
> statementContext.getScanRanges() structure. We'd just want to keep a pointer 
> to the data table QueryPlan from the local index QueryPlan.
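
As a rough, generic illustration of that pruning (types and helper names hypothetical, not the patch itself): once the scan ranges derived from the leading-pk conditions are known, only regions whose key range overlaps one of them need to be included in the local index scan.

{code}
import java.util.ArrayList;
import java.util.List;

public class RegionPruningSketch {
    /** A [start, end) key range; an empty end key means "to the end of the table". */
    public static class KeyRange {
        final byte[] start;
        final byte[] end;
        public KeyRange(byte[] start, byte[] end) { this.start = start; this.end = end; }
    }

    /** Keep only the regions whose [startKey, endKey) overlaps at least one scan range. */
    public static List<KeyRange> pruneRegions(List<KeyRange> regionBoundaries,
            List<KeyRange> scanRanges) {
        List<KeyRange> toScan = new ArrayList<>();
        for (KeyRange region : regionBoundaries) {
            for (KeyRange range : scanRanges) {
                if (overlaps(region, range)) {
                    toScan.add(region);
                    break;
                }
            }
        }
        return toScan;
    }

    private static boolean overlaps(KeyRange a, KeyRange b) {
        boolean aEndsAfterBStarts = a.end.length == 0 || compare(a.end, b.start) > 0;
        boolean bEndsAfterAStarts = b.end.length == 0 || compare(b.end, a.start) > 0;
        return aEndsAfterBStarts && bEndsAfterAStarts;
    }

    /** Unsigned lexicographic comparison, the same ordering HBase uses for row keys. */
    private static int compare(byte[] left, byte[] right) {
        int len = Math.min(left.length, right.length);
        for (int i = 0; i < len; i++) {
            int cmp = (left[i] & 0xff) - (right[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return left.length - right.length;
    }
}
{code}

For a multi-tenant table, the scan ranges would all be prefixed by the tenant ID, so most regions drop out immediately.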



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-4586) UPSERT SELECT doesn't take into account comparison operators for subqueries.

2018-02-06 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created PHOENIX-4586:


 Summary: UPSERT SELECT doesn't take into account comparison operators for subqueries.
 Key: PHOENIX-4586
 URL: https://issues.apache.org/jira/browse/PHOENIX-4586
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.14.0
Reporter: Sergey Soldatov
 Fix For: 4.14.0


If an UPSERT SELECT has a WHERE condition that uses a comparison operator against a 
subquery (including ANY/SOME/etc.), the whole WHERE clause is simply ignored. Table:
{noformat}
create table T (id integer primary key, i1 integer);
upsert into T values (1,1);
upsert into T values (2,2);
{noformat}
A query that should not upsert anything, because the WHERE condition requires I1 to be 
greater than the values we already have and also references a non-existent ID:
{noformat}
0: jdbc:phoenix:> upsert into T select id, 4 from T where id = 3 AND i1 > 
(select i1 from T);
2 rows affected (0.02 seconds)
0: jdbc:phoenix:> select * from T;
+-+-+
| ID  | I1  |
+-+-+
| 1   | 4   |
| 2   | 4   |
+-+-+
2 rows selected (0.014 seconds)
{noformat}
Now with ANY. This should not upsert anything either, because the ID values are [1,2] 
while the I1 values are all '4':
{noformat}
0: jdbc:phoenix:> upsert into T select id, 5 from T where id = 2 AND i1 = ANY 
(select ID from T);
2 rows affected (0.016 seconds)
0: jdbc:phoenix:> select * from T;
+-+-+
| ID  | I1  |
+-+-+
| 1   | 5   |
| 2   | 5   |
+-+-+
2 rows selected (0.013 seconds)
{noformat}
A similar query with IN works just fine:
{noformat}
0: jdbc:phoenix:> upsert into T select id, 6 from T where id = 2 AND i1 IN 
(select ID from T);
No rows affected (0.094 seconds)
0: jdbc:phoenix:> select * from T;
+-+-+
| ID  | I1  |
+-+-+
| 1   | 5   |
| 2   | 5   |
+-+-+
2 rows selected (0.014 seconds)
{noformat}

The reason for this behavior is that for IN we convert the subselect to a semi-join 
and execute the upsert on the client side. For comparison operators we don't perform 
any transformation, so the query is considered flat and ends up being executed on the 
server side. Not sure why, but we also completely ignore the other condition in the 
WHERE clause, and that may lead to serious data loss. 
[~jamestaylor], [~maryannxue] any thoughts or suggestions on how to fix this are 
really appreciated. 
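
Until this is fixed, one possible workaround (based on the observation above that IN goes through the semi-join path; not verified for every case) is to express an equality-style comparison with IN instead:
{noformat}
-- Buggy form: the WHERE clause is silently ignored
upsert into T select id, 5 from T where id = 2 AND i1 = ANY (select ID from T);
-- Equivalent rewrite that is planned as a semi-join and honors the WHERE clause
upsert into T select id, 5 from T where id = 2 AND i1 IN (select ID from T);
{noformat}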



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4278) Implement pure client side transactional index maintenance

2018-02-06 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354306#comment-16354306
 ] 

James Taylor commented on PHOENIX-4278:
---

What JVM are you using, [~ohads]? We're still stuck on JDK 7 since HBase 1.x 
releases are still on it. If you're using JDK 8, it'd be worth trying 7 instead.

Of the list above, the only one that's a bit of a concern is this one:
{code}
[*ERROR*]   AggregateIT.testAvgGroupByOrderPreservingWithStats:432 
expected:<13> but was:<8>
{code}
However, if you get the same error without your patch, I'm a bit less 
concerned. FYI, these tests pass in our Jenkins build: 
https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/27/

> Implement pure client side transactional index maintenance
> --
>
> Key: PHOENIX-4278
> URL: https://issues.apache.org/jira/browse/PHOENIX-4278
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Ohad Shacham
>Priority: Major
>
> The index maintenance for transactions follows the same model as non 
> transactional tables - coprocessor based on data table updates that looks up 
> previous row value to perform maintenance. This is necessary for non 
> transactional tables to ensure the rows are locked so that a consistent view 
> may be obtained. However, for transactional tables, the time stamp oracle 
> ensures uniqueness of time stamps (via transaction IDs) and the filtering 
> handles a scan seeing the "true" last committed value for a row. Thus, 
> there's no hard dependency to perform this on the server side.
> Moving the index maintenance to the client side would prevent any RS->RS RPC 
> calls (which have proved to be troublesome for HBase). It would require 
> returning more data to the client (i.e. the prior row value), but this seems 
> like a reasonable tradeoff.
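
In outline (everything below is hypothetical and simplified to a single indexed column; it is not the actual Phoenix code), the client-side maintenance the description proposes amounts to: using the prior row value returned to the client, emit a delete for the old index row and a put for the new one, both at the transaction's id/timestamp.

{code}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ClientSideIndexMaintenanceSketch {
    /** A minimal stand-in for an index mutation: row key plus a put/delete marker. */
    public static class IndexMutation {
        final byte[] rowKey;
        final boolean delete;
        final long timestamp;
        IndexMutation(byte[] rowKey, boolean delete, long timestamp) {
            this.rowKey = rowKey; this.delete = delete; this.timestamp = timestamp;
        }
    }

    /**
     * Build the index mutations for one data row update entirely on the client:
     * delete the index row derived from the prior value (if any changed) and insert
     * the index row derived from the new value, both at the transaction id.
     */
    public static List<IndexMutation> maintainIndex(String dataRowKey, String priorIndexedValue,
            String newIndexedValue, long transactionId) {
        List<IndexMutation> mutations = new ArrayList<>();
        if (priorIndexedValue != null && !priorIndexedValue.equals(newIndexedValue)) {
            mutations.add(new IndexMutation(indexRowKey(priorIndexedValue, dataRowKey), true, transactionId));
        }
        if (newIndexedValue != null) {
            mutations.add(new IndexMutation(indexRowKey(newIndexedValue, dataRowKey), false, transactionId));
        }
        return mutations;
    }

    /** Index row key = indexed value + data row key, so the index can be scanned by value. */
    private static byte[] indexRowKey(String indexedValue, String dataRowKey) {
        return (indexedValue + "\u0000" + dataRowKey).getBytes(StandardCharsets.UTF_8);
    }
}
{code}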



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-3941) Filter regions to scan for local indexes based on data table leading pk filter conditions

2018-02-06 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354295#comment-16354295
 ] 

James Taylor commented on PHOENIX-3941:
---

[~tdsilva] - would you have some spare cycles to review? The only case we're 
not handling is when a join uses a local index. That follow-up work will be 
done in PHOENIX-4585.

> Filter regions to scan for local indexes based on data table leading pk 
> filter conditions
> -
>
> Key: PHOENIX-3941
> URL: https://issues.apache.org/jira/browse/PHOENIX-3941
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
>  Labels: SFDC, localIndex
> Fix For: 4.14.0
>
> Attachments: PHOENIX-3941_v1.patch, PHOENIX-3941_v2.patch, 
> PHOENIX-3941_v3.patch
>
>
> Had a good offline conversation with [~ndimiduk] at PhoenixCon about local 
> indexes. Depending on the query, we can often times prune the regions we need 
> to scan over based on the where conditions against the data table pk. For 
> example, with a multi-tenant table, we only need to scan the regions that are 
> prefixed by the tenant ID.
> We can easily get this information from the compilation of the query against 
> the data table (which we always do), through the 
> statementContext.getScanRanges() structure. We'd just want to keep a pointer 
> to the data table QueryPlan from the local index QueryPlan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-3941) Filter regions to scan for local indexes based on data table leading pk filter conditions

2018-02-06 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3941:
--
Attachment: PHOENIX-3941_v3.patch

> Filter regions to scan for local indexes based on data table leading pk 
> filter conditions
> -
>
> Key: PHOENIX-3941
> URL: https://issues.apache.org/jira/browse/PHOENIX-3941
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
>  Labels: SFDC, localIndex
> Fix For: 4.14.0
>
> Attachments: PHOENIX-3941_v1.patch, PHOENIX-3941_v2.patch, 
> PHOENIX-3941_v3.patch
>
>
> Had a good offline conversation with [~ndimiduk] at PhoenixCon about local 
> indexes. Depending on the query, we can often times prune the regions we need 
> to scan over based on the where conditions against the data table pk. For 
> example, with a multi-tenant table, we only need to scan the regions that are 
> prefixed by the tenant ID.
> We can easily get this information from the compilation of the query against 
> the data table (which we always do), through the 
> statementContext.getScanRanges() structure. We'd just want to keep a pointer 
> to the data table QueryPlan from the local index QueryPlan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PHOENIX-4585) Prune local index regions used for join queries

2018-02-06 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-4585:
-

Assignee: Maryann Xue

> Prune local index regions used for join queries
> ---
>
> Key: PHOENIX-4585
> URL: https://issues.apache.org/jira/browse/PHOENIX-4585
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Maryann Xue
>Priority: Major
>
> Some remaining work from PHOENIX-3941: we currently do not capture the data 
> plan as part of the index plan due to the way in which we rewrite the 
> statement during join processing. See comment here for more detail: 
> https://issues.apache.org/jira/browse/PHOENIX-3941?focusedCommentId=16351017=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16351017



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-3941) Filter regions to scan for local indexes based on data table leading pk filter conditions

2018-02-06 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354161#comment-16354161
 ] 

James Taylor commented on PHOENIX-3941:
---

Thanks, [~maryannxue]. I've filed PHOENIX-4585 for this follow-up work. It's 
fine for PHOENIX-1556 to go in first.

> Filter regions to scan for local indexes based on data table leading pk 
> filter conditions
> -
>
> Key: PHOENIX-3941
> URL: https://issues.apache.org/jira/browse/PHOENIX-3941
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
>  Labels: SFDC, localIndex
> Fix For: 4.14.0
>
> Attachments: PHOENIX-3941_v1.patch, PHOENIX-3941_v2.patch
>
>
> Had a good offline conversation with [~ndimiduk] at PhoenixCon about local 
> indexes. Depending on the query, we can often times prune the regions we need 
> to scan over based on the where conditions against the data table pk. For 
> example, with a multi-tenant table, we only need to scan the regions that are 
> prefixed by the tenant ID.
> We can easily get this information from the compilation of the query against 
> the data table (which we always do), through the 
> statementContext.getScanRanges() structure. We'd just want to keep a pointer 
> to the data table QueryPlan from the local index QueryPlan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-4585) Prune local index regions used for join queries

2018-02-06 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4585:
-

 Summary: Prune local index regions used for join queries
 Key: PHOENIX-4585
 URL: https://issues.apache.org/jira/browse/PHOENIX-4585
 Project: Phoenix
  Issue Type: Improvement
Reporter: James Taylor


Some remaining work from PHOENIX-3941: we currently do not capture the data 
plan as part of the index plan due to the way in which we rewrite the statement 
during join processing. See comment here for more detail: 
https://issues.apache.org/jira/browse/PHOENIX-3941?focusedCommentId=16351017=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16351017



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4529) Users should only require RX access to SYSTEM.SEQUENCE table

2018-02-06 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354160#comment-16354160
 ] 

James Taylor commented on PHOENIX-4529:
---

I can see this issue is good to implement for completeness, but is it really 
that important, [~tdsilva]? Worst case, a user could call NEXT VALUE FOR on 
another user's sequence. That's not that big of a deal IMHO.

> Users should only require RX access to SYSTEM.SEQUENCE table
> 
>
> Key: PHOENIX-4529
> URL: https://issues.apache.org/jira/browse/PHOENIX-4529
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Karan Mehta
>Assignee: Thomas D'Silva
>Priority: Major
>
> Currently, users don't need to have Write access to {{SYSTEM.CATALOG}} and 
> other tables, since the code is run on the server side as login user. However 
> for {{SYSTEM.SEQUENCE}}, write permission is still needed. This is a 
> potential security concern, since it allows anyone to modify the sequences 
> created by others. This JIRA is to discuss how we can improve the security of 
> this table. 
> Potential options include
> 1. Usage of HBase Cell Level Permissions (works only with HFile version 3 and 
> above)
> 2. AccessControl at Phoenix Layer by addition of user column in the 
> {{SYSTEM.SEQUENCE}} table and use it for access control (Can be error-prone 
> for complex scenarios like sequence sharing)
> Please advise.
> [~tdsilva] [~jamestaylor] [~apurtell] [~an...@apache.org] [~elserj]
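
For option 1, a rough sketch (row layout simplified and hypothetical; assumes HFile v3 and the HBase AccessController with cell ACLs enabled) of how a sequence's cells could be made writable only by their creator while remaining readable by everyone else:

{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.security.access.Permission;
import org.apache.hadoop.hbase.util.Bytes;

public class SequenceCellAclSketch {
    /**
     * Builds the Put for a (simplified, hypothetical) sequence row so that only the
     * creating user may rewrite its cells; other users would only hold table-level READ.
     */
    public static Put sequencePutOwnedBy(String owner, byte[] sequenceRowKey, long currentValue) {
        Put put = new Put(sequenceRowKey);
        put.addColumn(Bytes.toBytes("0"), Bytes.toBytes("CURRENT_VALUE"), Bytes.toBytes(currentValue));
        // Cell-level ACL: the owner gets read+write on these cells.
        put.setACL(owner, new Permission(Permission.Action.READ, Permission.Action.WRITE));
        return put;
    }
}
{code}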



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)