[jira] [Updated] (PHOENIX-5035) phoenix-spark dataframe filters date or timestamp type with error

2018-11-23 Thread zhongyuhai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhongyuhai updated PHOENIX-5035:

Attachment: (was: PHOENIX-5035.patch)

> phoenix-spark dataframe filters date or timestamp type with error
> 
>
> Key: PHOENIX-5035
> URL: https://issues.apache.org/jira/browse/PHOENIX-5035
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0, 4.14.0, 4.13.1, 5.0.0, 4.14.1
> Environment: HBase:apache 1.2
> Phoenix:4.13.1-HBase-1.2
> Hadoop:CDH 2.6
> Spark:2.3.1
>Reporter: zhongyuhai
>Priority: Critical
>  Labels: patch, pull-request-available
> Attachments: PHOENIX-5035.patch, table desc.png
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> *Table description:*
> see the attached "table desc.png"
>  
> *Code:*
> {code:java}
> val df = SparkUtil.getActiveSession().read
>   .format("org.apache.phoenix.spark")
>   .options(options)
>   .load()
> df.filter("INCREATEDDATE = date'2018-07-14'")
> {code}
>  
> *Exception:*
> {noformat}
> java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997
> 	at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201)
> 	at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87)
> 	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> {noformat}
>  
> *Analysis:*
> In org.apache.phoenix.spark.PhoenixRelation, compileValue(value: Any): Any is:
>  
>  
> {code:java}
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
>  
> The match only handles String values; every other type falls through and is later rendered with toString. The Spark filter condition "INCREATEDDATE = date'2018-07-14'" is therefore translated into the Phoenix condition "INCREATEDDATE = 2018-07-14". Phoenix parses the unquoted 2018-07-14 as integer subtraction (2018 - 7 - 14 = 1997), so the query fails with ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997.
> *Proposed solution:*
> Add cases for the remaining temporal types, Date and Timestamp:
> {code:java}
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>
>     // Render dates as Phoenix DATE literals instead of relying on toString
>     // (note: java.util.Date, not java.lang.Date, which does not exist)
>     case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) =>
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, DateUtil.DEFAULT_DATE_FORMAT)
>       val df = new SimpleDateFormat(dateFormat)
>       s"date'${df.format(d)}'"
>
>     // Render timestamps as Phoenix TIMESTAMP literals
>     case ts if (isClass(ts, "java.sql.Timestamp")) =>
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val timestampFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, DateUtil.DEFAULT_TIMESTAMP_FORMAT)
>       val df = new SimpleDateFormat(timestampFormat)
>       s"timestamp'${df.format(ts)}'"
>
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
>  
>  
>  
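The analysis above can be sketched concretely. The following standalone snippet is a hypothetical illustration, not part of the patch: the helper name and the "yyyy-MM-dd" pattern are assumptions for demonstration (the patch reads the actual pattern from QueryServices.DATE_FORMAT_ATTRIB). It shows how the unquoted date value collapses into the BIGINT 1997 from the error message, and how quoting it as a Phoenix DATE literal avoids the mismatch:

```java
import java.text.SimpleDateFormat;

public class DateLiteralSketch {
    // Hypothetical helper mirroring the proposed compileValue cases:
    // render a java.util.Date as a Phoenix DATE literal instead of letting
    // an unquoted toString() leak into the pushed-down WHERE clause.
    static String toDateLiteral(java.util.Date d) {
        return "date'" + new SimpleDateFormat("yyyy-MM-dd").format(d) + "'";
    }

    public static void main(String[] args) {
        java.sql.Date d = java.sql.Date.valueOf("2018-07-14");

        // Unquoted, a SQL parser reads 2018-07-14 as integer subtraction,
        // which is exactly the BIGINT 1997 in the reported ERROR 203:
        int misparsed = 2018 - 7 - 14;
        System.out.println(misparsed);        // 1997

        // Quoted as a literal, Phoenix can compare it against a DATE column:
        System.out.println(toDateLiteral(d)); // date'2018-07-14'
    }
}
```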



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-5035) phoenix-spark dataframe filters date or timestamp type with error

2018-11-20 Thread zhongyuhai (JIRA)



zhongyuhai updated PHOENIX-5035:

Attachment: (was: patch)






[jira] [Created] (PHOENIX-5035) phoenix-spark dataframe filters date or timestamp type with error

2018-11-20 Thread zhongyuhai (JIRA)
zhongyuhai created PHOENIX-5035:
---

 Summary: phoenix-spark dataframe filters date or timestamp type with error
 Key: PHOENIX-5035
 URL: https://issues.apache.org/jira/browse/PHOENIX-5035
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.14.1, 5.0.0, 4.13.1, 4.14.0, 4.13.0
 Environment: HBase:apache 1.2

Phoenix:4.13.1-HBase-1.2

Hadoop:CDH 2.6

Spark:2.3.1
Reporter: zhongyuhai
 Attachments: table desc.png


