GitHub user sarutak opened a pull request:
https://github.com/apache/spark/pull/3083
[SPARK-4213] ParquetFilters - No support for LT, LTE, GT, GTE operators
The following description is quoted from JIRA:
When I issue an HQL query against a HiveContext where my predicate uses a
column of string type with one of the LT, LTE, GT, or GTE operators, I get the
following error:
scala.MatchError: StringType (of class
org.apache.spark.sql.catalyst.types.StringType$)
Looking at the code in org.apache.spark.sql.parquet.ParquetFilters,
StringType is absent from the corresponding functions for creating these
filters.
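As an illustration of the failure mode described above, here is a minimal sketch in plain Scala. It does not reproduce the real ParquetFilters code: the `DataType` hierarchy, `describeGteBroken`, and `describeGteFixed` are hypothetical stand-ins, and the returned strings are placeholders for whatever filter objects the real functions build. It only shows how a pattern match that lacks a case for a string type throws scala.MatchError, and how adding the case avoids it.

```scala
object MatchErrorSketch {
  // Hypothetical, simplified stand-ins; NOT the actual Spark Catalyst types.
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  // Mirrors the reported problem: there is no case for StringType, so
  // calling this with StringType throws scala.MatchError at runtime.
  // @unchecked only silences the compiler's exhaustiveness warning.
  def describeGteBroken(dt: DataType): String = (dt: @unchecked) match {
    case IntegerType => "int >= filter"
  }

  // Sketch of the shape of a fix: handle StringType explicitly.
  def describeGteFixed(dt: DataType): String = dt match {
    case IntegerType => "int >= filter"
    case StringType  => "string >= filter"
  }

  def main(args: Array[String]): Unit = {
    println(describeGteFixed(StringType))
    try {
      describeGteBroken(StringType)
    } catch {
      case e: MatchError => println("caught: " + e)
    }
  }
}
```

The actual change in the pull request may be structured differently; the sketch only assumes that the missing case is the cause, as the JIRA description suggests.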
To reproduce, in a Hive 0.13.1 shell, I created the following table (in a
specific database):
create table sparkbug (
id int,
event string
) stored as parquet;
Insert some sample data:
insert into table sparkbug select 1, '2011-06-18' from <some table> limit 1;
insert into table sparkbug select 2, '2012-01-01' from <some table> limit 1;
Launch a Spark shell and create a HiveContext connected to the metastore where
the table above is located:
import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
val hc = new HiveContext(sc)
hc.setConf("spark.sql.shuffle.partitions", "10")
hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
hc.setConf("spark.sql.parquet.compression.codec", "snappy")
import hc._
hc.hql("select * from <db>.sparkbug where event >= '2011-12-01'")
A scala.MatchError will appear in the output.
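For reference, the following sketch shows which of the two sample rows the predicate would be expected to keep once string comparison works. It assumes plain lexicographic ordering of the date strings, which for these ASCII values should agree with Hive's string comparison; `StringCompareSketch` and `eventGte` are hypothetical names and nothing here touches Spark or Parquet.

```scala
object StringCompareSketch {
  // The two sample rows from the reproduction: (id, event).
  val rows: Seq[(Int, String)] = Seq((1, "2011-06-18"), (2, "2012-01-01"))

  // Keeps rows whose event string is lexicographically >= the bound,
  // assumed to match the intent of `event >= '2011-12-01'`.
  def eventGte(bound: String): Seq[(Int, String)] =
    rows.filter { case (_, event) => event.compareTo(bound) >= 0 }

  def main(args: Array[String]): Unit =
    eventGte("2011-12-01").foreach(println)
}
```

Under that assumption, only the row with id 2 satisfies the predicate.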
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sarutak/spark SPARK-4213
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3083.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3083
----
commit 9a1fae7fc87ad32af78ba843ab8a4457a9f8c067
Author: Kousuke Saruta
Date: 2014-11-04T00:50:46Z
Fixed ParquetFilters so that compare Strings
----