Re: Potential Thrift Server Bug on Spark SQL, perhaps with cache table?

Cheng Lian Mon, 25 Aug 2014 12:42:36 -0700

Hi John,

I tried to follow your description but failed to reproduce this issue.
Would you mind providing some more details? In particular:

   - Exact Git commit hash of the snapshot version you were using

     Mine: e0f946265b9ea5bc48849cf7794c2c03d5e29fba
     <https://github.com/apache/spark/commit/e0f946265b9ea5bc48849cf7794c2c03d5e29fba>

   - Compilation flags (Hadoop version, profiles enabled, etc.)

     Mine:

     ./sbt/sbt -Pyarn,kinesis-asl,hive,hadoop-2.3 -Dhadoop.version=2.3.0 clean assembly/assembly

   - Also, it would be great if you could provide the schema of your table
     plus some sample data that can help reproduce this issue.

Cheng



On Wed, Aug 20, 2014 at 6:11 AM, John Omernik <j...@omernik.com> wrote:

> I am working with Spark SQL and the Thrift server. I ran into an
> interesting bug, and I am curious about what information/testing I can
> provide to help narrow things down.
>
> My setup is as follows:
>
> Hive 0.12 with a table that has many columns (50+), stored as RCFile.
> Spark 1.1.0-SNAPSHOT with Hive built in (and the Thrift server)
>
> My query selects only one STRING column, but filters rows based on
> other columns.
>
> Types:
> col1 = STRING
> col2 = STRING
> col3 = STRING
> col4 = Partition Field (TYPE STRING)
>
> Queries
> cache table table1;
> --Run some other queries on other data
> select col1 from table1
> where col2 = 'foo' and col3 = 'bar' and col4 = 'foobar'
> and col1 is not null limit 100;
>
> Fairly simple query.
>
> When I run this in SQuirreL SQL I get no results. When I remove
> 'and col1 is not null', I get 100 rows of <null>.
>
> When I run this in beeline (the one that ships with spark-1.1.0-SNAPSHOT),
> I get no results, and when I remove 'and col1 is not null', I get 100 rows
> of <null>.
>
> Note: both of these were run after some other queries (i.e. on other
> columns), and I had run CACHE TABLE table1 first, before any queries. That
> seemed interesting to me...
>
> So I went to the spark-shell to determine whether it was a Spark issue or
> a Thrift issue.
>
> I ran:
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> import hiveContext._
> cacheTable("table1")
>
> Then I ran the same "other" queries, got results, and then ran the query
> above and got results as expected.
>
> Interestingly enough, if I don't cache the table via CACHE TABLE table1
> in the Thrift server, I get results for all queries. If I uncache it, I
> start getting results again.
>
> I hope I was clear enough here; I am happy to help however I can.
>
> John
>
>
>
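
The two symptoms in the quoted report are consistent with each other if, once table1 is cached through the Thrift server, col1 is read back as NULL for every row: dropping "col1 is not null" then yields 100 NULLs, and keeping it removes every row. The plain Scala sketch below (no Spark involved) models only that hypothesis. NullFilterModel, query, and the generated sample rows are hypothetical names and data, not Spark APIs or John's real table, and the sketch says nothing about where in the caching path such NULLs would come from.

```scala
// Hypothetical model in plain Scala (no Spark). A row is (col1, col2, col3, col4),
// and SQL NULL is represented as None. Names and data are illustrative only.
object NullFilterModel {
  type Row = (Option[String], String, String, String)

  // Mirrors: SELECT col1 FROM table1
  //          WHERE col2 = 'foo' AND col3 = 'bar' AND col4 = 'foobar'
  //          [AND col1 IS NOT NULL] LIMIT 100
  def query(rows: Seq[Row], requireCol1NotNull: Boolean): Seq[Option[String]] =
    rows
      .filter { case (_, c2, c3, c4) => c2 == "foo" && c3 == "bar" && c4 == "foobar" }
      .filter { case (c1, _, _, _) => !requireCol1NotNull || c1.isDefined }
      .map(_._1)
      .take(100)

  // Assumed sample data: 150 matching rows, each with a non-NULL col1.
  val onDisk: Seq[Row] =
    (1 to 150).map(i => (Some(s"value$i"), "foo", "bar", "foobar"))

  // Hypothesis being modeled: the cached copy returns col1 as NULL everywhere.
  val cachedBroken: Seq[Row] =
    onDisk.map { case (_, c2, c3, c4) => (None, c2, c3, c4) }

  def main(args: Array[String]): Unit = {
    println(query(onDisk, requireCol1NotNull = true).size)        // uncached: 100
    println(query(cachedBroken, requireCol1NotNull = false).size) // cached, no predicate: 100 NULLs
    println(query(cachedBroken, requireCol1NotNull = true).size)  // cached, with predicate: 0
  }
}
```

Running it prints 100, 100 and 0, matching the uncached result, the cached result without the predicate, and the cached result with it. Comparing the same query with and without CACHE TABLE on a small sample table would be the real-Spark analogue of this check.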
