Re: Queries to custom serde return 'NULL' until hiveserver2 restart

2018-09-11 Thread Jason Gerlowski
Hi all,

Thanks for the suggestion, Gopal.  It turns out the error occurs on
both "SELECT *" and "SELECT col" queries.  The only queries that seem
safe are those with aggregations or other operations that cause them
to be run as MR tasks (e.g. "SELECT SUM(price_f) FROM
my_external_table").  Logging out the column names as you suggested
doesn't turn up anything unexpected either.

I've tracked the unexpected 'NULL' values down to an early exit from
my SerDe's deserialize() method.  The first thing deserialize() does
is make sure that the received Writable can be cast to the particular
type it expects (LWDocumentWritable).  In my case, this instanceof
check is failing.  The method returns 'null', which gets displayed as
NULL in HiveCLI. (code pointer here:
https://github.com/lucidworks/hive-solr/blob/master/solr-hive-core/src/main/java/com/lucidworks/hadoop/hive/LWSerDe.java#L55)

Curious about what other Writable I could've been receiving, I logged
out the class details.  The name of the class matches the class I'm
expecting (and checking for with 'instanceof').  Some more logging
showed that the class definitions were identical, but that the classes
came from different UDFClassLoaders, and were thus being treated as
different classes!  I thought (partly from reading UDFClassLoader
itself) that each Hive session had access to one (and only one)
UDFClassLoader.  But whatever passes the Writable to my SerDe's
deserialize() passes an object whose class was loaded by a distinct
UDFClassLoader, which my SerDe then can't recognize.

(I drew this conclusion from some logging shared here:
https://pastebin.com/TwV0HPBA)
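
The behaviour described above can be reproduced outside Hive. The following is a minimal, stdlib-only sketch, not the hive-solr code: `Payload` is a hypothetical stand-in for LWDocumentWritable, and `IsolatingLoader` is a hypothetical loader that plays the role of a second UDFClassLoader by defining its own copy of one class from the same bytes. It assumes the compiled .class files are readable as resources on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassLoaderIdentityDemo {

    /** Hypothetical stand-in for a Writable such as LWDocumentWritable. */
    public static class Payload {
        public Payload() {}
    }

    /** Defines its own copy of one named class instead of delegating to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(String target, ClassLoader parent) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // everything else: normal delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the same .class bytes the parent loader would use. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Payload through a fresh loader and instantiates it reflectively. */
    static Object newForeignPayload() throws Exception {
        ClassLoader parent = ClassLoaderIdentityDemo.class.getClassLoader();
        ClassLoader loader = new IsolatingLoader(Payload.class.getName(), parent);
        return loader.loadClass(Payload.class.getName()).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Object foreign = newForeignPayload();
        // Same fully qualified name: prints true
        System.out.println(foreign.getClass().getName().equals(Payload.class.getName()));
        // Different defining loader, so a different runtime class: prints false
        System.out.println(foreign instanceof Payload);
    }
}
```

In the JVM a class is identified by its name together with its defining class loader, so identical bytes loaded twice yield two unrelated types. That is consistent with the instanceof check in deserialize() failing even though the logged class names match.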

Is it a bug that my SerDe receives input from a different class
loader? Or am I misunderstanding the lifecycle/purpose of
UDFClassLoader instances?  Is there a more robust way to cast Writable
instances in a custom SerDe implementation?  Thanks in advance for any
clarification you can give.
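
While that question is open, one option is to make the failure visible instead of silently returning null. The sketch below is a hypothetical, stdlib-only helper (`WritableTypeCheck`, `check` and `describe` are made-up names, not part of Hive or hive-solr). It does not resolve the loader mismatch; it only distinguishes "wrong type" from "right name, wrong class loader" so the cause shows up in the logs.

```java
class WritableTypeCheck {

    enum Result { MATCH, SAME_NAME_DIFFERENT_LOADER, DIFFERENT_TYPE }

    /**
     * Classifies value against the expected class.
     * Note: the name comparison only catches an exact class-name match,
     * not a subclass that was loaded by another loader.
     */
    static Result check(Object value, Class<?> expected) {
        if (expected.isInstance(value)) {
            return Result.MATCH;
        }
        if (value != null && value.getClass().getName().equals(expected.getName())) {
            return Result.SAME_NAME_DIFFERENT_LOADER;
        }
        return Result.DIFFERENT_TYPE;
    }

    /** Returns the class name plus the identity of the loader that defined it. */
    static String describe(Object value) {
        if (value == null) {
            return "null";
        }
        Class<?> c = value.getClass();
        ClassLoader cl = c.getClassLoader();
        String loader = (cl == null)
                ? "bootstrap"
                : cl.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(cl));
        return c.getName() + " loaded by " + loader;
    }
}
```

A deserialize() implementation could call `check(writable, LWDocumentWritable.class)` and log `describe(writable)` whenever the result is not MATCH, which would surface the class loader identity without extra ad hoc logging.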

Best,

Jason

On Mon, Sep 10, 2018 at 10:37 PM Gopal Vijayaraghavan wrote:
>
> >query the external table using HiveCLI (e.g. SELECT * FROM
> >my_external_table), HiveCLI prints out a table with the correct
>
> If the error is always on a "select *", then the issue might be the SerDe's 
> handling of included columns.
>
> Check what you get for
>
> colNames = Arrays.asList(tblProperties.getProperty(serdeConstants.LIST_COLUMNS).split(","));
>
> Or to confirm it, try doing "Select col from table" instead of "*".
>
> Cheers,
> Gopal
>
>


Re: Queries to custom serde return 'NULL' until hiveserver2 restart

2018-09-10 Thread Gopal Vijayaraghavan
>query the external table using HiveCLI (e.g. SELECT * FROM
>my_external_table), HiveCLI prints out a table with the correct

If the error is always on a "select *", then the issue might be the SerDe's 
handling of included columns.

Check what you get for

colNames = Arrays.asList(tblProperties.getProperty(serdeConstants.LIST_COLUMNS).split(","));

Or to confirm it, try doing "Select col from table" instead of "*".
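
For anyone who wants to try the check above outside a running SerDe, here is a standalone sketch using only java.util. It assumes the table properties arrive as a java.util.Properties and that serdeConstants.LIST_COLUMNS resolves to the key "columns"; `ColumnNamesDemo` and `columnNames` are hypothetical names for illustration. Unlike the one-liner, it tolerates a missing property and stray whitespace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

class ColumnNamesDemo {

    // Assumption: mirrors the value of serdeConstants.LIST_COLUMNS in Hive.
    static final String LIST_COLUMNS = "columns";

    /** Splits the comma-separated column list, or returns an empty list if absent. */
    static List<String> columnNames(Properties tblProperties) {
        String raw = tblProperties.getProperty(LIST_COLUMNS);
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        for (String part : raw.split(",")) {
            names.add(part.trim());
        }
        return names;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(LIST_COLUMNS, "id,price_f, name");
        System.out.println(columnNames(props)); // prints [id, price_f, name]
    }
}
```

Logging the resulting list from the SerDe's initialize() shows which columns Hive handed over for the table.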

Cheers,
Gopal