[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width

2016-09-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536835#comment-15536835
 ] 

Sergey Shelukhin edited comment on HIVE-13098 at 9/30/16 7:41 PM:
--

Well, the crux of the matter is, whatever solution we do in Hive would be super 
unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 
places in giant static methods. I was going to add a global for compilation 
(thread-local since that is single threaded) to be able to populate it  
everywhere. At runtime (or during import) it can be used from the fields in 
runtime objects (see DecimalUdf interface in the patch and its usage). That 
would reduce the impact a lot compared to 700Kb of code changes...
Then we can choose what to do with the config once all the requisite code has 
it.
The question is whether to do it at all, esp. since as Gopal noted other types 
are also converted to null on overflow (unless they are bugged like 
decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly 
would be very um, impactful in the codebase.

It is much easier to make the solution that requires user to actively 
participate (e.g. trycast or functions to explore data), but it doesn't cover 
the main case of the automated nullification.


was (Author: sershe):
Well, the crux of the matter is, whatever solution we do in Hive would be super 
unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 
places in giant static methods. I was going to add a global for compilation 
(thread-local since that is single threaded) to be able to populate it  
everywhere. At runtime (or during import) it can be used from the fields in 
runtime objects (see DecimalUdf interface in the patch and its usage). That 
would reduce the impact a lot compared to 700Kb of code changes...
Then we can choose what to do with the config once all the requisite code has 
it.
The question is whether to do it at all, esp. since as Gopal noted other types 
are also converted to null on overflow (unless they are bugged like 
decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly 
would be very um, impactful in the codebase.

> Add a strict check for when the decimal gets converted to null due to 
> insufficient width
> 
>
> Key: HIVE-13098
> URL: https://issues.apache.org/jira/browse/HIVE-13098
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch
>
>
> When e.g. 99 is selected as decimal(5,0), the result is null. This can be 
> problematic, esp. if the data is written to a table and lost without the user 
> realizing it. There should be an option to error out in such cases instead; 
> it should probably be on by default and the error message should instruct the 
> user on how to disable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width

2016-09-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536835#comment-15536835
 ] 

Sergey Shelukhin edited comment on HIVE-13098 at 9/30/16 7:40 PM:
--

Well, the crux of the matter is, whatever solution we do in Hive would be super 
unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 
places in giant static methods. I was going to add a global for compilation 
(thread-local since that is single threaded) to be able to populate it  
everywhere. At runtime (or during import) it can be used from the fields in 
runtime objects (see DecimalUdf interface in the patch and its usage). That 
would reduce the impact a lot compared to 700Kb of code changes...
Then we can choose what to do with the config once all the requisite code has 
it.
The question is whether to do it at all, esp. since as Gopal noted other types 
are also converted to null on overflow (unless they are bugged like 
decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly 
would be very um, impactful in the codebase.


was (Author: sershe):
Well, the crux of the matter is, whatever solution we do in Hive would be super 
unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 
places in giant static methods. I was going to add a global for compilation 
(thread-local since that is single threaded) to be able to populate it  
everywhere. At runtime (or during import) it can be used from the fields in 
runtime objects (see DecimalUdf interface in the patch and its usage).
Then we can choose what to do with it.
The question is whether to do it at all, esp. since as Gopal noted other types 
are also converted to null on overflow (unless they are bugged like 
decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly 
would be very um, impactful in the codebase.

> Add a strict check for when the decimal gets converted to null due to 
> insufficient width
> 
>
> Key: HIVE-13098
> URL: https://issues.apache.org/jira/browse/HIVE-13098
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch
>
>
> When e.g. 99 is selected as decimal(5,0), the result is null. This can be 
> problematic, esp. if the data is written to a table and lost without the user 
> realizing it. There should be an option to error out in such cases instead; 
> it should probably be on by default and the error message should instruct the 
> user on how to disable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width

2016-09-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536835#comment-15536835
 ] 

Sergey Shelukhin edited comment on HIVE-13098 at 9/30/16 7:39 PM:
--

Well, the crux of the matter is, whatever solution we do in Hive would be super 
unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 
places in giant static methods. I was going to add a global for compilation 
(thread-local since that is single threaded) to be able to populate it  
everywhere. At runtime (or during import) it can be used from the fields in 
runtime objects (see DecimalUdf interface in the patch and its usage).
Then we can choose what to do with it.
The question is whether to do it at all, esp. since as Gopal noted other types 
are also converted to null on overflow (unless they are bugged like 
decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly 
would be very um, impactful in the codebase.


was (Author: sershe):
Well, the crux of the matter is, whatever solution we do in Hive would be super 
unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 
places in giant static methods. I was going to add a global for compilation 
(thread-local since that is single threaded) to be able to populate it  
everywhere. At runtime (or during import) it can be used from the fields in 
runtime objects (see DecimalUdf interface in the patch and its usage).
Then we can choose what to do with it.
The question is whether to do it at all, esp. since as Gopal noted other types 
are also converted to null on overflow (unless they are bugged like 
decimal-to-int cast ;)).

> Add a strict check for when the decimal gets converted to null due to 
> insufficient width
> 
>
> Key: HIVE-13098
> URL: https://issues.apache.org/jira/browse/HIVE-13098
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch
>
>
> When e.g. 99 is selected as decimal(5,0), the result is null. This can be 
> problematic, esp. if the data is written to a table and lost without the user 
> realizing it. There should be an option to error out in such cases instead; 
> it should probably be on by default and the error message should instruct the 
> user on how to disable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width

2016-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534285#comment-15534285
 ] 

Sergey Shelukhin edited comment on HIVE-13098 at 9/29/16 10:49 PM:
---

Upon looking further on OI path I don't think it's possible to propagate it 
there without major changes, in fact OI-related parts of this patch are not 
valid, since OIs are assumed to be stateless and are cached process-wide, ditto 
for TypeInfo-s. There are lots of static method paths accessing those...
I think I might scrape a lot of the patch and add a globally accessible static 
that would have to be initialize on CLI/HS2/task startup.. The only exception 
would be write path that happens outside of Hive services... 

This will reduce size of the patch a lot (but also make it a global setting not 
modifiable per query...)

Update: another alternative would be a (TADA!) threadlocal.
We could set it at compile time and change the patch to have only compile paths 
use it, whereas runtime paths would use the fields in OIs and fns that compile 
populates. As much as I hate threadlocals, I think that's the best approach as 
it will make patch smaller (right now 700kb of code changes is not even 
everything, OI changes would be massive), also allow one to set it per query 
and remove the requirement to initialize it for everyone using Hive libs, since 
APIs would not use it beyond compilation.

[~ashutoshc] [~hagleitn] [~jdere] opinions?


was (Author: sershe):
Upon looking further on OI path I don't think it's possible to propagate it 
there without major changes, in fact OI-related parts of this patch are not 
valid, since OIs are assumed to be stateless and are cached process-wide, ditto 
for TypeInfo-s. There are lots of static method paths accessing those...
I think I might scrape a lot of the patch and add a globally accessible static 
that would have to be initialize on CLI/HS2/task startup.. The only exception 
would be write path that happens outside of Hive services... 

This will reduce size of the patch a lot (but also make it a global setting not 
modifiable per query...)

Update: another alternative would be a (TADA!) threadlocal.
We could set it at compile time and change the patch to have only compile paths 
use it, whereas runtime paths would use the fields in OIs and fns that compile 
populates.

[~ashutoshc] [~hagleitn] [~jdere] opinions?

> Add a strict check for when the decimal gets converted to null due to 
> insufficient width
> 
>
> Key: HIVE-13098
> URL: https://issues.apache.org/jira/browse/HIVE-13098
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch
>
>
> When e.g. 99 is selected as decimal(5,0), the result is null. This can be 
> problematic, esp. if the data is written to a table and lost without the user 
> realizing it. There should be an option to error out in such cases instead; 
> it should probably be on by default and the error message should instruct the 
> user on how to disable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width

2016-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534285#comment-15534285
 ] 

Sergey Shelukhin edited comment on HIVE-13098 at 9/29/16 10:47 PM:
---

Upon looking further on OI path I don't think it's possible to propagate it 
there without major changes, in fact OI-related parts of this patch are not 
valid, since OIs are assumed to be stateless and are cached process-wide, ditto 
for TypeInfo-s. There are lots of static method paths accessing those...
I think I might scrape a lot of the patch and add a globally accessible static 
that would have to be initialize on CLI/HS2/task startup.. The only exception 
would be write path that happens outside of Hive services... 

This will reduce size of the patch a lot (but also make it a global setting not 
modifiable per query...)

Update: another alternative would be a (TADA!) threadlocal.
We could set it at compile time and change the patch to have only compile paths 
use it, whereas runtime paths would use the fields in OIs and fns that compile 
populates.

[~ashutoshc] [~hagleitn] [~jdere] opinions?


was (Author: sershe):
Upon looking further on OI path I don't think it's possible to propagate it 
there without major changes, in fact OI-related parts of this patch are not 
valid, since OIs are assumed to be stateless and are cached process-wide, ditto 
for TypeInfo-s. There are lots of static method paths accessing those...
I think I might scrape a lot of the patch and add a globally accessible static 
that would have to be initialize on CLI/HS2/task startup.. The only exception 
would be write path that happens outside of Hive services... 

This will reduce size of the patch a lot (but also make it a global setting not 
modifiable per query...)

[~ashutoshc] [~hagleitn] [~jdere] opinions?

> Add a strict check for when the decimal gets converted to null due to 
> insufficient width
> 
>
> Key: HIVE-13098
> URL: https://issues.apache.org/jira/browse/HIVE-13098
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch
>
>
> When e.g. 99 is selected as decimal(5,0), the result is null. This can be 
> problematic, esp. if the data is written to a table and lost without the user 
> realizing it. There should be an option to error out in such cases instead; 
> it should probably be on by default and the error message should instruct the 
> user on how to disable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)