[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width
[ https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536835#comment-15536835 ] Sergey Shelukhin edited comment on HIVE-13098 at 9/30/16 7:41 PM: -- Well, the crux of the matter is, whatever solution we do in Hive would be super unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 places in giant static methods. I was going to add a global for compilation (thread-local since that is single threaded) to be able to populate it everywhere. At runtime (or during import) it can be used from the fields in runtime objects (see DecimalUdf interface in the patch and its usage). That would reduce the impact a lot compared to 700Kb of code changes... Then we can choose what to do with the config once all the requisite code has it. The question is whether to do it at all, esp. since as Gopal noted other types are also converted to null on overflow (unless they are bugged like decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly would be very um, impactful in the codebase. It is much easier to make the solution that requires user to actively participate (e.g. trycast or functions to explore data), but it doesn't cover the main case of the automated nullification. was (Author: sershe): Well, the crux of the matter is, whatever solution we do in Hive would be super unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 places in giant static methods. I was going to add a global for compilation (thread-local since that is single threaded) to be able to populate it everywhere. At runtime (or during import) it can be used from the fields in runtime objects (see DecimalUdf interface in the patch and its usage). That would reduce the impact a lot compared to 700Kb of code changes... Then we can choose what to do with the config once all the requisite code has it. The question is whether to do it at all, esp. since as Gopal noted other types are also converted to null on overflow (unless they are bugged like decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly would be very um, impactful in the codebase. > Add a strict check for when the decimal gets converted to null due to > insufficient width > > > Key: HIVE-13098 > URL: https://issues.apache.org/jira/browse/HIVE-13098 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch > > > When e.g. 99 is selected as decimal(5,0), the result is null. This can be > problematic, esp. if the data is written to a table and lost without the user > realizing it. There should be an option to error out in such cases instead; > it should probably be on by default and the error message should instruct the > user on how to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width
[ https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536835#comment-15536835 ] Sergey Shelukhin edited comment on HIVE-13098 at 9/30/16 7:40 PM: -- Well, the crux of the matter is, whatever solution we do in Hive would be super unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 places in giant static methods. I was going to add a global for compilation (thread-local since that is single threaded) to be able to populate it everywhere. At runtime (or during import) it can be used from the fields in runtime objects (see DecimalUdf interface in the patch and its usage). That would reduce the impact a lot compared to 700Kb of code changes... Then we can choose what to do with the config once all the requisite code has it. The question is whether to do it at all, esp. since as Gopal noted other types are also converted to null on overflow (unless they are bugged like decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly would be very um, impactful in the codebase. was (Author: sershe): Well, the crux of the matter is, whatever solution we do in Hive would be super unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 places in giant static methods. I was going to add a global for compilation (thread-local since that is single threaded) to be able to populate it everywhere. At runtime (or during import) it can be used from the fields in runtime objects (see DecimalUdf interface in the patch and its usage). Then we can choose what to do with it. The question is whether to do it at all, esp. since as Gopal noted other types are also converted to null on overflow (unless they are bugged like decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly would be very um, impactful in the codebase. > Add a strict check for when the decimal gets converted to null due to > insufficient width > > > Key: HIVE-13098 > URL: https://issues.apache.org/jira/browse/HIVE-13098 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch > > > When e.g. 99 is selected as decimal(5,0), the result is null. This can be > problematic, esp. if the data is written to a table and lost without the user > realizing it. There should be an option to error out in such cases instead; > it should probably be on by default and the error message should instruct the > user on how to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width
[ https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536835#comment-15536835 ] Sergey Shelukhin edited comment on HIVE-13098 at 9/30/16 7:39 PM: -- Well, the crux of the matter is, whatever solution we do in Hive would be super unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 places in giant static methods. I was going to add a global for compilation (thread-local since that is single threaded) to be able to populate it everywhere. At runtime (or during import) it can be used from the fields in runtime objects (see DecimalUdf interface in the patch and its usage). Then we can choose what to do with it. The question is whether to do it at all, esp. since as Gopal noted other types are also converted to null on overflow (unless they are bugged like decimal-to-int cast ;)), so the proper fix that covers all the cases uniformly would be very um, impactful in the codebase. was (Author: sershe): Well, the crux of the matter is, whatever solution we do in Hive would be super unwieldy code-wise, because decimals, decimal OIs, etc. are created in 100 places in giant static methods. I was going to add a global for compilation (thread-local since that is single threaded) to be able to populate it everywhere. At runtime (or during import) it can be used from the fields in runtime objects (see DecimalUdf interface in the patch and its usage). Then we can choose what to do with it. The question is whether to do it at all, esp. since as Gopal noted other types are also converted to null on overflow (unless they are bugged like decimal-to-int cast ;)). > Add a strict check for when the decimal gets converted to null due to > insufficient width > > > Key: HIVE-13098 > URL: https://issues.apache.org/jira/browse/HIVE-13098 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch > > > When e.g. 99 is selected as decimal(5,0), the result is null. This can be > problematic, esp. if the data is written to a table and lost without the user > realizing it. There should be an option to error out in such cases instead; > it should probably be on by default and the error message should instruct the > user on how to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width
[ https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534285#comment-15534285 ] Sergey Shelukhin edited comment on HIVE-13098 at 9/29/16 10:49 PM: --- Upon looking further on OI path I don't think it's possible to propagate it there without major changes, in fact OI-related parts of this patch are not valid, since OIs are assumed to be stateless and are cached process-wide, ditto for TypeInfo-s. There are lots of static method paths accessing those... I think I might scrape a lot of the patch and add a globally accessible static that would have to be initialize on CLI/HS2/task startup.. The only exception would be write path that happens outside of Hive services... This will reduce size of the patch a lot (but also make it a global setting not modifiable per query...) Update: another alternative would be a (TADA!) threadlocal. We could set it at compile time and change the patch to have only compile paths use it, whereas runtime paths would use the fields in OIs and fns that compile populates. As much as I hate threadlocals, I think that's the best approach as it will make patch smaller (right now 700kb of code changes is not even everything, OI changes would be massive), also allow one to set it per query and remove the requirement to initialize it for everyone using Hive libs, since APIs would not use it beyond compilation. [~ashutoshc] [~hagleitn] [~jdere] opinions? was (Author: sershe): Upon looking further on OI path I don't think it's possible to propagate it there without major changes, in fact OI-related parts of this patch are not valid, since OIs are assumed to be stateless and are cached process-wide, ditto for TypeInfo-s. There are lots of static method paths accessing those... I think I might scrape a lot of the patch and add a globally accessible static that would have to be initialize on CLI/HS2/task startup.. The only exception would be write path that happens outside of Hive services... This will reduce size of the patch a lot (but also make it a global setting not modifiable per query...) Update: another alternative would be a (TADA!) threadlocal. We could set it at compile time and change the patch to have only compile paths use it, whereas runtime paths would use the fields in OIs and fns that compile populates. [~ashutoshc] [~hagleitn] [~jdere] opinions? > Add a strict check for when the decimal gets converted to null due to > insufficient width > > > Key: HIVE-13098 > URL: https://issues.apache.org/jira/browse/HIVE-13098 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch > > > When e.g. 99 is selected as decimal(5,0), the result is null. This can be > problematic, esp. if the data is written to a table and lost without the user > realizing it. There should be an option to error out in such cases instead; > it should probably be on by default and the error message should instruct the > user on how to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13098) Add a strict check for when the decimal gets converted to null due to insufficient width
[ https://issues.apache.org/jira/browse/HIVE-13098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534285#comment-15534285 ] Sergey Shelukhin edited comment on HIVE-13098 at 9/29/16 10:47 PM: --- Upon looking further on OI path I don't think it's possible to propagate it there without major changes, in fact OI-related parts of this patch are not valid, since OIs are assumed to be stateless and are cached process-wide, ditto for TypeInfo-s. There are lots of static method paths accessing those... I think I might scrape a lot of the patch and add a globally accessible static that would have to be initialize on CLI/HS2/task startup.. The only exception would be write path that happens outside of Hive services... This will reduce size of the patch a lot (but also make it a global setting not modifiable per query...) Update: another alternative would be a (TADA!) threadlocal. We could set it at compile time and change the patch to have only compile paths use it, whereas runtime paths would use the fields in OIs and fns that compile populates. [~ashutoshc] [~hagleitn] [~jdere] opinions? was (Author: sershe): Upon looking further on OI path I don't think it's possible to propagate it there without major changes, in fact OI-related parts of this patch are not valid, since OIs are assumed to be stateless and are cached process-wide, ditto for TypeInfo-s. There are lots of static method paths accessing those... I think I might scrape a lot of the patch and add a globally accessible static that would have to be initialize on CLI/HS2/task startup.. The only exception would be write path that happens outside of Hive services... This will reduce size of the patch a lot (but also make it a global setting not modifiable per query...) [~ashutoshc] [~hagleitn] [~jdere] opinions? > Add a strict check for when the decimal gets converted to null due to > insufficient width > > > Key: HIVE-13098 > URL: https://issues.apache.org/jira/browse/HIVE-13098 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13098.WIP.patch, HIVE-13098.WIP2.patch > > > When e.g. 99 is selected as decimal(5,0), the result is null. This can be > problematic, esp. if the data is written to a table and lost without the user > realizing it. There should be an option to error out in such cases instead; > it should probably be on by default and the error message should instruct the > user on how to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)