[jira] [Commented] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2017-11-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250824#comment-16250824
 ] 

Ashutosh Chauhan commented on HIVE-15491:
-----------------------------------------

+1

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -----------------------------------------------------------------
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>     ...
>     try {
>       ...
>       for (int i = 0; i < numCols; ++i) {
>         if (retCols[i] == null) {
>           retCols[i] = cols[i]; // use the object pool rather than creating a new object
>         }
>         Object extractObject = ((Map)jsonObj).get(paths[i]);
>         if (extractObject instanceof Map || extractObject instanceof List) {
>           retCols[i].set(MAPPER.writeValueAsString(extractObject));
>         } else if (extractObject != null) {
>           retCols[i].set(extractObject.toString());
>         } else {
>           retCols[i] = null;
>         }
>       }
>       forward(retCols);
>       return;
>     } catch (Throwable e) {  <= Yikes.
>       LOG.error("JSON parsing/evaluation exception" + e);
>       forward(nullCols);
>     }
>   }
> {code}
> The error handling here seems suspect. Judging from the error message, the
> intention is to catch JSON-specific errors arising from
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching
> {{Throwable}}, however, this code also masks errors that arise from the call
> to {{forward(retCols)}} (a sketch of a narrower catch follows this quoted
> description).
> I just ran into this in production. A user with a nearly exhausted HDFS quota
> attempted to use {{json_tuple}} to extract fields from JSON strings in his
> data. The data turned out to have large record counts, and the query used over
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the
> exhausted quota, but the thrown exception was swallowed in the code above.
> {{process()}} ignored the failure for the record and proceeded to the next
> one. Eventually, this resulted in DDoS-ing the NameNode.
> I'll have a patch for this shortly.
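
For illustration, a minimal sketch of the narrowing the description argues for. This is not the attached HIVE-15491.patch: it keeps the Jackson calls inside the guarded region, catches only their checked {{IOException}}, and lets a {{HiveException}} raised by {{forward()}} propagate. The field names ({{MAPPER}}, {{retCols}}, {{cols}}, {{paths}}, {{nullCols}}, {{numCols}}) are taken from the snippet above.

{code:java}
// Illustrative sketch only -- NOT the attached HIVE-15491.patch. It assumes the
// existing GenericUDTFJSONTuple fields (MAPPER, LOG, numCols, retCols, cols,
// paths, nullCols) and the usual imports (java.io.IOException, java.util.*).
@Override
public void process(Object[] o) throws HiveException {
  // null checks, path parsing, and extraction of jsonStr elided -- as in the original
  Object jsonObj;
  try {
    // keep only the JSON work inside the guarded region
    jsonObj = MAPPER.readValue(jsonStr, Map.class);
    for (int i = 0; i < numCols; ++i) {
      if (retCols[i] == null) {
        retCols[i] = cols[i]; // reuse the pooled Text object
      }
      Object extractObject = ((Map) jsonObj).get(paths[i]);
      if (extractObject instanceof Map || extractObject instanceof List) {
        retCols[i].set(MAPPER.writeValueAsString(extractObject));
      } else if (extractObject != null) {
        retCols[i].set(extractObject.toString());
      } else {
        retCols[i] = null;
      }
    }
  } catch (IOException e) { // Jackson parse/serialization failures only, not Throwable
    LOG.error("JSON parsing/evaluation exception: " + e.getMessage(), e);
    forward(nullCols);
    return;
  }
  forward(retCols); // a HiveException from a failing RecordWriter now propagates
}
{code}

With the catch narrowed like this, the quota failure from the description would fail the task on the first affected record instead of being logged and dropped across 25K mappers.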



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().

2016-12-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769070#comment-15769070
 ] 

Hive QA commented on HIVE-15491:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844316/HIVE-15491.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10777 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144)
  [vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] (batchId=71)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=93)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2687/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2687/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2687/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844316 - PreCommit-HIVE-Build
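
On the "-1 due to no test(s) being added or modified" verdict above: a hypothetical sketch of the kind of regression test that could accompany the fix. It is not part of the attached patch; the package paths and {{Collector}} wiring are assumptions based on the stock {{GenericUDTF}} API. The idea is to plug in a {{Collector}} that fails the way a {{RecordWriter}} over an exhausted quota would, and assert that {{process()}} lets the {{HiveException}} escape instead of swallowing it.

{code:java}
// Hypothetical regression-test sketch (not part of HIVE-15491.patch).
// Package paths assumed from the Hive 2.x source layout.
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.Collector;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.junit.Test;

public class TestJsonTupleErrorPropagation {

  @Test(expected = HiveException.class)
  public void downstreamFailureIsNotSwallowed() throws Exception {
    GenericUDTFJSONTuple udtf = new GenericUDTFJSONTuple();
    ObjectInspector stringOI = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    // json_tuple(jsonStr, 'a'): one JSON column plus one key to extract.
    udtf.initialize(new ObjectInspector[] { stringOI, stringOI });

    // Collector that fails, standing in for a RecordWriter over an exhausted quota.
    udtf.setCollector(new Collector() {
      @Override
      public void collect(Object input) throws HiveException {
        throw new HiveException("simulated downstream write failure");
      }
    });

    // The JSON is valid, so the only failure comes from forward() -> collector.collect(),
    // and it should propagate out of process() rather than being logged and dropped.
    udtf.process(new Object[] { "{\"a\": 1}", "a" });
  }
}
{code}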

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -----------------------------------------------------------------
>
> Key: HIVE-15491
> URL: https://issues.apache.org/jira/browse/HIVE-15491
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in 
> {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>     ...
>     try {
>       ...
>       for (int i = 0; i < numCols; ++i) {
>         if (retCols[i] == null) {
>           retCols[i] = cols[i]; // use the object pool rather than creating a new object
>         }
>         Object extractObject = ((Map)jsonObj).get(paths[i]);
>         if (extractObject instanceof Map || extractObject instanceof List) {
>           retCols[i].set(MAPPER.writeValueAsString(extractObject));
>         } else if (extractObject != null) {
>           retCols[i].set(extractObject.toString());
>         } else {
>           retCols[i] = null;
>         }
>       }
>       forward(retCols);
>       return;
>     } catch (Throwable e) {  <= Yikes.
>       LOG.error("JSON parsing/evaluation exception" + e);
>       forward(nullCols);
>     }
>   }
> {code}
> The error handling here seems suspect. Judging from the error message, the
> intention is to catch JSON-specific errors arising from
> {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. By catching
> {{Throwable}}, however, this code also masks errors that arise from the call
> to {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota
> attempted to use {{json_tuple}} to extract fields from JSON strings in his
> data. The data turned out to have large record counts, and the query used over
> 25K mappers. Every task failed to create a {{RecordWriter}}, thanks to the
> exhausted quota, but the thrown exception was swallowed in the code above.
> {{process()}} ignored the failure for the record and proceeded to the next
> one. Eventually, this resulted in DDoS-ing the NameNode.
> I'll have a patch for this shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)