[ 
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756425#comment-17756425
 ] 

ASF GitHub Bot commented on DRILL-8450:
---------------------------------------

cgivre commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1299285670


##########
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXmlOptions.java:
##########
@@ -111,7 +111,7 @@ public String toString() {
   public static class HttpXmlOptionsBuilder {
 
     private int dataLevel;
-    private boolean allTextMode;
+    private Boolean allTextMode;

Review Comment:
   @mbeckerle 
   In the JSON reader there are two parameters: `allTextMode` and 
`readAllNumbersAsDouble`. Both are boolean. For the XML reader, I chose not 
to implement the `readAllNumbersAsDouble` parameter because, in practice, 
inferring INTs separately from DOUBLEs requires very clean data. From a lot 
of personal experience using Drill with clients, I can tell you that this was 
one of the biggest data challenges. For instance, you'd get data where there 
was a DOUBLE field and then there would be a row with zero denoted as `0`. 
This would then cause schema change exceptions. 
   
   We have actually made significant improvements in Drill's implicit casting 
rules, which prevent a lot of schema change exceptions and, IMHO, make 
distinguishing between INTs and DOUBLEs a lot less important. So, out of 
laziness, I decided it wasn't worth it. I can be convinced otherwise.
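
   To illustrate the mixed `0` / `1.5` case described above, here is a minimal 
sketch in plain Java. It is not Drill's reader code: the class 
`NumericInferenceSketch`, the enum `InferredType`, and the methods 
`inferStrict`, `inferLenient`, and `isConsistent` are hypothetical names made 
up for this example. It only shows why per-value INT/DOUBLE inference can give 
one column two types, and why treating every number as a DOUBLE avoids that.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Drill.
final class NumericInferenceSketch {

  enum InferredType { INT, DOUBLE, VARCHAR }

  // Strict per-value inference: "0" looks like an INT, "1.5" like a DOUBLE.
  static InferredType inferStrict(String raw) {
    String value = raw.trim();
    try {
      Long.parseLong(value);
      return InferredType.INT;
    } catch (NumberFormatException ignored) {
      // Not an integer; fall through and try a double.
    }
    try {
      Double.parseDouble(value);
      return InferredType.DOUBLE;
    } catch (NumberFormatException ignored) {
      return InferredType.VARCHAR;
    }
  }

  // Lenient inference: every numeric value is reported as DOUBLE.
  static InferredType inferLenient(String raw) {
    InferredType strict = inferStrict(raw);
    return strict == InferredType.INT ? InferredType.DOUBLE : strict;
  }

  // True when every value in the column infers to the same type.
  static boolean isConsistent(List<String> column, boolean lenient) {
    Set<InferredType> seen = EnumSet.noneOf(InferredType.class);
    for (String raw : column) {
      seen.add(lenient ? inferLenient(raw) : inferStrict(raw));
    }
    return seen.size() <= 1;
  }

  public static void main(String[] args) {
    // A DOUBLE field in which one row writes zero as "0".
    List<String> column = List.of("1.5", "0", "2.25");
    System.out.println(isConsistent(column, false)); // prints false
    System.out.println(isConsistent(column, true));  // prints true
  }
}
```

   With strict inference the second row flips the column from DOUBLE to INT, 
which is the kind of mismatch that surfaces as a schema change. With the 
lenient variant the column stays DOUBLE throughout.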
   
   





> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
>                 Key: DRILL-8450
>                 URL: https://issues.apache.org/jira/browse/DRILL-8450
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Format - XML
>    Affects Versions: 1.21.1
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin.  In similar 
> fashion to other plugins, it adds a new configuration parameter: allTextMode, 
> which when set to true, reads all data as strings.  The default is true.
> Note that the inference is limited to doubles, dates, timestamps, booleans, 
> and strings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
