[ 
https://issues.apache.org/jira/browse/DRILL-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers closed DRILL-5105.
------------------------------

> Query time increases exponentially with increasing nested levels
> ----------------------------------------------------------------
>
>                 Key: DRILL-5105
>                 URL: https://issues.apache.org/jira/browse/DRILL-5105
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.9.0
>         Environment: 3 Node Cluster with default memory and configurations. 
>            Reporter: Abhishek Girish
>            Assignee: Chunhui Shi
>              Labels: ready-to-commit
>
> The time taken to query a JSON dataset depends on the number of nested levels 
> within the dataset. Increasing the complexity of the dataset further 
> increases the execution time. 
> Tabulated below are cached query execution times for a simple select * query 
> over two simple forms of JSON datasets: 
> || No. Levels || Time (s) Dataset 1 || Time (s) Dataset 2 ||
> | 1  | 0.22    | 0.27           |
> | 2  | 0.23    | 0.25           |
> | 4  | 0.24    | 0.22           |
> | 8  | 0.22    | 0.23           |
> | 16 | 0.34    | 0.48           |
> | 24 | 25.76   | 72.51          |
> | 26 | 103.48  | 289.6          |
> | 28 | 336.12  | 1151.94        |
> | 30 | 1342.22 | 4586.79        |
> | 32 | 5360.2  | Expected: ~20k |
> The above table lists query times for 20 different JSON files, 10 belonging 
> to dataset 1 and 10 belonging to dataset 2. Each has 1 record, but the number 
> of nested levels within them varies as given in the "No. Levels" column. 
> It appears that the query time almost doubles with each additional nested 
> level (note that the table above steps two levels at a time from level 24 
> onward, so this shows up as almost 4x between consecutive rows). 
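The doubling claim can be checked directly against the reported numbers. The following is only a rough sketch in Python: the timings are copied from the table above, and the helper name per_level_growth is made up for illustration. It takes the ratio between consecutive rows and converts it to a per-level factor.

```python
# Timings (seconds) copied from the table in the issue, rows 24..32.
LEVELS = [24, 26, 28, 30, 32]
DATASET1 = [25.76, 103.48, 336.12, 1342.22, 5360.2]
DATASET2 = [72.51, 289.6, 1151.94, 4586.79]  # no measured value for 32 levels


def per_level_growth(levels, times):
    """Return the estimated growth factor per nested level for each
    pair of consecutive rows (hypothetical helper, not part of Drill)."""
    pairs = list(zip(levels, times))  # zip stops at the shorter list
    factors = []
    for (l0, t0), (l1, t1) in zip(pairs, pairs[1:]):
        # Rows are 2 levels apart, so take the (l1 - l0)-th root of the ratio.
        factors.append((t1 / t0) ** (1.0 / (l1 - l0)))
    return factors


if __name__ == "__main__":
    for name, times in (("Dataset 1", DATASET1), ("Dataset 2", DATASET2)):
        factors = per_level_growth(LEVELS, times)
        print(name, [round(f, 2) for f in factors])
```

The per-level factors come out at roughly 1.8 to 2.0 for both datasets, which matches the "almost doubles" observation. Multiplying the 30-level time for dataset 2 by about 4 gives roughly 18,000 s, in line with the "~20k" estimate in the table.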
> The two representative datasets below showcase simple JSON structures with 
> nested levels.
> Structure of Dataset 1:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "level2": {
>       "field1": "b",
>       ...
>     }
>   }
> }
> {code}
> Structure of Dataset 2:
> {code}
> {
>   "level1": {
>     "field1": "a",
>     "field2": {
>       "nfield1": true,
>       "nfield2": 1.1
>     },
>     "level2": {
>       "field1": "b",
>       "field2": {
>         "nfield1": false,
>         "nfield2": 2.2
>       },
>       ...
>     }
>   }
> }
> {code}
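
For anyone who wants to reproduce this, test files of either shape could be generated with something like the Python sketch below. It is not the script used for the issue. The function names are hypothetical, and the values for levels beyond the two shown above (the letters after "b", the alternating booleans, the 1.1 * depth numbers) are assumptions, since the issue elides them.

```python
import json


def _letter(depth):
    # "a" for level 1, "b" for level 2, ...; wrapping after "z" is an assumption.
    return chr(ord("a") + (depth - 1) % 26)


def _build(levels, make_fields):
    """Nest `levels` objects named level1..levelN, innermost first."""
    if levels < 1:
        raise ValueError("levels must be >= 1")
    inner = None
    for depth in range(levels, 0, -1):
        body = make_fields(depth)
        if inner is not None:
            body.update(inner)  # attach the next level below this one
        inner = {"level%d" % depth: body}
    return inner


def make_dataset1(levels):
    """Dataset 1 shape: one string field per level."""
    return _build(levels, lambda d: {"field1": _letter(d)})


def make_dataset2(levels):
    """Dataset 2 shape: a string field plus a small nested map per level."""
    def fields(d):
        return {
            "field1": _letter(d),
            # Values past level 2 are assumed, not taken from the issue.
            "field2": {"nfield1": d % 2 == 1, "nfield2": round(d * 1.1, 1)},
        }
    return _build(levels, fields)


if __name__ == "__main__":
    # One record per file, as in the issue; print a small sample.
    print(json.dumps(make_dataset1(2), indent=2))
    print(json.dumps(make_dataset2(2), indent=2))
```

Writing the output of json.dumps for 24 or more levels to a file and running select * over it should give inputs comparable to the ones in the table, assuming the real files follow the same pattern at deeper levels.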



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
