[jira] [Commented] (DRILL-6879) Indicate a warning in the WebUI when a query makes little to no progress for a while

2019-05-01 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831291#comment-16831291
 ] 

Bridget Bevens commented on DRILL-6879:
---

Hi [~kkhatua],

I added the spacing in the icon column. It looks better, I think.

Thanks,
Bridget

> Indicate a warning in the WebUI when a query makes little to no progress for 
> a while
> 
>
> Key: DRILL-6879
> URL: https://issues.apache.org/jira/browse/DRILL-6879
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Monitoring, Web Server
> Affects Versions: 1.14.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
> Attachments: image-2018-12-04-11-54-54-247.png, 
> image-2018-12-06-11-19-00-339.png, image-2018-12-06-11-27-14-719.png
>
>
> When running a very large query on a cluster with limited resources, we 
> noticed that a VM thread on one of the nodes freezes the fragment threads as 
> it tries to do some work (GC perhaps?). This is a clear indication that the 
> query is stuck in a state from which it might not recover.
>  Under such circumstances, it makes sense to cancel the query, or at least 
> warn the user on that page that the query has exceeded a certain threshold. 
>  To detect this, the user can check the {{Last Progress}} column in 
> the Fragments Overview section, which will show large times.
> !image-2018-12-04-11-54-54-247.png|width=969,height=336!
> In addition, there are instances where a query might have buffered operators 
> spilling to disk, which also hurts performance (and, consequently, lengthens 
> run times). Calling out this skew can be very useful.
> !image-2018-12-06-11-27-14-719.png|width=969,height=256!  
> Or there might be cases where a single fragment takes much longer than the 
> average (indicated by an extreme skew in the Gantt chart).
> !image-2018-12-06-11-19-00-339.png|width=969,height=150!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6879) Indicate a warning in the WebUI when a query makes little to no progress for a while

2019-04-18 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821517#comment-16821517
 ] 

Kunal Khatua commented on DRILL-6879:
-

Hi Bridget,

The content looks good, but the table in 
[https://drill.apache.org/docs/query-profiles/#query-profile-warnings] is 
unusually squeezed for the *Icon* column.

Can you see if something like suffixing {{}} to the *Icon* column header 
helps?



[jira] [Commented] (DRILL-6879) Indicate a warning in the WebUI when a query makes little to no progress for a while

2019-04-18 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821451#comment-16821451
 ] 

Bridget Bevens commented on DRILL-6879:
---

Hi [~kkhatua],

The content I added about the warnings is here:
https://drill.apache.org/docs/query-profiles/#query-profile-warnings 
You've reviewed this content already, but please let me know if you want me to 
make any changes.

Thanks,
Bridget


> Indicate a warning in the WebUI when a query makes little to no progress for 
> a while
> 
>
> Key: DRILL-6879
> URL: https://issues.apache.org/jira/browse/DRILL-6879
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Monitoring, Web Server
> Affects Versions: 1.14.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
> Attachments: image-2018-12-04-11-54-54-247.png, 
> image-2018-12-06-11-19-00-339.png, image-2018-12-06-11-27-14-719.png
>
>


[jira] [Commented] (DRILL-6879) Indicate a warning in the WebUI when a query makes little to no progress for a while

2018-12-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718815#comment-16718815
 ] 

ASF GitHub Bot commented on DRILL-6879:
---

arina-ielchiieva commented on a change in pull request #1572: DRILL-6879: Show 
warnings for potential performance issues
URL: https://github.com/apache/drill/pull/1572#discussion_r240975519
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
 ##
 @@ -73,6 +73,10 @@ public ProfileWrapper(final QueryProfile profile, DrillConfig drillConfig) {
  final List majors = new ArrayList<>(profile.getFragmentProfileList());
  Collections.sort(majors, Comparators.majorId);
  
 +//Setting warning thresholds for performance-degrading queries (DRILL-6879)
 
 Review comment:
   Do not use static methods; just pass drillConfig to the constructor of each 
of these classes and set the warning threshold value during instance creation.
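
A minimal sketch of what this suggestion could look like, assuming a hypothetical `SketchConfig` stand-in for `DrillConfig` and a hypothetical `FragmentSketch` wrapper class (neither is taken from the pull request). The wrapper receives the configuration in its constructor and reads its warning threshold once, at instance creation, instead of having it pushed in through a static method:

```java
import java.util.Map;

// Hypothetical demo entry point; not part of Drill.
class ConstructorInjectionDemo {
  public static void main(String[] args) {
    // The key name comes from the pull request description; 300 is its stated default.
    SketchConfig config = new SketchConfig(
        Map.of("drill.exec.http.profile.warning.progress.threshold", 300L));

    FragmentSketch fragment = new FragmentSketch(config);
    System.out.println(fragment.isStalled(301)); // true
    System.out.println(fragment.isStalled(10));  // false
  }
}

// Hypothetical stand-in for DrillConfig, so the sketch compiles without Drill on the classpath.
final class SketchConfig {
  private final Map<String, Long> values;

  SketchConfig(Map<String, Long> values) {
    this.values = Map.copyOf(values);
  }

  long getLong(String key) {
    Long value = values.get(key);
    if (value == null) {
      throw new IllegalArgumentException("Missing config key: " + key);
    }
    return value;
  }
}

// Hypothetical wrapper class: the threshold is an instance field set in the constructor.
final class FragmentSketch {
  private final long progressThresholdSec;

  FragmentSketch(SketchConfig config) {
    this.progressThresholdSec =
        config.getLong("drill.exec.http.profile.warning.progress.threshold");
  }

  // True when the fragment has been idle for longer than the configured threshold.
  boolean isStalled(long secondsSinceLastProgress) {
    return secondsSinceLastProgress > progressThresholdSec;
  }
}
```

With this shape, two instances built from different configurations do not share state, which also makes the threshold straightforward to vary in tests.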


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Indicate a warning in the WebUI when a query makes little to no progress for 
> a while
> 
>
> Key: DRILL-6879
> URL: https://issues.apache.org/jira/browse/DRILL-6879
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Monitoring, Web Server
> Affects Versions: 1.14.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Labels: user-experience
> Fix For: 1.16.0
>
> Attachments: image-2018-12-04-11-54-54-247.png, 
> image-2018-12-06-11-19-00-339.png, image-2018-12-06-11-27-14-719.png
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>


[jira] [Commented] (DRILL-6879) Indicate a warning in the WebUI when a query makes little to no progress for a while

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718120#comment-16718120
 ] 

ASF GitHub Bot commented on DRILL-6879:
---

kkhatua commented on issue #1572: DRILL-6879: Show warnings for potential 
performance issues
URL: https://github.com/apache/drill/pull/1572#issuecomment-446391795
 
 
   **Screenshot** for fragments making no progress (the test setup has the 
threshold `drill.exec.http.profile.warning.progress.threshold` set to **1 sec**):
   
![image](https://user-images.githubusercontent.com/4335237/49835182-436b0480-fd53-11e8-8c43-8ad7126b4abb.png)
   
   **Screenshot** for operators that have spilled to disk, run with 
unusually long wait times, or have the longest running fragments with an 
extreme skew:
   
![image](https://user-images.githubusercontent.com/4335237/49835251-731a0c80-fd53-11e8-8a3e-72d47d4423ff.png)
   




[jira] [Commented] (DRILL-6879) Indicate a warning in the WebUI when a query makes little to no progress for a while

2018-12-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718117#comment-16718117
 ] 

ASF GitHub Bot commented on DRILL-6879:
---

kkhatua opened a new pull request #1572: DRILL-6879: Show warnings for 
potential performance issues
URL: https://github.com/apache/drill/pull/1572
 
 
   1. Introduced a warning for fragments that make no progress. If no fragment 
has made progress within a threshold 
(`drill.exec.http.profile.warning.progress.threshold`), a warning is issued. 
The default is 5 minutes (300 sec).
   
   2. Introduced a warning if any of the buffered operators spill to disk.
   
   3. Introduced a warning for operators where the longest running fragment 
runs beyond a minimum threshold 
(`drill.exec.http.profile.warning.time.skew.min`), and runs at least 2 times 
longer than the average 
(`drill.exec.http.profile.warning.time.skew.ratio.process`). The _clock_ symbol 
with a tooltip indicates the extent of the skew. A similar comparison is made 
for the wait time of a fragment, where the max wait time must exceed the 
average by a separate ratio 
(`drill.exec.http.profile.warning.time.skew.ratio.wait`).
   
   4. Introduced a warning for operators where the average wait time of a scan 
operator exceeds its processing time, for a minimum threshold 
(`drill.exec.http.profile.warning.scan.wait.min`). The _turtle_ symbol with a 
tooltip indicates which scan operator spent more time waiting than processing.
   
   5. Refactored the `TableBuilder` class:
      a. It now uses an attribute map instead of String arguments, e.g. for 'title'.
      b. Removed the APIs that pass a hyperlink, since they are never used.
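
The checks in items 1, 3 and 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the pull request: the `WarningRules` class, its method names, and the 2 s and 60 s minimums are hypothetical, and the reading of item 4 (average wait above both the minimum and the processing time) is one interpretation of the description. Only the 300 s default and the ratio of 2 come from the text.

```java
import java.util.List;

// Hypothetical demo entry point; none of these classes exist in Drill.
class ProfileWarningRulesDemo {
  public static void main(String[] args) {
    // 300 s and the ratio of 2 come from the description above;
    // the 2 s skew minimum and 60 s scan-wait minimum are assumed values.
    WarningRules rules = new WarningRules(300L, 2L, 2.0, 60L);

    System.out.println(rules.noProgress(List.of(400L, 350L)));   // true: every fragment is idle
    System.out.println(rules.noProgress(List.of(400L, 10L)));    // false: one fragment progressed
    System.out.println(rules.processSkew(List.of(1L, 1L, 10L))); // true: 10 s vs. a 4 s average
    System.out.println(rules.slowScan(90.0, 30.0));              // true: waits longer than it works
  }
}

// Hypothetical holder for the thresholds, set once at construction.
final class WarningRules {
  private final long progressThresholdSec;
  private final long skewMinSec;
  private final double skewRatio;
  private final double scanWaitMinSec;

  WarningRules(long progressThresholdSec, long skewMinSec, double skewRatio, double scanWaitMinSec) {
    this.progressThresholdSec = progressThresholdSec;
    this.skewMinSec = skewMinSec;
    this.skewRatio = skewRatio;
    this.scanWaitMinSec = scanWaitMinSec;
  }

  // Item 1: warn only if no fragment has made progress within the threshold.
  boolean noProgress(List<Long> secondsSinceLastProgress) {
    return !secondsSinceLastProgress.isEmpty()
        && secondsSinceLastProgress.stream().allMatch(idle -> idle > progressThresholdSec);
  }

  // Item 3: the longest fragment exceeds the minimum and is at least `skewRatio` times the average.
  boolean processSkew(List<Long> fragmentSeconds) {
    if (fragmentSeconds.isEmpty()) {
      return false;
    }
    long longest = fragmentSeconds.stream().mapToLong(Long::longValue).max().getAsLong();
    double average = fragmentSeconds.stream().mapToLong(Long::longValue).average().getAsDouble();
    return longest > skewMinSec && longest >= skewRatio * average;
  }

  // Item 4 (as interpreted here): a scan's average wait exceeds the minimum and its processing time.
  boolean slowScan(double avgWaitSec, double avgProcessSec) {
    return avgWaitSec > scanWaitMinSec && avgWaitSec > avgProcessSec;
  }
}
```

Requiring the minimum threshold in addition to the ratio keeps very short operators, where a 2x difference is only a fraction of a second, from raising a warning.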

