[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user kr-arjun commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers I was able to resolve this issue with the workaround of setting 'yarn.timeline-service.enabled' to false (I copied a yarn-site.xml with this property set into the $DRILL_SITE directory). The issue is specific to environments where the Timeline server is enabled. Initially, it failed with 'java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig'. After I copied the required jars to the Drill classpath, it failed with the exception I shared in the previous attachment. The same issue is reported in Spark as well (https://issues.apache.org/jira/browse/SPARK-15343). To find the error stack trace, I had to modify DrillOnYarn.java to print the stack trace. I thought it would be useful if the stack trace could be logged for troubleshooting purposes. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 @arina-ielchiieva, thanks much for your help with this PR. Glad to see it is finally in Drill master after all this time! ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 @kr-arjun, thanks for the text file. The error is related to security. DoY, in its current form, is an "MVP": it works, but leaves off advanced features. One of those missing features is support for a secure cluster. Please file a JIRA asking for DoY to support a secure cluster. While at it, please look at the internal JIRA and locate all DoY enhancements or bugs. Now that DoY is part of Drill, those tickets should be moved to the public Apache Drill Jira. (I can't do it because I don't have access to the internal tickets.) ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user kr-arjun commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers The client error message changes look good. I did a quick test with the changes and could verify that the error messages are logged. > "Were you using the start command?" Yes, I was trying to start DoY in a YARN environment with the timeline server enabled. It failed to start the Drill application master due to a timeline-client-related error. Since it failed within the DoY client process, there was no stack trace available. Attaching test scenarios of the changes for your reference. [DOY-client-error-logging.txt](https://github.com/apache/drill/files/1778349/DOY-client-error-logging.txt) ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 @arina-ielchiieva, turned out that there were unneeded dependencies in the DoY additions to the drill-root pom.xml file. Removed these and the json.org warnings went away. Please take a look at the new commits. If all looks good, I'll squash commits to prepare for merging into master. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 Rebased onto latest master. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 @kr-arjun, thanks for your note on error handling. Were you using the `start` command? There is exactly one place where the error "Failed to start Drill application master" is thrown: it is when Drill-on-YARN fails to start the application master. There are lots of other messages for other issues such as "Error: AM already running as Application ID: 1234" or "Failed to allocate Drill application master."

When writing the client, I made an explicit decision not to create a log file to avoid cluttering up things. There is no good place to put a client log since Drill does not actually run on the client machine. We could add a log, but it would be messy.

What we can do, however, is include the text of the message we got from YARN when we tried to start the AM process. When an error occurs, the client will now print something like the following:
```
Failed to start Drill application master.
Caused by: Some YARN error
```
The other thing we can add is a full stack dump, but only when requested with the `-v` (verbose) option:
```
> drill-on-yarn.sh -v start
Failed to start Drill application master.
Caused by: Some YARN error
Full stack trace:
(stack trace here)
```
I can't easily test this code. Please grab the latest sources and rerun your test case to ensure that it now prints out more information: whatever YARN tells us about why it would not start the AM. ---
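The reporting behavior described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the actual Drill-on-YARN client code: the class name `ClientErrorReport`, the `format` method, and the way the `-v` flag is read are all hypothetical, and the exception types are stand-ins for whatever the real client throws.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical helper for illustration; not part of Drill-on-YARN. */
public class ClientErrorReport {

  /**
   * Builds the text a client could print for a failure: the top-level
   * message, the message of the direct cause (if there is one), and,
   * only when verbose is true, the full stack trace.
   */
  public static String format(Throwable e, boolean verbose) {
    String nl = System.lineSeparator();
    StringBuilder buf = new StringBuilder();
    buf.append(e.getMessage()).append(nl);

    // Surface what the underlying layer (here, YARN) reported.
    Throwable cause = e.getCause();
    if (cause != null) {
      buf.append("Caused by: ").append(cause.getMessage()).append(nl);
    }

    // The stack trace is noisy, so include it only on request.
    if (verbose) {
      StringWriter trace = new StringWriter();
      e.printStackTrace(new PrintWriter(trace, true));
      buf.append("Full stack trace:").append(nl).append(trace);
    }
    return buf.toString();
  }

  public static void main(String[] args) {
    // Assumed flag handling: "-v" as the first argument enables verbose output.
    boolean verbose = args.length > 0 && args[0].equals("-v");
    Exception failure = new IllegalStateException(
        "Failed to start Drill application master.",
        new RuntimeException("Some YARN error"));
    System.err.print(format(failure, verbose));
  }
}
```

Keeping the formatting in one method that returns a string means the default and verbose outputs can be compared without starting a YARN cluster.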
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers when running unit tests with the mapr profile, they fail because this commit brings in a banned dependency: `[INFO] --- maven-enforcer-plugin:1.3.1:enforce (avoid_bad_dependencies) @ drill-java-exec --- [WARNING] Rule 0: org.apache.maven.plugins.enforcer.BannedDependencies failed with message: Found Banned Dependency: org.json:json:jar:20080701 Use 'mvn dependency:tree' to locate the source of the banned dependencies.` Please use `mvn dependency:tree -Dincludes=org.json:json -Pmapr` to see the results. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1011 @kr-arjun, I think logging the full stack trace is a good idea. Let's address it in a new Jira. +1, LGTM. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user kr-arjun commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers Currently, the client exception is output as 'ClientContext.err.println(e.getMessage())' in DrillOnYarn.java. For most application master launcher failures, the only message available is 'Failed to start Drill application master'. Do you think it would help in troubleshooting Drill-on-YARN client failures if the exception stack trace were logged? ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 @arina-ielchiieva, do you want to give this one a committer +1? Then I'll mark it ready-to-commit. Thanks! ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user Agirish commented on the issue: https://github.com/apache/drill/pull/1011 Looks good! +1. Getting this into AD 1.13.0 would be great for users. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 Rebased onto latest master. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 Fixed the drill-common dependency as @ilooner requested. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user Agirish commented on the issue: https://github.com/apache/drill/pull/1011 @arina-ielchiieva, sorry was held-up with something. I've just started on this - will get back shortly. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1011 It would be good to add this feature in the upcoming 1.13.0 Drill release. To do so we need to ensure the following: @paul-rogers 1. Could you please fix the failures on Travis? Tim has added a comment regarding a possible fix. 2. Also, it would be great if you could file a Jira indicating what enhancements can be done. This will definitely help in the future to identify the main areas of improvement for Drill on YARN. @Agirish Did you have a chance to do sanity checks? ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
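Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers You need to add this dependency to your drill-yarn pom.xml
```
<dependency>
  <groupId>org.apache.drill</groupId>
  <artifactId>drill-common</artifactId>
  <version>${project.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
```
---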
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 Failing in Travis, apparently due to a test-framework issue:
```
Caused by: java.lang.ClassNotFoundException: org.apache.drill.categories.SecurityTest
```
@ilooner, any idea what's going on? ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user Agirish commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers, I'll give it a try and update with my findings. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1011 Rebased on latest master and resolved merge conflicts. Some ZK-related classes changed. Would be good if Abhishek could do a quick sanity test on his test cluster to make sure things still work. This is a "minimum viable product" (MVP). It omits many nice-to-haves such as security, graceful shutdown, recovery from YARN RM failures and so on. Folks should feel free to file JIRAs for these enhancements as they find the need for them. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1011 @paul-rogers based on @sachouche's feedback, could you please create a Jira for the enhancement and also resolve the conflicts in the bin.xml file? Thank you in advance! ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1011 +1 I have reviewed the code and overall it looks good. My main feedback is that the current implementation doesn't support secure clusters (at least, I didn't see any logic associated with that). YARN applications have issues staying up for a long time because of ticket renewal limitations. We might want to create another enhancement JIRA to support such use cases. ---
[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN
Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1011 @sachouche @vrozov @arina-ielchiieva please review ---