[jira] [Updated] (HIVE-13511) Run clidriver tests from within the qtest dir for the precommit tests
[ https://issues.apache.org/jira/browse/HIVE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13511: -- Description: The tests are currently run from the itests directory - which means there's additional overhead of having to at least check whether files have changed. Will attach a sample output - this adds up to 40+ seconds per batch. Getting rid of this should be a reasonable saving overall. > Run clidriver tests from within the qtest dir for the precommit tests > - > > Key: HIVE-13511 > URL: https://issues.apache.org/jira/browse/HIVE-13511 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > The tests are currently run from the itests directory - which means there's > additional overhead of having to at least check whether files have changed. > Will attach a sample output - this adds up to 40+ seconds per batch. Getting > rid of this should be a reasonable saving overall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13511) Run clidriver tests from within the qtest dir for the precommit tests
[ https://issues.apache.org/jira/browse/HIVE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13511: -- Attachment: example_testExecution.txt example_maven-test.txt > Run clidriver tests from within the qtest dir for the precommit tests > - > > Key: HIVE-13511 > URL: https://issues.apache.org/jira/browse/HIVE-13511 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: example_maven-test.txt, example_testExecution.txt > > > The tests are currently run from the itests directory - which means there's > additional overhead of having to at least check whether files have changed. > Will attach a sample output - this adds up to 40+ seconds per batch. Getting > rid of this should be a reasonable saving overall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13496) Create initial test data once across multiple test runs
[ https://issues.apache.org/jira/browse/HIVE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13496: -- Status: Patch Available (was: Open) > Create initial test data once across multiple test runs > --- > > Key: HIVE-13496 > URL: https://issues.apache.org/jira/browse/HIVE-13496 > Project: Hive > Issue Type: Sub-task > Components: Test >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13496.01.patch > > > All TestCliDriver, TezMiniTezCliDriver etc tests create a standard data set > when they start up. When running on a box with SSDs - this step takes over a > minute. > Running a single qtest cannot be faster than this. On the ptest framework - > all batches end up doing this which is a lot of wastage. > Instead, this data generation should be shared across runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13511) Run clidriver tests from within the qtest dir for the precommit tests
[ https://issues.apache.org/jira/browse/HIVE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13511: -- Status: Patch Available (was: Open) > Run clidriver tests from within the qtest dir for the precommit tests > - > > Key: HIVE-13511 > URL: https://issues.apache.org/jira/browse/HIVE-13511 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13511.01.patch, example_maven-test.txt, > example_testExecution.txt > > > The tests are currently run from the itests directory - which means there's > additional overhead of having to at least check whether files have changed. > Will attach a sample output - this adds up to 40+ seconds per batch. Getting > rid of this should be a reasonable saving overall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13504) Add a timeout for static initialization blocks
[ https://issues.apache.org/jira/browse/HIVE-13504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13504: -- Target Version/s: 2.1.0 > Add a timeout for static initialization blocks > -- > > Key: HIVE-13504 > URL: https://issues.apache.org/jira/browse/HIVE-13504 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth > > Junit test annotations cannot be used for @BeforeClass or static > initialization blocks in tests. > This would likely be a custom monitor. > TestJdbcWithMiniHS2, Test*Cli are good candidates to get started. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
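One possible shape for the "custom monitor" mentioned above, as a rough sketch only: a watchdog armed at the top of a static initializer or @BeforeClass method that interrupts the initializing thread if setup overruns. The class name InitWatchdog is hypothetical and is not part of Hive or JUnit, and interruption only helps when the setup code blocks in interruptible calls.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper (not part of Hive or JUnit): interrupts the thread that
// created it if close() has not been called before the deadline.
final class InitWatchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler;
    private final ScheduledFuture<?> alarm;
    private final AtomicBoolean fired = new AtomicBoolean(false);

    InitWatchdog(long timeout, TimeUnit unit) {
        final Thread watched = Thread.currentThread();
        // Daemon thread so a forgotten watchdog never keeps the JVM alive.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "init-watchdog");
            t.setDaemon(true);
            return t;
        });
        alarm = scheduler.schedule(() -> {
            fired.set(true);     // record the timeout before interrupting
            watched.interrupt(); // only effective for interruptible blocking calls
        }, timeout, unit);
    }

    /** True if the deadline passed before close() was called. */
    boolean fired() {
        return fired.get();
    }

    @Override
    public void close() {
        alarm.cancel(false);
        scheduler.shutdownNow();
    }
}
```

In a test class this would wrap the slow setup, for example try (InitWatchdog w = new InitWatchdog(5, TimeUnit.MINUTES)) { ... } inside the static block. If the alarm fires at the very moment setup finishes, the interrupt flag may stay set on the thread, so a real implementation would need to handle that edge.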
[jira] [Updated] (HIVE-13496) Create initial test data once across multiple test runs
[ https://issues.apache.org/jira/browse/HIVE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13496:
----------------------------------
Attachment: HIVE-13496.01.patch

Patch to make this change. This does the third point mentioned in the previous post. If the data does not exist, create it and copy it to a known location for future runs. If the data exists in the known location, copy it over for the current run. mvn clean gets rid of the cached data, in case it needs to be regenerated.

For "mvn test -Dtest=TestCliDriver -Dqfile="udf_md5.q""

Without patch:
Run1: Total time: 1:09.271s
Run2: Total time: 1:07.661s
Run3: Total time: 1:09.281s

With patch:
Run1: Total time: 1:08.162s
Run2: Total time: 18.754s
Run3: Total time: 18.680s

For precommit tests, TestCliDriver runs 2131 tests in ~143 batches on 14 nodes, so an average of 10 batches per node. Looking at existing test results (specifically the mvn output against the test xml), there's over a minute of data gen overhead on the build machines. This should take 10+ minutes off the runtime.

Only done for TestCliDriver right now. I think we should get this change in (ideally without pre-commit), and then look at the other tests.

[~ashutoshc], [~thejas] - could you please take a look.

> Create initial test data once across multiple test runs
> -------------------------------------------------------
>
> Key: HIVE-13496
> URL: https://issues.apache.org/jira/browse/HIVE-13496
> Project: Hive
> Issue Type: Sub-task
> Components: Test
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-13496.01.patch
>
> All TestCliDriver, TezMiniTezCliDriver etc tests create a standard data set when they start up. When running on a box with SSDs - this step takes over a minute.
> Running a single qtest cannot be faster than this. On the ptest framework - all batches end up doing this which is a lot of wastage.
> Instead, this data generation should be shared across runs.
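The generate-once, copy-afterwards behaviour described in this comment can be sketched roughly as below. This is not the code from HIVE-13496.01.patch: the class TestDataCache, its Generator interface and the directory arguments are hypothetical names chosen for illustration, and the sketch assumes the cache directory lives somewhere that mvn clean removes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper illustrating the idea; not the actual patch.
final class TestDataCache {
    /** Stand-in for whatever creates the standard data set. */
    interface Generator {
        void generate(Path target) throws IOException;
    }

    /**
     * Prepares runDir for a test run. Returns true when the cached copy was
     * reused and false when the data had to be generated (and was then cached).
     */
    static boolean prepare(Path cacheDir, Path runDir, Generator generator) throws IOException {
        if (Files.isDirectory(cacheDir)) {
            copyTree(cacheDir, runDir);   // cheap path: reuse earlier output
            return true;
        }
        Files.createDirectories(runDir);
        generator.generate(runDir);       // expensive path: first run only
        copyTree(runDir, cacheDir);       // publish for future runs
        return false;
    }

    /** Recursively copies a directory tree, overwriting existing files. */
    static void copyTree(Path from, Path to) throws IOException {
        try (Stream<Path> paths = Files.walk(from)) {
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dst = to.resolve(from.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst);
                } else {
                    Files.createDirectories(dst.getParent());
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

A copy that fails halfway would leave a partial cache behind, so a sturdier version might copy into a temporary directory and rename it into place once complete.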
[jira] [Updated] (HIVE-13507) Improved logging for ptest
[ https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13507: -- Description: Include information about batch runtimes, outlier lists, host completion times, etc. Try identifying tests which cause the build to take a long time while holding onto resources. > Improved logging for ptest > -- > > Key: HIVE-13507 > URL: https://issues.apache.org/jira/browse/HIVE-13507 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth > > Include information about batch runtimes, outlier lists, host completion > times, etc. Try identifying tests which cause the build to take a long time > while holding onto resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13495) Add timeout for individual tests
[ https://issues.apache.org/jira/browse/HIVE-13495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13495: -- Issue Type: Sub-task (was: Improvement) Parent: HIVE-13503 > Add timeout for individual tests > > > Key: HIVE-13495 > URL: https://issues.apache.org/jira/browse/HIVE-13495 > Project: Hive > Issue Type: Sub-task > Components: Tests >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-13495.patch > > > Some of the tests may get into hang state or may take long time to execute. > We shall make test infra robust to that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13496) Create initial test data once across multiple test runs
[ https://issues.apache.org/jira/browse/HIVE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13496: -- Issue Type: Sub-task (was: Improvement) Parent: HIVE-13503 > Create initial test data once across multiple test runs > --- > > Key: HIVE-13496 > URL: https://issues.apache.org/jira/browse/HIVE-13496 > Project: Hive > Issue Type: Sub-task > Components: Test >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > All TestCliDriver, TezMiniTezCliDriver etc tests create a standard data set > when they start up. When running on a box with SSDs - this step takes over a > minute. > Running a single qtest cannot be faster than this. On the ptest framework - > all batches end up doing this which is a lot of wastage. > Instead, this data generation should be shared across runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13503) [Umbrella] Test sub-system improvements
[ https://issues.apache.org/jira/browse/HIVE-13503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240003#comment-15240003 ]

Siddharth Seth commented on HIVE-13503:
---------------------------------------

The initial build and deploy phase takes close to 30 minutes, during which the test boxes do nothing. The first part covers git checkout, git gc, build, TestDummy, build qtests and TestDummy on qtests; that's over 20 minutes. This is followed by transferring the artifacts over to the actual test boxes - that's another 8-10 minutes.

Need to look at how this can be made faster. I suspect the data transfers could be optimized as a single tar instead of 100s of individual files. Needs investigation.

TestDummy is almost useless and adds a good amount of time.

I don't think much can be done about the build itself - not as a quick fix anyway.

> [Umbrella] Test sub-system improvements
> ---------------------------------------
>
> Key: HIVE-13503
> URL: https://issues.apache.org/jira/browse/HIVE-13503
> Project: Hive
> Issue Type: Improvement
> Reporter: Siddharth Seth
>
> Primarily targeting faster pre-commit builds.
[jira] [Updated] (HIVE-13505) Skip running TestDummy where possibe during precommit builds
[ https://issues.apache.org/jira/browse/HIVE-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13505: -- Summary: Skip running TestDummy where possibe during precommit builds (was: Skip running TestDummy where possibe) > Skip running TestDummy where possibe during precommit builds > > > Key: HIVE-13505 > URL: https://issues.apache.org/jira/browse/HIVE-13505 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth > > On the main Hive build - this does nothing. There are some tests named > TestDummy under qtests - I'm not sure they do anything useful though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-13505) Skip running TestDummy where possibe during precommit builds
[ https://issues.apache.org/jira/browse/HIVE-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-13505: - Assignee: Siddharth Seth > Skip running TestDummy where possibe during precommit builds > > > Key: HIVE-13505 > URL: https://issues.apache.org/jira/browse/HIVE-13505 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13505.01.patch > > > On the main Hive build - this does nothing. There are some tests named > TestDummy under qtests - I'm not sure they do anything useful though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13505) Skip running TestDummy where possibe during precommit builds
[ https://issues.apache.org/jira/browse/HIVE-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13505: -- Attachment: HIVE-13505.01.patch Simple patch to avoid running TestDummy. These tests don't exist in the main root, and there's two empty tests under qtests. Don't see the point of running this step - which adds over 5 minutes to build time. [~ashutoshc] - please review. > Skip running TestDummy where possibe during precommit builds > > > Key: HIVE-13505 > URL: https://issues.apache.org/jira/browse/HIVE-13505 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth > Attachments: HIVE-13505.01.patch > > > On the main Hive build - this does nothing. There are some tests named > TestDummy under qtests - I'm not sure they do anything useful though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13505) Skip running TestDummy where possibe during precommit builds
[ https://issues.apache.org/jira/browse/HIVE-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13505: -- Status: Patch Available (was: Open) > Skip running TestDummy where possibe during precommit builds > > > Key: HIVE-13505 > URL: https://issues.apache.org/jira/browse/HIVE-13505 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth > Attachments: HIVE-13505.01.patch > > > On the main Hive build - this does nothing. There are some tests named > TestDummy under qtests - I'm not sure they do anything useful though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13496) Create initial test data once across multiple test runs
[ https://issues.apache.org/jira/browse/HIVE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241538#comment-15241538 ] Siddharth Seth commented on HIVE-13496: --- Thanks Ashutosh. I'm going to commit this and monitor the next few builds for failures. > Create initial test data once across multiple test runs > --- > > Key: HIVE-13496 > URL: https://issues.apache.org/jira/browse/HIVE-13496 > Project: Hive > Issue Type: Sub-task > Components: Test >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13496.01.patch > > > All TestCliDriver, TezMiniTezCliDriver etc tests create a standard data set > when they start up. When running on a box with SSDs - this step takes over a > minute. > Running a single qtest cannot be faster than this. On the ptest framework - > all batches end up doing this which is a lot of wastage. > Instead, this data generation should be shared across runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13496) Create initial test data once across multiple test runs - TestCliDriver
[ https://issues.apache.org/jira/browse/HIVE-13496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13496: -- Resolution: Fixed Fix Version/s: 2.1.0 Status: Resolved (was: Patch Available) > Create initial test data once across multiple test runs - TestCliDriver > --- > > Key: HIVE-13496 > URL: https://issues.apache.org/jira/browse/HIVE-13496 > Project: Hive > Issue Type: Sub-task > Components: Test >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.1.0 > > Attachments: HIVE-13496.01.patch > > > All TestCliDriver, TezMiniTezCliDriver etc tests create a standard data set > when they start up. When running on a box with SSDs - this step takes over a > minute. > Running a single qtest cannot be faster than this. On the ptest framework - > all batches end up doing this which is a lot of wastage. > Instead, this data generation should be shared across runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13490) Change itests to be part of the main Hive build
[ https://issues.apache.org/jira/browse/HIVE-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13490: -- Status: Patch Available (was: Open) Will update another patch shortly to change the build steps for this. > Change itests to be part of the main Hive build > --- > > Key: HIVE-13490 > URL: https://issues.apache.org/jira/browse/HIVE-13490 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13490.01.patch > > > Instead of having to build Hive, and then itests separately. > With IntelliJ, this ends up being loaded as two separate dependencies, and > there's a lot of hops involved to make changes. > Does anyone know why these have been kept separate ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13507) Improved logging for ptest
[ https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251498#comment-15251498 ] Siddharth Seth commented on HIVE-13507: --- Oops. Sorry about that. I'll go back and see what caused this. > Improved logging for ptest > -- > > Key: HIVE-13507 > URL: https://issues.apache.org/jira/browse/HIVE-13507 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Sergio Peña > Fix For: 2.1.0 > > Attachments: HIVE-13507.01.patch > > > Include information about batch runtimes, outlier lists, host completion > times, etc. Try identifying tests which cause the build to take a long time > while holding onto resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13490) Change itests to be part of the main Hive build
[ https://issues.apache.org/jira/browse/HIVE-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251435#comment-15251435 ] Siddharth Seth commented on HIVE-13490: --- OK. That helps me understand why these have been kept separate. If we want to separate unit tests vs integration tests - I think log4j gives us a good means to achieve this (don't recall if it was categories or something else). That would be a better way to approach this than artificially separating the tests. I'll see if I can work on this. Meanwhile, anyone who wants the IDE fix - can apply this patch. I pretty much have it permanently applied locally. > Change itests to be part of the main Hive build > --- > > Key: HIVE-13490 > URL: https://issues.apache.org/jira/browse/HIVE-13490 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13490.01.patch, HIVE-13490.02.patch > > > Instead of having to build Hive, and then itests separately. > With IntelliJ, this ends up being loaded as two separate dependencies, and > there's a lot of hops involved to make changes. > Does anyone know why these have been kept separate ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13511) Run clidriver tests from within the qtest dir for the precommit tests
[ https://issues.apache.org/jira/browse/HIVE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-13511: -- Attachment: HIVE-13511.03.patch Thanks for pointing that out [~ashutoshc]. Uploaded a new patch to fix the test compile issues. I don't want to commit this till we have some time to restart the precommit build system and monitor it for a bit. cc [~spena], [~szehon] > Run clidriver tests from within the qtest dir for the precommit tests > - > > Key: HIVE-13511 > URL: https://issues.apache.org/jira/browse/HIVE-13511 > Project: Hive > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-13511.01.patch, HIVE-13511.02.patch, > HIVE-13511.03.patch, example_maven-test.txt, example_testExecution.txt > > > The tests are currently run from the itests directory - which means there's > additional overhead of having to at least check whether files have changed. > Will attach a sample output - this adds up to 40+ seconds per batch. Getting > rid of this should be a reasonable saving overall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13467) Show llap info on hs2 ui when available
[ https://issues.apache.org/jira/browse/HIVE-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251576#comment-15251576 ] Siddharth Seth commented on HIVE-13467: --- Is this hardcoded to use the name "llap0" ? That should come from the hive configuration instead. Would be useful if the URI could be /llap/. The default value of clusterName comes from hive-site.xml - hive.llap.daemon.service.hosts. It is possible for users to override this if a different llap instance exists. Going to /llap/randomClusterName - could attempt to render the UI for that specific cluster. The bit about attempting to fetch information for an unknown cluster could be done later. However, I think the path /llap/clusterName would be useful to have in this jira. > Show llap info on hs2 ui when available > --- > > Key: HIVE-13467 > URL: https://issues.apache.org/jira/browse/HIVE-13467 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Gunther Hagleitner >Assignee: Gunther Hagleitner > Attachments: HIVE-13467.1.patch, HIVE-13467.2.patch, > HIVE-13467.3.patch, HIVE-13467.4.patch, HIVE-13467.5.patch, > screen-shot-llap.png, screen.png > > > When llap is on and hs2 is configured with access to an llap cluster, HS2 UI > should show some status of the daemons and provide a mechanism to click > through to their respective UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
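To make the suggested /llap/clusterName routing concrete, here is a minimal sketch of how a request path could be mapped to a cluster name, falling back to hive.llap.daemon.service.hosts when the path names no cluster. The class LlapUiPath and the plain Map used as configuration are hypothetical, and stripping a leading "@" from the configured value is an assumption about how that property is usually written; none of this is taken from the HIVE-13467 patches.

```java
import java.util.Map;

// Hypothetical helper; the HS2 UI code in the patch may look quite different.
final class LlapUiPath {
    static final String HOSTS_KEY = "hive.llap.daemon.service.hosts";
    private static final String PREFIX = "/llap";

    /**
     * Resolves the cluster to render for a request path such as "/llap/" or
     * "/llap/someCluster". The conf map stands in for the Hive configuration.
     */
    static String resolveClusterName(String requestPath, Map<String, String> conf) {
        if (requestPath == null || !requestPath.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an llap path: " + requestPath);
        }
        String rest = requestPath.substring(PREFIX.length());
        if (!rest.isEmpty() && !rest.startsWith("/")) {
            throw new IllegalArgumentException("not an llap path: " + requestPath);
        }
        String name = rest.replaceAll("^/+|/+$", ""); // trim surrounding slashes
        if (name.contains("/")) {
            throw new IllegalArgumentException("unexpected nested path: " + requestPath);
        }
        if (!name.isEmpty()) {
            return name; // explicit cluster requested in the URI
        }
        String configured = conf.get(HOSTS_KEY);
        if (configured == null || configured.isEmpty()) {
            throw new IllegalStateException(HOSTS_KEY + " is not set");
        }
        // Assumption: the property may carry a leading '@' before the name.
        return configured.startsWith("@") ? configured.substring(1) : configured;
    }
}
```

With this shape, going to /llap/randomClusterName simply returns "randomClusterName", and deciding whether that cluster actually exists can be left to the later step mentioned in the comment.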
[jira] [Commented] (HIVE-13444) LLAP: add HMAC signatures to LLAP; verify them on LLAP side
[ https://issues.apache.org/jira/browse/HIVE-13444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298741#comment-15298741 ] Siddharth Seth commented on HIVE-13444: --- +1, after addressing two minor comments. > LLAP: add HMAC signatures to LLAP; verify them on LLAP side > --- > > Key: HIVE-13444 > URL: https://issues.apache.org/jira/browse/HIVE-13444 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13444.01.patch, HIVE-13444.02.patch, > HIVE-13444.03.patch, HIVE-13444.WIP.patch, HIVE-13444.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13444) LLAP: add HMAC signatures to LLAP; verify them on LLAP side
[ https://issues.apache.org/jira/browse/HIVE-13444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298740#comment-15298740 ] Siddharth Seth commented on HIVE-13444: --- +1, after addressing two minor comments. > LLAP: add HMAC signatures to LLAP; verify them on LLAP side > --- > > Key: HIVE-13444 > URL: https://issues.apache.org/jira/browse/HIVE-13444 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13444.01.patch, HIVE-13444.02.patch, > HIVE-13444.03.patch, HIVE-13444.WIP.patch, HIVE-13444.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14316) TestLlapTokenChecker.testCheckPermissions, testGetToken fail
[ https://issues.apache.org/jira/browse/HIVE-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392881#comment-15392881 ] Siddharth Seth commented on HIVE-14316: --- +1 pending precommit. > TestLlapTokenChecker.testCheckPermissions, testGetToken fail > > > Key: HIVE-14316 > URL: https://issues.apache.org/jira/browse/HIVE-14316 > Project: Hive > Issue Type: Test >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Attachments: HIVE-14316.patch > > > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation
[ https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392806#comment-15392806 ]

Siddharth Seth commented on HIVE-14225:
---------------------------------------

Think we need a full write-up on llap logging at some point; more importantly, on llap configuration.

> Llap slider package should support configuring YARN rolling log aggregation
> ---------------------------------------------------------------------------
>
> Key: HIVE-14225
> URL: https://issues.apache.org/jira/browse/HIVE-14225
> Project: Hive
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14225.01.patch, HIVE-14225.02.patch
[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375348#comment-15375348 ]

Siddharth Seth commented on HIVE-14213:
---------------------------------------

Setting component configs in the main site files (core-site/yarn-site) would affect all components. So 'yarn logs', submitting a query in Hive, etc. would all end up using these updated configs, which are only meant for the llap client.

Making the configs specific to this script is not great. I can at least rename the parameters so that the configs can be used in other llap cli scripts. Does that work?

12 failed tests. None related. We're slowly creeping away from a green build :(

> Add timeouts for various components in llap status check
> --------------------------------------------------------
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down - the command can take a long time to exit.
[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373913#comment-15373913 ]

Siddharth Seth commented on HIVE-14213:
---------------------------------------

bq. Why do we need a separate set of config settings?
That's really only for the case where defaults are incorrect. A bunch of these settings are common to multiple commands. For example, the yarn logs command or yarn application -list would use the parameters for retry, as would various dfs commands. I don't think the main config settings can be changed in yarn-site/core-site just for this command - hence the new config variables.

bq. On the same note, if the component settings are already set, and the new ones are not set, this will override them with defaults. Perhaps we can just have the default constants for the original parameters (from YARN etc.), and set them if not already set? If the user wants to change them they can just set the originals too
Didn't quite understand this. If the new configs are set - they'll be used. Otherwise the defaults will be used. The defaults are supposed to be good enough.

> Add timeouts for various components in llap status check
> --------------------------------------------------------
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down - the command can take a long time to exit.
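The override behaviour being discussed here can be illustrated with a small sketch. Everything in it is hypothetical: the key names, the 10 second default and the Map used in place of a Hadoop Configuration are assumptions for illustration, not the names used in HIVE-14213.01.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; key names and default value are made up.
final class StatusClientConf {
    /** Tool-specific setting, read only by the status check. */
    static final String OVERRIDE_KEY = "llap.status.example.connect.timeout.ms";
    /** Component-level setting that other commands also read from the site files. */
    static final String COMPONENT_KEY = "example.component.connect.timeout.ms";
    static final long DEFAULT_TIMEOUT_MS = 10_000L;

    /**
     * Returns a private copy of the site configuration for the status check.
     * The component key is always overwritten: with the tool-specific value
     * when it is set, otherwise with the built-in default.
     */
    static Map<String, String> forStatusCheck(Map<String, String> siteConf) {
        Map<String, String> copy = new HashMap<>(siteConf); // site files stay untouched
        String override = siteConf.get(OVERRIDE_KEY);
        long timeoutMs = override != null ? Long.parseLong(override.trim()) : DEFAULT_TIMEOUT_MS;
        copy.put(COMPONENT_KEY, Long.toString(timeoutMs));
        return copy;
    }
}
```

This also shows the case raised in the second quote: a component value already present in the site files is replaced by the default whenever the tool-specific key is absent, while other commands reading the site files are unaffected because only the copy changes.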
[jira] [Updated] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-14213:
----------------------------------
Attachment: HIVE-14213.01.patch

The patch adds various timeouts. Timeout handling in Hadoop seems to be quite broken at the moment. Have added some configurable properties in the patch - within LlapStatus itself - to work around some of these problems if required. Ideally, these are only temporary, and should not be added to HiveConf. The defaults provide between 10-15 seconds to communicate with a service (using hadoop 2.7.x).

cc [~prasanth_j], [~sershe] for review.

> Add timeouts for various components in llap status check
> --------------------------------------------------------
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down - the command can take a long time to exit.
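As a generic illustration of putting an upper bound on a call to an external service, the sketch below wraps a check in a Future and gives up after a deadline. It is not how the patch implements its timeouts (those rely on client configuration properties); BoundedProbe and its Result enum are hypothetical names, and the check passed in is a stand-in for a real YARN, HDFS or ZooKeeper call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; a swapped-in generic technique, not the patch's approach.
final class BoundedProbe {
    enum Result { UP, DOWN, TIMED_OUT }

    /** Runs the check on a daemon thread and waits at most timeoutMs for it. */
    static Result probe(Callable<Boolean> check, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-probe");
            t.setDaemon(true); // a hung check must not block JVM exit
            return t;
        });
        Future<Boolean> future = executor.submit(check);
        try {
            boolean ok = Boolean.TRUE.equals(future.get(timeoutMs, TimeUnit.MILLISECONDS));
            return ok ? Result.UP : Result.DOWN;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            return Result.TIMED_OUT;
        } catch (ExecutionException e) {
            return Result.DOWN;  // the check itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Result.DOWN;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A wrapper like this bounds how long the caller waits, but a client stuck in a non-interruptible retry loop would keep running in the background, which is one reason to prefer real timeout settings in the client where they work.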
[jira] [Comment Edited] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372231#comment-15372231 ]

Siddharth Seth edited comment on HIVE-14213 at 7/12/16 6:16 AM:
----------------------------------------------------------------

The patch adds various timeouts. Timeout handling in Hadoop seems to be quite broken at the moment. Have added some configurable properties in the patch - within LlapStatus itself - to work around some of these problems if required. Ideally, these are only temporary, and should not be added to HiveConf. The defaults provide between 10-15 seconds to communicate with a service (using hadoop 2.7.x).

The patch also changes the cli-log4j file to send all output to a log file by default.

cc [~prasanth_j], [~sershe] for review.

was (Author: sseth):
The patch adds various timeouts. Timeout handling in Hadoop seems to be quite broken at the moment. Have added some configurable properties in the patch - within LlapStatus itself - to work around some of these problems if required. Ideally, these are only temporary, and should not be added to HiveConf. The defaults provide between 10-15 seconds to communicate with a service (using hadoop 2.7.x).

cc [~prasanth_j], [~sershe] for review.

> Add timeouts for various components in llap status check
> --------------------------------------------------------
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down - the command can take a long time to exit.
[jira] [Comment Edited] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372231#comment-15372231 ]

Siddharth Seth edited comment on HIVE-14213 at 7/12/16 6:17 AM:
----------------------------------------------------------------

The patch adds various timeouts. Timeout handling in Hadoop seems to be quite broken at the moment. Have added some configurable properties in the patch - within LlapStatus itself - to work around some of these problems if required. Ideally, these are only temporary, and should not be added to HiveConf. The defaults provide between 10-15 seconds to communicate with a service (using hadoop 2.7.x).

The patch also changes the cli-log4j file to send all output to a log file by default, along with the configured logger. Can move this into a separate patch if required.

cc [~prasanth_j], [~sershe] for review.

was (Author: sseth):
The patch adds various timeouts. Timeout handling in Hadoop seems to be quite broken at the moment. Have added some configurable properties in the patch - within LlapStatus itself - to work around some of these problems if required. Ideally, these are only temporary, and should not be added to HiveConf. The defaults provide between 10-15 seconds to communicate with a service (using hadoop 2.7.x).

The patch also changes the cli-log4j file to send all output to a log file by default.

cc [~prasanth_j], [~sershe] for review.

> Add timeouts for various components in llap status check
> --------------------------------------------------------
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down - the command can take a long time to exit.
[jira] [Updated] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14213: -- Status: Patch Available (was: Open) > Add timeouts for various components in llap status check > > > Key: HIVE-14213 > URL: https://issues.apache.org/jira/browse/HIVE-14213 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14213.01.patch > > > The llapstatus check connects to various components - YARN, HDFS via Slider, > ZooKeeper. If any of these components is down - the command can take a > long time to exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378441#comment-15378441 ] Siddharth Seth commented on HIVE-9756: --
{code}
+ Field field = callableWithNdc.getClass().getSuperclass().getDeclaredField("ndcStack");
+ field.setAccessible(true);
+ Stack ndcStack = (Stack) field.get(callableWithNdc);
{code}
Can this be replaced by "Stack ndcStack = NDC.cloneStack();"? This is being done because Tez works with the NDC and log4j1, correct?
Also, can
{code}
ndcStack.push(dagId);
ndcStack.push(queryId);
ndcStack.push(fragmentId);
{code}
be replaced with NDC.push(dagId); NDC.push ...
There's a lot happening in TaskRunnerCallable between the MDC setup and the try {} finally {clear} block. I think we should move everything after the NDC/MDC setup into its own try/finally block (similar to the CallerWithNDC?).
MDC/NDC vs log4j2 ThreadContext (https://logging.apache.org/log4j/2.x/manual/thread-context.html) - does it matter?
> LLAP: use log4j 2 for llap (log to separate files, etc.) > > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Attachments: HIVE-9756.1.patch, HIVE-9756.2.patch, HIVE-9756.3.patch, > HIVE-9756.4.patch, HIVE-9756.4.patch, HIVE-9756.5.patch, HIVE-9756.6.patch, > HIVE-9756.7.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
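The try/finally suggestion in the comment above can be sketched as follows. This is an illustration only, not code from TaskRunnerCallable or the patch: the thread-local `CONTEXT` stack is a hypothetical stand-in for a logging context such as NDC, and `ContextScopedCall`, `callWithContext` and `currentContext` are made-up names. The point is that the try block starts immediately after the context is set, so no intermediate statement can throw and skip the cleanup.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical illustration; CONTEXT stands in for a per-thread logging context (e.g. NDC).
final class ContextScopedCall {

    private static final ThreadLocal<Deque<String>> CONTEXT =
        ThreadLocal.withInitial(ArrayDeque::new);

    // Returns the current context entries, oldest first, separated by spaces.
    static String currentContext() {
        return String.join(" ", CONTEXT.get());
    }

    // Pushes the ids, runs the work, and always clears the context afterwards.
    static <T> T callWithContext(Supplier<T> work, String... ids) {
        Deque<String> stack = CONTEXT.get();
        for (String id : ids) {
            stack.addLast(id);
        }
        // Everything after the context setup lives inside the try block.
        try {
            return work.get();
        } finally {
            stack.clear(); // runs even if work.get() throws
        }
    }

    public static void main(String[] args) {
        String seen = callWithContext(ContextScopedCall::currentContext,
            "dagId", "queryId", "fragmentId");
        System.out.println("inside: " + seen);
        System.out.println("after empty: " + currentContext().isEmpty());
    }
}
```

This sketch clears the whole stack on exit, mirroring the "finally {clear}" wording in the comment; code that runs on a thread with a pre-existing context would need to restore the previous entries instead.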
[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378469#comment-15378469 ] Siddharth Seth commented on HIVE-9756: -- Also the pop followed by push in TaskRunnerCallable could be replaced with a cloneStack as well. > LLAP: use log4j 2 for llap (log to separate files, etc.) > > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Attachments: HIVE-9756.1.patch, HIVE-9756.2.patch, HIVE-9756.3.patch, > HIVE-9756.4.patch, HIVE-9756.4.patch, HIVE-9756.5.patch, HIVE-9756.6.patch, > HIVE-9756.7.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378469#comment-15378469 ] Siddharth Seth edited comment on HIVE-9756 at 7/14/16 9:58 PM: --- Also the pop followed by push in TaskRunnerCallable could be replaced with a cloneStack. was (Author: sseth): Also the pop followed by push in TaskRunnerCallable could be replaced with a cloneStack as well. > LLAP: use log4j 2 for llap (log to separate files, etc.) > > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Attachments: HIVE-9756.1.patch, HIVE-9756.2.patch, HIVE-9756.3.patch, > HIVE-9756.4.patch, HIVE-9756.4.patch, HIVE-9756.5.patch, HIVE-9756.6.patch, > HIVE-9756.7.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389926#comment-15389926 ] Siddharth Seth commented on HIVE-14167: --- This completely broke LLAP. LlapProxy.setDaemon is called in serviceInit, but the flag is checked in main when starting a regular daemon - so the work dirs end up not being set (i.e. null). It does not break the system tests because they don't use LlapDaemon.main() - and instead instantiate LlapDaemon directly, as they should. Reverting and re-opening. > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14167: -- Fix Version/s: (was: 2.1.1) (was: 2.2.0) > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reopened HIVE-14167: --- > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14167: -- Status: Patch Available (was: Reopened) > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch, HIVE-14167.4.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390140#comment-15390140 ] Siddharth Seth commented on HIVE-14167: --- +1. Tried it out. Let's wait for Jenkins, or run the tests locally, before committing. > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch, HIVE-14167.4.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14364) Update timeouts for llap comparator tests
[ https://issues.apache.org/jira/browse/HIVE-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14364: -- Attachment: HIVE-14364.01.patch Trivial patch. [~sershe] - could you please take a look. > Update timeouts for llap comparator tests > - > > Key: HIVE-14364 > URL: https://issues.apache.org/jira/browse/HIVE-14364 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14364.01.patch > > > The tests timeout occasionally. Increasing to 60 seconds from 5 seconds. > NO_PRECOMMIT_TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14321) Improvements to the log uri provided by llap to tez
[ https://issues.apache.org/jira/browse/HIVE-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14321: -- Description: Currently none provided - since yarn does not have a listing page to display aggregated logs. > Improvements to the log uri provided by llap to tez > --- > > Key: HIVE-14321 > URL: https://issues.apache.org/jira/browse/HIVE-14321 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > Currently none provided - since yarn does not have a listing page to display > aggregated logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation
[ https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14225: -- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) > Llap slider package should support configuring YARN rolling log aggregation > --- > > Key: HIVE-14225 > URL: https://issues.apache.org/jira/browse/HIVE-14225 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14225.01.patch, HIVE-14225.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation
[ https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390872#comment-15390872 ] Siddharth Seth commented on HIVE-14225: --- Test failures are not related. Verified "TestMiniTezCliDriver-dynamic_partition_pruning.q-vector_char_mapjoin1.q-unionDistinct_2.q-and-12-more - did not produce a TEST-*.xml file" locally. The rest keep failing. Committing shortly. Thanks for the review [~prasanth_j]. Will create a follow up jira to fix the log link. > Llap slider package should support configuring YARN rolling log aggregation > --- > > Key: HIVE-14225 > URL: https://issues.apache.org/jira/browse/HIVE-14225 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14225.01.patch, HIVE-14225.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14316) TestLlapTokenChecker.testCheckPermissions, testGetToken fail
[ https://issues.apache.org/jira/browse/HIVE-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14316: -- Summary: TestLlapTokenChecker.testCheckPermissions, testGetToken fail (was: TestLlapTokenChecker.testCheckPermissions, testGetToken fails) > TestLlapTokenChecker.testCheckPermissions, testGetToken fail > > > Key: HIVE-14316 > URL: https://issues.apache.org/jira/browse/HIVE-14316 > Project: Hive > Issue Type: Test >Reporter: Siddharth Seth > > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14386) UGI clone shim also needs to clone credentials
[ https://issues.apache.org/jira/browse/HIVE-14386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400914#comment-15400914 ] Siddharth Seth commented on HIVE-14386: --- +1. > UGI clone shim also needs to clone credentials > -- > > Key: HIVE-14386 > URL: https://issues.apache.org/jira/browse/HIVE-14386 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14386.patch > > > Discovered while testing HADOOP-13081 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14392) llap daemons should try using YARN local dirs, if available
[ https://issues.apache.org/jira/browse/HIVE-14392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14392: -- Status: Patch Available (was: Open) > llap daemons should try using YARN local dirs, if available > --- > > Key: HIVE-14392 > URL: https://issues.apache.org/jira/browse/HIVE-14392 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14392.01.patch > > > LLAP required hive.llap.daemon.work.dirs to be specified. When running as a > YARN app - this can use the local dirs for the container - removing the > requirement to setup this parameter (for secure and non-secure clusters). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14392) llap daemons should try using YARN local dirs, if available
[ https://issues.apache.org/jira/browse/HIVE-14392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14392: -- Attachment: HIVE-14392.01.patch The ordering is as follows. 1. work.dirs specified - they will be used (except for a specific string value). 2. work.dirs not specified, or specific string value set - try using work dirs from YARN container env. 3. Fail. Using the work dirs from the YARN container env gets rid of the problems with having to set up explicit directories for secure clusters. YARN will take care of setting up the base dirs for the llap app - which will operate within this dir, where it has access. Containers running for apps which actually run the query (mode=map instead of mode=ALL) access the data via LLAP shuffle, which knows how to deal with these dirs (no one outside of LLAP is accessing these dirs directly). [~sershe], [~gopalv] - please review. > llap daemons should try using YARN local dirs, if available > --- > > Key: HIVE-14392 > URL: https://issues.apache.org/jira/browse/HIVE-14392 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14392.01.patch > > > LLAP required hive.llap.daemon.work.dirs to be specified. When running as a > YARN app - this can use the local dirs for the container - removing the > requirement to setup this parameter (for secure and non-secure clusters). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
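The three-step ordering described in the comment above can be sketched roughly as follows. This is not the code in HIVE-14392.01.patch: `WorkDirResolver`, `resolve` and the sentinel string are hypothetical (the comment does not name the special value), and the sketch assumes that both inputs are comma-separated lists, with the YARN value passed in by the caller (for example, read from the container environment).

```java
import java.util.Arrays;

// Hypothetical illustration of the fallback order; not the LLAP implementation.
final class WorkDirResolver {

    // Placeholder only: the actual special string value is not given in the comment.
    static final String USE_YARN_DIRS_SENTINEL = "<use-yarn-local-dirs>";

    static String[] resolve(String configuredWorkDirs, String yarnLocalDirs) {
        // 1. Explicitly configured work dirs win, unless set to the special value.
        if (isSet(configuredWorkDirs)
            && !USE_YARN_DIRS_SENTINEL.equals(configuredWorkDirs.trim())) {
            return split(configuredWorkDirs);
        }
        // 2. Otherwise try the dirs provided by the YARN container environment.
        if (isSet(yarnLocalDirs)) {
            return split(yarnLocalDirs);
        }
        // 3. Nothing usable: fail.
        throw new IllegalStateException(
            "No work dirs configured and none available from the YARN container env");
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    private static String[] split(String commaSeparated) {
        return commaSeparated.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolve("/grid/0/llap,/grid/1/llap", null)));
        System.out.println(Arrays.toString(resolve(null, "/yarn/local/0,/yarn/local/1")));
    }
}
```

Keeping the resolution in one pure function with both inputs passed in makes each of the three cases easy to unit test without a running YARN container.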
[jira] [Comment Edited] (HIVE-14364) Update timeouts for llap comparator tests
[ https://issues.apache.org/jira/browse/HIVE-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396663#comment-15396663 ] Siddharth Seth edited comment on HIVE-14364 at 7/27/16 11:39 PM: - bq. Why cannot classloader stuff be moved into pre-test by touching the same classes? That way the test timeout doesn't need to increase. -0 The entire test will hang if there's a disk issue (or take a really long time). It's better to fail fast instead of running forever because of bad hardware. Touching classes as a part of a test because of slow classloading is going to be extremely confusing for anyone looking at this code later. Not maintainable at all. It should not be restricted to this test only. Could you please revoke your -0. was (Author: sseth): bq. Why cannot classloader stuff be moved into pre-test by touching the same classes? That way the test timeout doesn't need to increase. -0 The entire test will hang if there's a disk issue (or take a really long time). It's better to fail fast instead of running forever because of bad hardware. Touching classes as a part of a test because of slow classloading is going to be extremely confusing for anyone looking at this code later. Not maintainable at all. It should not be restricted to this test only. > Update timeouts for llap comparator tests > - > > Key: HIVE-14364 > URL: https://issues.apache.org/jira/browse/HIVE-14364 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14364.01.patch > > > The tests timeout occasionally. Increasing to 60 seconds from 5 seconds. > NO_PRECOMMIT_TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14364) Update timeouts for llap comparator tests
[ https://issues.apache.org/jira/browse/HIVE-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396663#comment-15396663 ] Siddharth Seth commented on HIVE-14364: --- bq. Why cannot classloader stuff be moved into pre-test by touching the same classes? That way the test timeout doesn't need to increase. -0 The entire test will hang if there's a disk issue (or take a really long time). It's better to fail fast instead of running forever because of bad hardware. Touching classes as a part of a test because of slow classloading is going to be extremely confusing for anyone looking at this code later. Not maintainable at all. It should not be restricted to this test only. > Update timeouts for llap comparator tests > - > > Key: HIVE-14364 > URL: https://issues.apache.org/jira/browse/HIVE-14364 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14364.01.patch > > > The tests timeout occasionally. Increasing to 60 seconds from 5 seconds. > NO_PRECOMMIT_TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-14364) Update timeouts for llap comparator tests
[ https://issues.apache.org/jira/browse/HIVE-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved HIVE-14364. --- Resolution: Fixed Fix Version/s: 2.2.0 Committed to master. > Update timeouts for llap comparator tests > - > > Key: HIVE-14364 > URL: https://issues.apache.org/jira/browse/HIVE-14364 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14364.01.patch > > > The tests timeout occasionally. Increasing to 60 seconds from 5 seconds. > NO_PRECOMMIT_TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14332) Reduce logging from VectorMapOperator
[ https://issues.apache.org/jira/browse/HIVE-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394108#comment-15394108 ] Siddharth Seth commented on HIVE-14332: --- Would it be useful to retain this at the DEBUG level? (or is the information available in some other way). I'm +1 on either - i.e. the current patch, or moving it to debug. Your call on whether it is useful for debugging or not. > Reduce logging from VectorMapOperator > - > > Key: HIVE-14332 > URL: https://issues.apache.org/jira/browse/HIVE-14332 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14332.01.patch > > > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: VectorMapOperator > path: > hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_200.db/store_sales/ss_sold_date_sk=2451710, > read type VECTORIZED_INPUT_FILE_FORMAT, vector deserialize type NONE, > aliases store_sales > Lines like this repeat all over the log. This gets really big with a large > number of partitions. 6MB of logs per node for a 30 task query running for 20 > seconds on a 3 node cluster. > Instead of logging this line - can we have a consolidated log / logging only > if something abnormal happens ... or a shorter log message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14403) LLAP node specific preemption will only preempt once on a node per AM
[ https://issues.apache.org/jira/browse/HIVE-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405003#comment-15405003 ] Siddharth Seth edited comment on HIVE-14403 at 8/2/16 11:21 PM: Minor update to fix an import and re-introduce a break statement which was accidentally deleted (the break only exits a loop early, so skipping it was functionally harmless) was (Author: sseth): Minor update to fix an import and re-introduce a break statement which was accidentally deleted (exited out of a loop early, and harmless to not run it) > LLAP node specific preemption will only preempt once on a node per AM > - > > Key: HIVE-14403 > URL: https://issues.apache.org/jira/browse/HIVE-14403 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: HIVE-14403.01.patch, HIVE-14403.02.patch > > > Query hang reported by [~cartershanklin] > Turns out that once an AM has preempted a task on a node for locality, it > will not be able to preempt another task on the same node (specifically for > local requests) > Manifests as a query hanging. It's possible for a previous query to interfere > with a subsequent query since the AM is shared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14403) LLAP node specific preemption will only preempt once on a node per AM
[ https://issues.apache.org/jira/browse/HIVE-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14403: -- Attachment: HIVE-14403.02.patch Minor update to fix an import and re-introduce a break statement which was accidentally deleted (the break only exits a loop early, so not running it was harmless) > LLAP node specific preemption will only preempt once on a node per AM > - > > Key: HIVE-14403 > URL: https://issues.apache.org/jira/browse/HIVE-14403 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: HIVE-14403.01.patch, HIVE-14403.02.patch > > > Query hang reported by [~cartershanklin] > Turns out that once an AM has preempted a task on a node for locality, it > will not be able to preempt another task on the same node (specifically for > local requests) > Manifests as a query hanging. It's possible for a previous query to interfere > with a subsequent query since the AM is shared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow
[ https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417668#comment-15417668 ] Siddharth Seth commented on HIVE-14504: --- I'm not sure if pre-commit is configured to ignore patches which only change tests? > tez_join_hash.q test is slow > > > Key: HIVE-14504 > URL: https://issues.apache.org/jira/browse/HIVE-14504 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, > HIVE-14504.1.patch > > > tez_join_hash.q also explicitly sets execution engine to mr which slows down > the entire test. Test takes around 7 mins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14405) Have tests log to the console along with hive.log
[ https://issues.apache.org/jira/browse/HIVE-14405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413880#comment-15413880 ] Siddharth Seth commented on HIVE-14405: --- Thanks for taking a look [~ashutoshc]. My concern is that this doubles the amount of logging. I'll take a look to see if we can disable DEBUG level logging for some of the noisy Hadoop components to cut the overall log size. > Have tests log to the console along with hive.log > - > > Key: HIVE-14405 > URL: https://issues.apache.org/jira/browse/HIVE-14405 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14405.01.patch > > > When running tests from the IDE (not itests), logs end up going to hive.log - > making it difficult to debug tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14501) MiniTez test for union_type_chk.q is slow
[ https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415756#comment-15415756 ] Siddharth Seth commented on HIVE-14501: --- +1 > MiniTez test for union_type_chk.q is slow > - > > Key: HIVE-14501 > URL: https://issues.apache.org/jira/browse/HIVE-14501 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14501.1.patch > > > union_type_chk.q runs on minimr and minitez but the test itself explicitly > sets execution engine as mr. It takes around 10 mins to run this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415819#comment-15415819 ] Siddharth Seth commented on HIVE-14502: --- I don't think we should move MiniLlap to use MiniHBase - it is not the default at the moment. Maybe retain a few tests on MiniTez which can run with MiniHBase. Setup time for MiniHBase metastore-based tests is 3 minutes, versus 1 minute for regular tests. The 1 minute will be cut down after HIVE-13496. A similar effort could be taken up for MiniHBase. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver takes around 3hr 15mins to run around 400 tests. To > cut down this test time significantly it makes sense to move over mini tez > tests to mini llap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14439) LlapTaskScheduler should try scheduling tasks when a node is disabled
[ https://issues.apache.org/jira/browse/HIVE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414369#comment-15414369 ] Siddharth Seth commented on HIVE-14439: --- Thanks for the reviews. Committing. > LlapTaskScheduler should try scheduling tasks when a node is disabled > - > > Key: HIVE-14439 > URL: https://issues.apache.org/jira/browse/HIVE-14439 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14439.01.patch, HIVE-14439.02.patch, > HIVE-14439.03.patch > > > When a node is disabled - try scheduling pending tasks. Tasks which may have > been waiting for the node to become available could become candidates for > scheduling on alternate nodes depending on the locality delay and disable > duration. > This is what is causing an occasional timeout on > testDelayedLocalityNodeCommErrorImmediateAllocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14439) LlapTaskScheduler should try scheduling tasks when a node is disabled
[ https://issues.apache.org/jira/browse/HIVE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14439: -- Resolution: Fixed Fix Version/s: 2.1.1 Status: Resolved (was: Patch Available) > LlapTaskScheduler should try scheduling tasks when a node is disabled > - > > Key: HIVE-14439 > URL: https://issues.apache.org/jira/browse/HIVE-14439 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.1.1 > > Attachments: HIVE-14439.01.patch, HIVE-14439.02.patch, > HIVE-14439.03.patch > > > When a node is disabled - try scheduling pending tasks. Tasks which may have > been waiting for the node to become available could become candidates for > scheduling on alternate nodes depending on the locality delay and disable > duration. > This is what is causing an occasional timeout on > testDelayedLocalityNodeCommErrorImmediateAllocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14501) MiniTez test for union_type_chk.q is slow
[ https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14501: -- Fix Version/s: 2.2.0 > MiniTez test for union_type_chk.q is slow > - > > Key: HIVE-14501 > URL: https://issues.apache.org/jira/browse/HIVE-14501 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.2.0 > > Attachments: HIVE-14501.1.patch > > > union_type_chk.q runs on minimr and minitez but the test itself explicitly > sets execution engine as mr. It takes around 10 mins to run this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14403) LLAP node specific preemption will only preempt once on a node per AM
[ https://issues.apache.org/jira/browse/HIVE-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14403: -- Resolution: Fixed Fix Version/s: 2.1.1 Status: Resolved (was: Patch Available) Committed to master and branch-2.1 > LLAP node specific preemption will only preempt once on a node per AM > - > > Key: HIVE-14403 > URL: https://issues.apache.org/jira/browse/HIVE-14403 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Fix For: 2.1.1 > > Attachments: HIVE-14403.01.patch, HIVE-14403.02.patch > > > Query hang reported by [~cartershanklin] > Turns out that once an AM has preempted a task on a node for locality, it > will not be able to preempt another task on the same node (specifically for > local requests) > Manifests as a query hanging. It's possible for a previous query to interfere > with a subsequent query since the AM is shared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14403) LLAP node specific preemption will only preempt once on a node per AM
[ https://issues.apache.org/jira/browse/HIVE-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406180#comment-15406180 ] Siddharth Seth commented on HIVE-14403: --- Test failures are unrelated. Cannot reproduce TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation on local runs. Will walk through the test in a separate jira to identify flakiness. > LLAP node specific preemption will only preempt once on a node per AM > - > > Key: HIVE-14403 > URL: https://issues.apache.org/jira/browse/HIVE-14403 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth >Priority: Critical > Attachments: HIVE-14403.01.patch, HIVE-14403.02.patch > > > Query hang reported by [~cartershanklin] > Turns out that once an AM has preempted a task on a node for locality, it > will not be able to preempt another task on the same node (specifically for > local requests) > Manifests as a query hanging. It's possible for a previous query to interfere > with a subsequent query since the AM is shared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow
[ https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419579#comment-15419579 ] Siddharth Seth commented on HIVE-14504: --- +1 > tez_join_hash.q test is slow > > > Key: HIVE-14504 > URL: https://issues.apache.org/jira/browse/HIVE-14504 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, > HIVE-14504.1.patch > > > tez_join_hash.q also explicitly sets execution engine to mr which slows down > the entire test. Test takes around 7 mins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14405) Have tests log to the console along with hive.log
[ https://issues.apache.org/jira/browse/HIVE-14405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415812#comment-15415812 ] Siddharth Seth commented on HIVE-14405: --- Looks like all of that is already in place. Re-triggering a jenkins run to see what this looks like. May need to change the console logging to INFO level (and let default debug logs go to hive.log) > Have tests log to the console along with hive.log > - > > Key: HIVE-14405 > URL: https://issues.apache.org/jira/browse/HIVE-14405 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14405.01.patch > > > When running tests from the IDE (not itests), logs end up going to hive.log - > making it difficult to debug tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14213: -- Description: The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down - the command can take a long time to exit. NO PRECOMMIT TESTS was: The llapstatus check connects to various components - YARN, HDFS via Slider, ZooKeeper. If any of these components is down - the command can take a long time to exit. > Add timeouts for various components in llap status check > > > Key: HIVE-14213 > URL: https://issues.apache.org/jira/browse/HIVE-14213 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14213.01.patch > > > The llapstatus check connects to various components - YARN, HDFS via Slider, > ZooKeeper. If any of these components is down - the command can take a > long time to exit. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14213: -- Attachment: HIVE-14213.02.patch Updated patch with the config names changed to use "llapcli" instead of "llapstatus". bq. Can you still only set configs if not already set? At least when replacing with defaults. The intent is to avoid the cluster defaults, and set up values for llapstatus so that it fails fast - rather than retrying per the cluster default retry policy. > Add timeouts for various components in llap status check > > > Key: HIVE-14213 > URL: https://issues.apache.org/jira/browse/HIVE-14213 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14213.01.patch, HIVE-14213.02.patch > > > The llapstatus check connects to various components - YARN, HDFS via Slider, > ZooKeeper. If any of these components is down - the command can take a > long time to exit. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
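The fail-fast intent in the comment above (apply the status-check values even when the cluster configuration already sets them) could look roughly like the sketch below. The property keys and values are hypothetical placeholders, not the names introduced by the patch, and a plain `Map` stands in for the real configuration object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the keys below are placeholders, not real Hive/YARN/ZooKeeper settings.
class FailFastOverrides {
    // Values chosen so that a client gives up quickly instead of following the cluster retry policy.
    static final Map<String, String> FAIL_FAST = new HashMap<>();
    static {
        FAIL_FAST.put("example.client.connect.timeout.ms", "5000");
        FAIL_FAST.put("example.client.max.retries", "2");
    }

    // Returns a copy of clusterConf with the fail-fast values applied unconditionally,
    // i.e. even when the cluster configuration already sets the same key.
    static Map<String, String> apply(Map<String, String> clusterConf) {
        Map<String, String> result = new HashMap<>(clusterConf);
        result.putAll(FAIL_FAST);
        return result;
    }
}
```

A "set only if not already set" variant would use `putIfAbsent` instead of `putAll`, which is the behaviour the comment argues against for this tool, since a cluster-wide retry policy would then win.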
[jira] [Commented] (HIVE-14223) beeline should look for jdbc standalone jar in dist/jdbc dir instead of dist/lib
[ https://issues.apache.org/jira/browse/HIVE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375495#comment-15375495 ] Siddharth Seth commented on HIVE-14223: --- +1 > beeline should look for jdbc standalone jar in dist/jdbc dir instead of > dist/lib > > > Key: HIVE-14223 > URL: https://issues.apache.org/jira/browse/HIVE-14223 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.0.1, 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14223.1.patch, HIVE-14223.2.patch > > > HIVE-13134 changed the jdbc-standalone jar path to dist/jdbc instead of > dist/lib. beeline.sh still looks for the jar in dist/lib which throws the > following error > {code} > ls: cannot access /work/hive2/lib/hive-jdbc-*-standalone.jar: No such file or > directory > {code} > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Attachment: HIVE-14224.wip.01.patch WIP patch based on top of HIVE-13258, HIVE-9756.5. Still needs to be tested with LLAP. Have tested it independently in a separate application. cc [~prasanth_j] > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14213: -- Resolution: Fixed Fix Version/s: 2.1.1 Status: Resolved (was: Patch Available) Thanks for the review. Committed to master and branch-2.1 > Add timeouts for various components in llap status check > > > Key: HIVE-14213 > URL: https://issues.apache.org/jira/browse/HIVE-14213 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.1.1 > > Attachments: HIVE-14213.01.patch, HIVE-14213.02.patch > > > The llapstatus check connects to various components - YARN, HDFS via Slider, > ZooKeeper. If any of these components is down - the command can take a > long time to exit. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Attachment: HIVE-14224.03.patch Updated patch with RB comments addressed. Also added logic to handle the case where a filename collision could happen. > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
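The filename-collision handling mentioned in the comment above might be approached as in the following sketch. The naming scheme (`<name>.done`, then `<name>.1.done`, `<name>.2.done`, ...) is an assumption made for illustration, not necessarily what the patch does; the counter is placed before the suffix so the file still ends in `.done`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only; class and method names are hypothetical.
class DoneFileRenamer {
    // Picks "<name>.done" if it is free, otherwise "<name>.N.done" for the first free N >= 1.
    static Path pickTarget(Path logFile) {
        String name = logFile.getFileName().toString();
        Path candidate = logFile.resolveSibling(name + ".done");
        int counter = 1;
        while (Files.exists(candidate)) {
            candidate = logFile.resolveSibling(name + "." + counter + ".done");
            counter++;
        }
        return candidate;
    }

    // Renames the log file to the chosen target and returns the new path.
    // Files.move without REPLACE_EXISTING throws if the target appears between the
    // existence check and the move, so an existing file is never silently overwritten.
    static Path markDone(Path logFile) throws IOException {
        return Files.move(logFile, pickTarget(logFile));
    }
}
```

The check-then-move sequence is not atomic; a caller that expects concurrent renames of the same name would need to catch `FileAlreadyExistsException` and retry.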
[jira] [Commented] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation
[ https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386838#comment-15386838 ] Siddharth Seth commented on HIVE-14225: --- Think the query-routing name still makes sense - since this is query based routing. > Llap slider package should support configuring YARN rolling log aggregation > --- > > Key: HIVE-14225 > URL: https://issues.apache.org/jira/browse/HIVE-14225 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14225.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation
[ https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14225: -- Status: Patch Available (was: Open) > Llap slider package should support configuring YARN rolling log aggregation > --- > > Key: HIVE-14225 > URL: https://issues.apache.org/jira/browse/HIVE-14225 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14225.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Attachment: HIVE-14224.05.patch Updated patch with a log message, and some null checks. The exception handler can be a separate jira. > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.04.patch, HIVE-14224.05.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386854#comment-15386854 ] Siddharth Seth commented on HIVE-14224: --- Thanks for the reviews. Committing. The test failures are not related. > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.04.patch, HIVE-14224.05.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Created HIVE-14300 to track the race mentioned in the comments. > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.04.patch, HIVE-14224.05.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Attachment: HIVE-14224.04.patch Noticed some issues with the previous patch while testing it more. 1. The filename handling was broken with renames. 2. The appender was getting closed outside of the AsyncLogging thread - which would mean a race in closing it. This patch changes the approach on informing the logging system that a query is done by sending a LOG message with a custom marker. This works better in terms of being invoked on the correct thread - so the Appender.stop() should be called after relevant log messages for the specific context. There's still a race caused by queryComplete messages coming from the AM / wrapping up structures like TaskRunnerCallable locally (we inform the AM of success before cleaning up everything for a task). This can result in the same file sitting around with and without a ".done" flag. Haven't removed the dag-specific logger yet. Will break a subsequent patch. That can be done in a followup. [~prasanth_j] - could you take a quick look at the changes again please. We should probably disable this by default in a subsequent patch (HIVE-14225) due to the race, and the potential of generating a large number of files - test it more before enabling by default. > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.04.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
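The ordering argument in the comment above (signal query completion through the logging path so the appender is stopped on the logging thread, after the query's earlier messages) can be illustrated without log4j. In this sketch a single-threaded executor stands in for the async logging thread and a list of strings stands in for the appender; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative model only: not log4j2 code.
class OrderedLogCloser {
    // Stand-in for the async logging thread: tasks run one at a time, in submission order.
    private final ExecutorService loggingThread = Executors.newSingleThreadExecutor();
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // A normal log message for a query.
    void log(String queryId, String message) {
        loggingThread.execute(() -> events.add(queryId + ": " + message));
    }

    // The "query done" signal travels through the same queue as log messages, so the
    // stop action runs after every message submitted before it by the same caller.
    void queryComplete(String queryId) {
        loggingThread.execute(() -> events.add(queryId + ": <appender stopped>"));
    }

    // Drains the queue and returns the recorded events in processing order.
    List<String> shutdownAndGetEvents() throws InterruptedException {
        loggingThread.shutdown();
        loggingThread.awaitTermination(10, TimeUnit.SECONDS);
        return new ArrayList<>(events);
    }
}
```

The model only orders events submitted by one caller. The remaining race described in the comment (completion messages arriving from the AM while local structures are still wrapping up) corresponds to a second caller still submitting messages for the same query after `queryComplete`, which this approach does not prevent.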
[jira] [Commented] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385129#comment-15385129 ] Siddharth Seth commented on HIVE-14167: --- +1. Assuming you've tested it on a cluster, and seen the correct directories being used. > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation
[ https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14225: -- Attachment: HIVE-14225.01.patch Patch to - configure slider to inform YARN to aggregate files with the name .done. - Removes the query-based routing - Moves to RFA as the default router, since query-routing still requires some work. - Adds a value in HiveConf - similar to other variables like container-size, to access this value at runtime (when present in hive-site.xml) > Llap slider package should support configuring YARN rolling log aggregation > --- > > Key: HIVE-14225 > URL: https://issues.apache.org/jira/browse/HIVE-14225 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14225.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check
[ https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380148#comment-15380148 ] Siddharth Seth commented on HIVE-14213: --- bq. So no documentation is needed, right? No. I don't think we should document these settings. > Add timeouts for various components in llap status check > > > Key: HIVE-14213 > URL: https://issues.apache.org/jira/browse/HIVE-14213 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.1.1 > > Attachments: HIVE-14213.01.patch, HIVE-14213.02.patch > > > The llapstatus check connects to various components - YARN, HDFS via Slider, > ZooKeeper. If any of these components is down - the command can take a > long time to exit. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378453#comment-15378453 ] Siddharth Seth commented on HIVE-14167: --- {code} +if (LlapProxy.isDaemon()) { + localDirList = HiveConf.getVar(conf, HiveConf.ConfVars.LLAP_DAEMON_WORK_DIRS); + if (localDirList != null && !localDirList.isEmpty()) { +return localDirList; + } // otherwise, fall back to use tez work dirs +} {code} This can be improved further for LLAP. Will create a follow-up jira. {code} if (conf.get("hive.execution.engine").equals("tez")) { {code} HiveConf.get( ... instead of conf.get ? {code} return null; {code} I suspect this will not work too well with HiveOnSpark. > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14111) better concurrency handling for TezSessionState - part I
[ https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373545#comment-15373545 ] Siddharth Seth commented on HIVE-14111: --- +1. Is the test failure on TestMiniTezCliDriver-tez_self_join.q-filter_join_breaktask.q-vector_decimal_precision.q-and-12-more - did not produce a TEST-*.xml file related ? Looks new. > better concurrency handling for TezSessionState - part I > > > Key: HIVE-14111 > URL: https://issues.apache.org/jira/browse/HIVE-14111 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, > HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, > HIVE-14111.06.patch, HIVE-14111.patch, sessionPoolNotes.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380393#comment-15380393 ] Siddharth Seth commented on HIVE-14224: --- Have to check whether any of the other log4j2 config files need updating after moving to log4j 2.6.2 > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Attachment: HIVE-14224.02.patch Updated patch. This changes the query-router to log using the queryId and dagId. This is 1) to separate files for multi-stage queries, and 2) to make it easy to identify the dagId associated with a queryId (Eventually Hive will hopefully make this available via HS2). Also updated the HistoryLogger to include a time setup - otherwise log4j2 2.6.x complains about no date pattern despite using a Time+Size policy. The bit about not overwriting files for ext queries needs to be fixed. I'll take care of that in HIVE-14225 or a jira after that which updates the log links on the UI. [~prasanth_j] - could you please review. > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14224: -- Status: Patch Available (was: Open) > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14163) LLAP: use different kerberized/unkerberized zk paths for registry
[ https://issues.apache.org/jira/browse/HIVE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363229#comment-15363229 ] Siddharth Seth commented on HIVE-14163: --- [~sershe] - should we make the namespace configurable? > LLAP: use different kerberized/unkerberized zk paths for registry > - > > Key: HIVE-14163 > URL: https://issues.apache.org/jira/browse/HIVE-14163 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14163.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14163) LLAP: use different kerberized/unkerberized zk paths for registry
[ https://issues.apache.org/jira/browse/HIVE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363260#comment-15363260 ] Siddharth Seth commented on HIVE-14163: --- Shared ZK - admins want to control paths maybe. If the same cluster is changed from secure to unsecure or the other way around - there are alternate ways to fix this, rather than selecting different paths. (The secure path breaks if security settings are changed) > LLAP: use different kerberized/unkerberized zk paths for registry > - > > Key: HIVE-14163 > URL: https://issues.apache.org/jira/browse/HIVE-14163 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14163.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14167: -- Affects Version/s: 2.1.0 Target Version/s: 2.2.0 > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap (log to separate files, etc.)
[ https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363252#comment-15363252 ] Siddharth Seth commented on HIVE-9756: -- [~prasanth_j] - along with this, I think we need to add an option to the 'hive --service llap' script to provide the logger to be used. It's hardcoded to RFA at the moment. Also it'll be useful to use either the dagId or the queryId in the filename - maybe as a follow up. > LLAP: use log4j 2 for llap (log to separate files, etc.) > > > Key: HIVE-9756 > URL: https://issues.apache.org/jira/browse/HIVE-9756 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Gunther Hagleitner >Assignee: Prasanth Jayachandran > Attachments: HIVE-9756.1.patch, HIVE-9756.2.patch, HIVE-9756.3.patch, > HIVE-9756.4.patch, HIVE-9756.4.patch, HIVE-9756.5.patch, HIVE-9756.6.patch > > > For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade-path to get > throughput friendly logging. > http://logging.apache.org/log4j/2.0/manual/async.html#Performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14111) better concurrency handling for TezSessionState - part I
[ https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366550#comment-15366550 ] Siddharth Seth commented on HIVE-14111: --- Let's wait for HiveQA - it's identified a bunch of issues already. That said, it seems to be identifying issues with QTestUtil rather than the main code. Hopefully some tests are covering the main flow. > better concurrency handling for TezSessionState - part I > > > Key: HIVE-14111 > URL: https://issues.apache.org/jira/browse/HIVE-14111 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, > HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.patch, > sessionPoolNotes.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14163) LLAP: use different kerberized/unkerberized zk paths for registry
[ https://issues.apache.org/jira/browse/HIVE-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364793#comment-15364793 ] Siddharth Seth commented on HIVE-14163: --- +1. Looks good, other than a small change to move userPathPrefix before the instance of the inner class which references it (probably does not matter). > LLAP: use different kerberized/unkerberized zk paths for registry > - > > Key: HIVE-14163 > URL: https://issues.apache.org/jira/browse/HIVE-14163 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14163.01.patch, HIVE-14163.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14168) Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
[ https://issues.apache.org/jira/browse/HIVE-14168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368644#comment-15368644 ] Siddharth Seth commented on HIVE-14168: --- Any thoughts on this ? > Avoid serializing all parameters from HiveConf.java into in-memory HiveConf > instances > - > > Key: HIVE-14168 > URL: https://issues.apache.org/jira/browse/HIVE-14168 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Priority: Critical > > All non-null parameters from HiveConf.java are explicitly set in each > HiveConf instance. > {code} > // Overlay the ConfVars. Note that this ignores ConfVars with null values > addResource(getConfVarInputStream()); > {code} > This unnecessarily bloats each Configuration object - 400+ conf variables > being set instead of probably <30 which would exist in hive-site.xml. > Looking at a HS2 heapdump - HiveConf is almost always the largest component > by a long way. Conf objects are also serialized very often - transmitting > lots of unneeded variables (serialized Hive conf is typically 1000+ variables > - due to Hadoop injecting its configs into every config instance). > As long as HiveConf.get() is the approach used to read from a config - this > is avoidable. Hive code itself should be doing this. > This would be a potentially incompatible change for UDFs and other plugins > which have access to a Configuration object. > I'd suggest turning off the insert by default, and adding a flag to control > this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
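The idea in the description above (keep only explicitly set values in each instance and resolve defaults at read time) can be sketched as below. This is a simplified model, not HiveConf: the class, the keys, and the default values are hypothetical, and a static map stands in for the ConfVars defaults compiled into HiveConf.java.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model only; not the HiveConf or Hadoop Configuration API.
class SparseConf {
    // Hypothetical stand-in for the compiled-in defaults, shared by all instances.
    private static final Map<String, String> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put("example.engine", "mr");
        DEFAULTS.put("example.container.size", "1024");
    }

    // Only values that were explicitly set (for example from a site file) live in the instance.
    private final Map<String, String> overrides = new HashMap<>();

    void set(String key, String value) {
        overrides.put(key, value);
    }

    // Reads fall back to the shared defaults, so callers going through get() see the
    // same values as if every default had been copied into the instance.
    String get(String key) {
        String value = overrides.get(key);
        return value != null ? value : DEFAULTS.get(key);
    }

    // What would be serialized and shipped: the explicit overrides only.
    Map<String, String> toSerialize() {
        return new HashMap<>(overrides);
    }
}
```

The compatibility concern raised in the description shows up here as well: code that inspects the raw key/value contents instead of calling `get()` would no longer see the defaults, which is why a flag to control the behaviour is suggested.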
[jira] [Commented] (HIVE-14197) LLAP service driver precondition failure should include the values
[ https://issues.apache.org/jira/browse/HIVE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368460#comment-15368460 ] Siddharth Seth commented on HIVE-14197: --- +1 > LLAP service driver precondition failure should include the values > -- > > Key: HIVE-14197 > URL: https://issues.apache.org/jira/browse/HIVE-14197 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14197.1.patch > > > LLAP service driver's precondition failure message are like below > {code} > Working memory + cache has to be smaller than the container sizing > {code} > It will be better to include the actual values for the sizes in the > precondition failure message. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
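A precondition message that carries the actual values, as requested in the description above, might look like this sketch. The method and parameter names are hypothetical, and megabytes are assumed as the unit purely for illustration.

```java
// Illustrative sketch only; names and units are assumptions, not the LLAP service driver code.
class SizeChecks {
    // Throws if working memory plus cache is not strictly smaller than the container size,
    // and reports all three values in the message.
    static void checkFits(long workingMemoryMb, long cacheMb, long containerMb) {
        if (workingMemoryMb + cacheMb >= containerMb) {
            throw new IllegalArgumentException(String.format(
                "Working memory (%d MB) + cache (%d MB) has to be smaller than the container sizing (%d MB)",
                workingMemoryMb, cacheMb, containerMb));
        }
    }
}
```

With the values in the message, a failed check points directly at the setting that needs to change instead of requiring a second look at the configuration.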
[jira] [Updated] (HIVE-14202) Change tez version used to 0.8.4
[ https://issues.apache.org/jira/browse/HIVE-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14202: -- Attachment: HIVE-14202.01.patch > Change tez version used to 0.8.4 > > > Key: HIVE-14202 > URL: https://issues.apache.org/jira/browse/HIVE-14202 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14202.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14202) Change tez version used to 0.8.4
[ https://issues.apache.org/jira/browse/HIVE-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14202: -- Status: Patch Available (was: Open) > Change tez version used to 0.8.4 > > > Key: HIVE-14202 > URL: https://issues.apache.org/jira/browse/HIVE-14202 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14202.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14168) Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
[ https://issues.apache.org/jira/browse/HIVE-14168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14168: -- Target Version/s: 2.2.0 > Avoid serializing all parameters from HiveConf.java into in-memory HiveConf > instances > - > > Key: HIVE-14168 > URL: https://issues.apache.org/jira/browse/HIVE-14168 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Priority: Critical > > All non-null parameters from HiveConf.java are explicitly set in each > HiveConf instance. > {code} > // Overlay the ConfVars. Note that this ignores ConfVars with null values > addResource(getConfVarInputStream()); > {code} > This unnecessarily bloats each Configuration object - 400+ conf variables > being set instead of probably <30 which would exist in hive-site.xml. > Looking at a HS2 heapdump - HiveConf is almost always the largest component > by a long way. Conf objects are also serialized very often - transmitting > lots of unneeded variables (serialized Hive conf is typically 1000+ variables > - due to Hadoop injecting its configs into every config instance). > As long as HiveConf.get() is the approach used to read from a config - this > is avoidable. Hive code itself should be doing this. > This would be a potentially incompatible change for UDFs and other plugins > which have access to a Configuration object. > I'd suggest turning off the insert by default, and adding a flag to control > this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14168) Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
[ https://issues.apache.org/jira/browse/HIVE-14168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14168: -- Issue Type: Improvement (was: Bug) > Avoid serializing all parameters from HiveConf.java into in-memory HiveConf > instances > - > > Key: HIVE-14168 > URL: https://issues.apache.org/jira/browse/HIVE-14168 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Priority: Critical > > All non-null parameters from HiveConf.java are explicitly set in each > HiveConf instance. > {code} > // Overlay the ConfVars. Note that this ignores ConfVars with null values > addResource(getConfVarInputStream()); > {code} > This unnecessarily bloats each Configuration object - 400+ conf variables > being set instead of probably <30 which would exist in hive-site.xml. > Looking at a HS2 heapdump - HiveConf is almost always the largest component > by a long way. Conf objects are also serialized very often - transmitting > lots of unneeded variables (serialized Hive conf is typically 1000+ variables > - due to Hadoop injecting its configs into every config instance). > As long as HiveConf.get() is the approach used to read from a config - this > is avoidable. Hive code itself should be doing this. > This would be a potentially incompatible change for UDFs and other plugins > which have access to a Configuration object. > I'd suggest turning off the insert by default, and adding a flag to control > this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14202) Change tez version used to 0.8.4
[ https://issues.apache.org/jira/browse/HIVE-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14202: -- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Thanks for the review. API changes in the next release. :) Committed to master. > Change tez version used to 0.8.4 > > > Key: HIVE-14202 > URL: https://issues.apache.org/jira/browse/HIVE-14202 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14202.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14421) FS.deleteOnExit holds references to _tmp_space.db files
[ https://issues.apache.org/jira/browse/HIVE-14421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14421: -- Attachment: HIVE-14421.02.patch Updated patch to address review comments. [~thejas] - could you please take another look? > FS.deleteOnExit holds references to _tmp_space.db files > --- > > Key: HIVE-14421 > URL: https://issues.apache.org/jira/browse/HIVE-14421 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14421.01.patch, HIVE-14421.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14430) More instances of HiveConf and the associated UDFClassLoader than expected
[ https://issues.apache.org/jira/browse/HIVE-14430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409781#comment-15409781 ] Siddharth Seth commented on HIVE-14430: --- Not sure. I think the thread local instances are cleaned up at session shutdown? cc [~vgumashta] > More instances of HiveConf and the associated UDFClassLoader than expected > -- > > Key: HIVE-14430 > URL: https://issues.apache.org/jira/browse/HIVE-14430 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Siddharth Seth >Priority: Critical > > 841 instances of HiveConf. > 831 instances of UDFClassLoader > This is on a HS2 instance configured to run 10 concurrent queries with LLAP. > 10 SessionState instances. Something is holding on to the additional > HiveConf, UDFClassLoaders - potentially HMSHandler. > This is with an embedded metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14439) LlapTaskScheduler should try scheduling tasks when a node is disabled
[ https://issues.apache.org/jira/browse/HIVE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14439: -- Attachment: HIVE-14439.01.patch cc [~prasanth_j], [~hagleitn] for review. > LlapTaskScheduler should try scheduling tasks when a node is disabled > - > > Key: HIVE-14439 > URL: https://issues.apache.org/jira/browse/HIVE-14439 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14439.01.patch > > > When a node is disabled - try scheduling pending tasks. Tasks which may have > been waiting for the node to become available could become candidates for > scheduling on alternate nodes depending on the locality delay and disable > duration. > This is what is causing an occasional timeout on > testDelayedLocalityNodeCommErrorImmediateAllocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
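The behaviour described in HIVE-14439 might look something like the toy model below. It is only a sketch under stated assumptions and is not taken from LlapTaskScheduler or the attached patch: `ToyLocalityScheduler`, its slot-counting model, and the rule "stop waiting when the node will not be back before the locality deadline" are all hypothetical. What it illustrates is the one change the issue asks for, namely that disabling a node triggers a scheduling pass over pending tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy scheduler; names and policy are illustrative only.
final class ToyLocalityScheduler {
    static final class Task {
        final String name;
        final String preferredNode;
        final long localityDeadlineMs; // once reached, any node is acceptable

        Task(String name, String preferredNode, long localityDeadlineMs) {
            this.name = name;
            this.preferredNode = preferredNode;
            this.localityDeadlineMs = localityDeadlineMs;
        }
    }

    private final Map<String, Integer> freeSlots = new LinkedHashMap<>(); // enabled nodes only
    private final Map<String, Long> disabledUntilMs = new HashMap<>();
    private final List<Task> pending = new ArrayList<>();
    final Map<String, String> assignments = new LinkedHashMap<>(); // task name -> node

    void addNode(String node, int slots) {
        freeSlots.put(node, slots);
    }

    void submit(Task task, long nowMs) {
        pending.add(task);
        trySchedule(nowMs);
    }

    void disableNode(String node, long nowMs, long disableDurationMs) {
        freeSlots.remove(node);
        disabledUntilMs.put(node, nowMs + disableDurationMs);
        // The point of the issue: re-evaluate pending tasks right away instead of
        // leaving them waiting on a node that has just become unavailable.
        trySchedule(nowMs);
    }

    void trySchedule(long nowMs) {
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task task = it.next();
            String node = pickNode(task, nowMs);
            if (node != null) {
                freeSlots.merge(node, -1, Integer::sum);
                assignments.put(task.name, node);
                it.remove();
            }
        }
    }

    // Returns the node to run on, or null if the task should keep waiting.
    private String pickNode(Task task, long nowMs) {
        Integer preferredFree = freeSlots.get(task.preferredNode);
        if (preferredFree != null && preferredFree > 0) {
            return task.preferredNode;
        }
        boolean withinDelay = nowMs < task.localityDeadlineMs;
        Long backAtMs = disabledUntilMs.get(task.preferredNode);
        // Waiting only makes sense if the preferred node is enabled but busy, or is
        // disabled and due back before the task's locality delay runs out.
        boolean mayBecomeAvailableInTime = preferredFree != null
            || (backAtMs != null && backAtMs < task.localityDeadlineMs);
        if (withinDelay && mayBecomeAvailableInTime) {
            return null;
        }
        for (Map.Entry<String, Integer> entry : freeSlots.entrySet()) {
            if (entry.getValue() > 0) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

Without the `trySchedule` call in `disableNode`, a task waiting on the disabled node would stay pending until some unrelated event triggered another scheduling pass, which is the kind of stall the issue links to the occasional test timeout.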