[jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables

2017-11-13 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250738#comment-16250738
 ] 

Lefty Leverenz commented on HIVE-14953:
---------------------------------------

Doc note: This adds *hive.mm.avoid.s3.globstatus* to HiveConf.java, and 
branch-14535 has been merged to master for release 3.0.0 by HIVE-15212, so the 
wiki needs to be updated.

I'm not sure where *hive.mm.avoid.s3.globstatus* belongs in Configuration 
Properties.  Perhaps the Transactions section should have a subsection, 
although so far this is the only new parameter that needs to be documented.

* [Configuration Properties -- Transactions and Compactor | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-TransactionsandCompactor]

Added a TODOC3.0.0 label.

> don't use globStatus on S3 in MM tables
> ---------------------------------------
>
>                Key: HIVE-14953
>                URL: https://issues.apache.org/jira/browse/HIVE-14953
>            Project: Hive
>         Issue Type: Sub-task
>           Reporter: Rajesh Balamohan
>           Assignee: Sergey Shelukhin
>             Labels: TODOC3.0
>            Fix For: hive-14535
>
>        Attachments: HIVE-14953.01.patch, HIVE-14953.patch
>
>
> Need to investigate if recursive get is faster. Also, normal listStatus might 
> suffice because MM code handles directory structure in a more definite manner 
> than old code; so it knows where the files of interest are to be found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables

2016-10-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596859#comment-15596859
 ] 

Hive QA commented on HIVE-14953:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12834774/HIVE-14953.01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1744/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1744/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1744/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-10-22 01:08:14.032
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-1744/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-10-22 01:08:14.034
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 6cca991 HIVE-14913 : addendum patch
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 6cca991 HIVE-14913 : addendum patch
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-10-22 01:08:14.908
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: common/src/java/org/apache/hadoop/hive/common/ValidWriteIds.java: No such file or directory
error: patch failed: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:3141
error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does not apply
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:85
error: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java: patch does not apply
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:1589
error: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12834774 - PreCommit-HIVE-Build




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables

2016-10-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596046#comment-15596046
 ] 

Sergey Shelukhin commented on HIVE-14953:
-----------------------------------------

listFiles(path, recursive) returns only files, but we can determine the 
directories from those. I will add a configurable option for S3.
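
For illustration only, here is a minimal sketch of how directories could be 
derived from a files-only listing. It is not the code from the patch: the class 
and method names are hypothetical, and plain strings stand in for the paths a 
recursive listing would return.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Hive or Hadoop.
class DirectoriesFromFiles {

    // Returns the distinct parent directories of the given file paths, sorted.
    // Paths without a parent component (no '/' after the first character) are skipped.
    static Set<String> parentDirectories(List<String> files) {
        Set<String> dirs = new TreeSet<>();
        for (String file : files) {
            int lastSlash = file.lastIndexOf('/');
            if (lastSlash > 0) {
                dirs.add(file.substring(0, lastSlash));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Made-up paths standing in for the result of a files-only listing.
        List<String> files = Arrays.asList(
            "warehouse/t/mm_1/000000_0",
            "warehouse/t/mm_1/000001_0",
            "warehouse/t/mm_2/000000_0");
        System.out.println(parentDirectories(files));
    }
}
```

The set only contains directories that hold at least one file, so empty 
directories would not show up with this approach.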



[jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables

2016-10-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593700#comment-15593700
 ] 

Rajesh Balamohan commented on HIVE-14953:
-----------------------------------------

[~sershe] - It should be listFiles(path, recursive). I accidentally wrote it as 
a recursive listStatus in my earlier comment.

Default FS: 
https://github.com/apache/hadoop/blob/branch-2.8/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L1814
S3A FS which optimizes for bulk listing: 
https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2025


 



[jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables

2016-10-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593670#comment-15593670
 ] 

Sergey Shelukhin commented on HIVE-14953:
-----------------------------------------

[~rajesh.balamohan] But does it actually do that? As far as I can see, the 
implementation of listFiles(path, recursive) is a bunch of local code that 
calls listLocatedStatus for each directory it finds. I don't see a recursive 
overload of listStatus.



[jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables

2016-10-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593627#comment-15593627
 ] 

Rajesh Balamohan commented on HIVE-14953:
-----------------------------------------

[~sershe] - It was in FileSinkOperator.handleMMTable (getMmDirectoryCandidates) 
specifically. I no longer see that code path in the latest code on the branch. 
globStatus with a pattern has to be replaced with {{listStatus(path, boolean 
recursive)}}, and any additional filtering pattern has to be applied on the 
client side. Cloud storage systems can then do a prefix listing, which reduces 
the number of calls significantly compared to globStatus, which iterates 
through the files one at a time on the client side.
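
To make the client-side filtering step concrete, here is a rough sketch under 
stated assumptions. It is not the MM table code: the class name, the method, 
the sample paths and the mm_N directory pattern are all hypothetical, and a 
plain list of strings stands in for the result of a single recursive listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Hive or Hadoop.
class PrefixListingFilter {

    // Keeps only the paths under `prefix` whose remainder matches `pattern`.
    // `allFiles` stands in for what one recursive (prefix) listing returned,
    // so no further calls to the store are made while filtering.
    static List<String> filter(List<String> allFiles, String prefix, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        for (String file : allFiles) {
            if (!file.startsWith(prefix)) {
                continue;
            }
            String relative = file.substring(prefix.length());
            if (pattern.matcher(relative).matches()) {
                matches.add(file);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up listing; the directory naming is an assumption for this example.
        List<String> listing = Arrays.asList(
            "warehouse/t/mm_1/000000_0",
            "warehouse/t/mm_1/000001_0",
            "warehouse/t/mm_2/000000_0",
            "warehouse/t/_tmp/000000_0");
        Pattern mmFiles = Pattern.compile("mm_\\d+/[^/]+");
        System.out.println(filter(listing, "warehouse/t/", mmFiles));
    }
}
```

The pattern is applied in memory to the listing that was already fetched, 
which is the part that replaces the glob.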



[jira] [Commented] (HIVE-14953) don't use globStatus on S3 in MM tables

2016-10-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593624#comment-15593624
 ] 

Hive QA commented on HIVE-14953:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12834579/HIVE-14953.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1709/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1709/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-1709/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-10-21 01:29:29.983
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-1709/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-10-21 01:29:29.988
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 1da HIVE-14985 : Remove UDF-s created during test runs 
(Peter Vary, reviewed by Sergey Shelukhin)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 1da HIVE-14985 : Remove UDF-s created during test runs 
(Peter Vary, reviewed by Sergey Shelukhin)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-10-21 01:29:31.144
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:3816
error: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java: patch does not apply
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:1705
error: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12834579 - PreCommit-HIVE-Build
