[jira] [Created] (HDDS-1190) Fix jdk 11 issue for ozonesecure base image and docker-compose

2019-02-27 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDDS-1190:


 Summary: Fix jdk 11 issue for ozonesecure base image and 
docker-compose 
 Key: HDDS-1190
 URL: https://issues.apache.org/jira/browse/HDDS-1190
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


HDDS-1019 changed the ozonesecure docker-compose to use hadoop-runner as the 
base image. There are a few issues that need to be fixed.

 

1. The hadoop-runner image uses jdk11, but the ozonesecure/docker-config 
assumes openjdk8 for JAVA_HOME.

2. The KEYTAB_DIR needs to be quoted with single quotes (').
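
For illustration, the docker-config changes for points 1 and 2 could look like the following; the exact JAVA_HOME path inside the hadoop-runner image is an assumption here, not verified:
{code}
# Point JAVA_HOME at jdk11 instead of openjdk8 (the path below is an assumption)
JAVA_HOME=/usr/lib/jvm/jre-11
# KEYTAB_DIR quoted with single quotes
KEYTAB_DIR='/etc/security/keytabs'
{code}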

 

3. Keytab-based login fails with "Message stream modified (41)". [~elek] 
mentioned in HDDS-1019 that we need to add max_renewable_life to 
"docker-image/docker-krb5/krb5.conf" as follows:
{code}
[realms]
 EXAMPLE.COM = {
  kdc = localhost
  admin_server = localhost
  max_renewable_life = 7d
 }
{code}
Failures:

{code}

 org.apache.hadoop.security.KerberosAuthException: failure to login: for 
principal: scm/s...@example.com from keytab /etc/security/keytabs/scm.keytab 
javax.security.auth.login.LoginException: Message stream modified (41)

scm_1           | at 
org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1847)

scm_1           |

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1189) Recon DB schema and ORM

2019-02-27 Thread Siddharth Wagle (JIRA)
Siddharth Wagle created HDDS-1189:
-

 Summary: Recon DB schema and ORM
 Key: HDDS-1189
 URL: https://issues.apache.org/jira/browse/HDDS-1189
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Affects Versions: 0.5.0
Reporter: Siddharth Wagle
Assignee: Aravindan Vijayan
 Fix For: 0.5.0


_Objectives_
- Define V1 of the DB schema for the Recon service.
- The current proposal is to use jOOQ as the ORM for SQL interaction, for two 
main reasons: a) a powerful query DSL that abstracts out SQL dialects, and b) 
seamless code-to-schema and schema-to-code transitions, which are critical for 
creating DDL through the code and for unit testing across versions of the 
application.
- Add an e2e unit test suite for Recon entities, created based on the design doc.
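
As a purely hypothetical illustration of the code-to-schema direction (the table and columns below are invented for this example, not taken from the design doc), a V1 DDL created through the code might look like:
{code}
-- Hypothetical example table; all names are illustrative only
CREATE TABLE recon_container_key_count (
  container_id BIGINT NOT NULL,
  key_count    BIGINT NOT NULL,
  last_updated TIMESTAMP,
  PRIMARY KEY (container_id)
);
{code}
jOOQ can generate typed classes from such a schema and, conversely, emit the DDL from code, which is what makes unit testing across schema versions feasible.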






[jira] [Created] (HDDS-1188) Implement a skeleton patch for Recon server with initial set of interfaces

2019-02-27 Thread Siddharth Wagle (JIRA)
Siddharth Wagle created HDDS-1188:
-

 Summary: Implement a skeleton patch for Recon server with initial 
set of interfaces
 Key: HDDS-1188
 URL: https://issues.apache.org/jira/browse/HDDS-1188
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Affects Versions: 0.5.0
Reporter: Siddharth Wagle
Assignee: Siddharth Wagle
 Fix For: 0.5.0


Jira to define the package structure, maven module, Recon server application, 
and initial DB schema.






[jira] [Resolved] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"

2019-02-27 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDDS-451.
--
Resolution: Cannot Reproduce

Resolving as "Cannot Reproduce".

> PutKey failed due to error "Rejecting write chunk request. Chunk overwrite 
> without explicit request"
> 
>
> Key: HDDS-451
> URL: https://issues.apache.org/jira/browse/HDDS-451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.2.1
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: alpha2
> Attachments: all-node-ozone-logs-1536841590.tar.gz
>
>
> steps taken :
> --
>  # Ran Put Key command to write 50GB data. Put Key client operation failed 
> after 17 mins.
> error seen in ozone.log:
> 
>  
> {code}
> 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  tmp chunk file
> 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - 
> writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:WRITE_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 206
> 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 207
> 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG 
> (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
> container, there is no pending deletion block contained in remaining 
> containers.
> 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG 
> (ContainerSet.java:191) - Starting container report iteration.
> 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - 
> Rejecting write chunk request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - 
> Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : 
> Message: Rejecting write chunk request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR 
> (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite 
> without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO 
> (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 
> 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk 
> request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
>  
> {code}
>  






[jira] [Created] (HDDS-1187) Healthy pipeline Chill Mode rule to consider only pipelines with replication factor three

2019-02-27 Thread Bharat Viswanadham (JIRA)
Bharat Viswanadham created HDDS-1187:


 Summary: Healthy pipeline Chill Mode rule to consider only 
pipelines with replication factor three
 Key: HDDS-1187
 URL: https://issues.apache.org/jira/browse/HDDS-1187
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Bharat Viswanadham
Assignee: Bharat Viswanadham


A few offline comments from [~nandakumar131]:
 # We should not process the pipeline report from a datanode more than once 
during calculations.
 # We should consider only ratis pipelines with replication factor 3.
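
The intended rule change can be sketched roughly as follows (pseudocode; names are illustrative, not the actual SCM API):
{code}
// Illustrative pseudocode only
processedDatanodes = {}
healthyFactorThreePipelines = 0

for report in pipelineReports:
  if report.datanode in processedDatanodes:   // point 1: don't count a datanode's report twice
    continue
  processedDatanodes.add(report.datanode)
  for pipeline in report.pipelines:
    // point 2: only RATIS pipelines with replication factor THREE count
    if pipeline.type == RATIS and pipeline.factor == THREE:
      healthyFactorThreePipelines += 1
{code}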






[jira] [Resolved] (HDDS-1180) TestRandomKeyGenerator fails with NPE

2019-02-27 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-1180.
--
Resolution: Duplicate

> TestRandomKeyGenerator fails with NPE
> -
>
> Key: HDDS-1180
> URL: https://issues.apache.org/jira/browse/HDDS-1180
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: test
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>







[jira] [Resolved] (HDDS-966) Rename ChunkGroupInputStream to KeyInputStream and ChunkInputStream to BlockInputStream

2019-02-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-966.
--
Resolution: Fixed

> Rename ChunkGroupInputStream to KeyInputStream and ChunkInputStream to 
> BlockInputStream
> ---
>
> Key: HDDS-966
> URL: https://issues.apache.org/jira/browse/HDDS-966
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> ChunkGroupInputStream reads the data for an Ozone key and ChunkInputStream 
> reads data for a block. It would be more appropriate to rename 
> ChunkGroupInputStream to KeyInputStream and ChunkInputStream to 
> BlockInputStream.






[jira] [Resolved] (HDDS-1140) TestSCMChillModeManager is failing with NullPointerException

2019-02-27 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HDDS-1140.
---
Resolution: Duplicate

> TestSCMChillModeManager is failing with NullPointerException
> 
>
> Key: HDDS-1140
> URL: https://issues.apache.org/jira/browse/HDDS-1140
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
>
> TestSCMChillModeManager is failing with the following exception
> {code}
> [ERROR] 
> testDisableChillMode(org.apache.hadoop.hdds.scm.chillmode.TestSCMChillModeManager)
>   Time elapsed: 0.012 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.utils.Scheduler.scheduleWithFixedDelay(Scheduler.java:78)
>   at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.scheduleFixedIntervalPipelineCreator(RatisPipelineUtils.java:211)
>   at 
> org.apache.hadoop.hdds.scm.chillmode.SCMChillModeManager.exitChillMode(SCMChillModeManager.java:137)
>   at 
> org.apache.hadoop.hdds.scm.chillmode.SCMChillModeManager.(SCMChillModeManager.java:93)
>   at 
> org.apache.hadoop.hdds.scm.chillmode.TestSCMChillModeManager.testDisableChillMode(TestSCMChillModeManager.java:134)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}






[jira] [Resolved] (HDDS-274) Handle overreplication in ReplicationManager

2019-02-27 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-274.
---
Resolution: Duplicate

Thanks to [~nandakumar131] it's implemented in HDDS-896.

> Handle overreplication in ReplicationManager
> 
>
> Key: HDDS-274
> URL: https://issues.apache.org/jira/browse/HDDS-274
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>
> HDDS-199 provides the framework to handle over/under replicated containers, 
> but it contains implementation only for the under replicated containers.
> The over replicated containers should be handled and should be deleted from 
> the datanodes.






Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2019-02-27 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   hadoop-build-tools/src/main/resources/checkstyle/checkstyle.xml 
   hadoop-build-tools/src/main/resources/checkstyle/suppressions.xml 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle.xml 
   hadoop-tools/hadoop-resourceestimator/src/config/checkstyle.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   module:hadoop-common-project/hadoop-common 
   Class org.apache.hadoop.fs.GlobalStorageStatistics defines non-transient 
non-serializable instance field map In GlobalStorageStatistics.java:instance 
field map In GlobalStorageStatistics.java 

FindBugs :

   module:hadoop-hdfs-project/hadoop-hdfs 
   Dead store to state in 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(OutputStream,
 INodeSymlink) At 
FSImageFormatPBINode.java:org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(OutputStream,
 INodeSymlink) At FSImageFormatPBINode.java:[line 623] 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.util.TestBasicDiskValidator 
   hadoop.util.TestDiskCheckerWithDiskIo 
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.server.TestContainerManagerSecurity 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-compile-cc-root-jdk1.8.0_191.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-compile-javac-root-jdk1.8.0_191.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-patch-shellcheck.txt
  [72K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/whitespace-tabs.txt
  [1.2M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/xml.txt
  [20K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/245/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
  [8.0K]
   

[jira] [Created] (HDDS-1186) Ozone S3 gateways

2019-02-27 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1186:
--

 Summary: Ozone S3 gateways
 Key: HDDS-1186
 URL: https://issues.apache.org/jira/browse/HDDS-1186
 Project: Hadoop Distributed Data Store
  Issue Type: Task
  Components: S3
Reporter: Elek, Marton


An S3-compatible REST gateway was implemented in HDDS-434 for 0.3.0.

With the second phase (HDDS-763), multipart upload and other improvements were 
added (for release 0.4.0).

I am opening this jira to collect all the open tasks to improve the S3 gateway 
for the 0.5.0 release.







[jira] [Resolved] (HDDS-1019) Use apache/hadoop-runner image to test ozone secure cluster

2019-02-27 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1019.

   Resolution: Fixed
Fix Version/s: 0.4.0

Thanks a lot for fixing this, [~xyao]. I also committed the trunk part.

I will create a separate jira to improve the kdc image (the current one issues 
tickets with 0 renewal time, which is not accepted by jdk 11).

> Use apache/hadoop-runner image to test ozone secure cluster
> ---
>
> Key: HDDS-1019
> URL: https://issues.apache.org/jira/browse/HDDS-1019
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Xiaoyu Yao
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: HDDS-1019-docker-hadoop-runner.01.patch, 
> HDDS-1019-docker-hadoop-runner.02.patch, 
> HDDS-1019-docker-hadoop-runner.03.patch, HDDS-1019-trunk.01.patch, 
> HDDS-1019-trunk.02.patch, HDDS-1019-trunk.03.patch
>
>
> As of now the secure ozone cluster uses a custom image which is not based on 
> the apache/hadoop-runner image. There are multiple problems with that:
>  1. Multiple script files which are maintained in the docker-hadoop-runner 
> branch are copied and duplicated in 
> hadoop-ozone/dist/src/main/compose/ozonesecure/docker-image/runner/scripts
>  2. The user of the image is root. It creates 
> core-site.xml/hdfs-site.xml/ozone-site.xml as the root user, which prevents 
> running all the default smoke tests.
>  3. Building the base image with each build takes more time.
> I propose to check what is missing from the apache/hadoop-runner base image, 
> add it, and use that one. 
> I marked it critical because of 2): it breaks the run of the acceptance test 
> suite.






[jira] [Resolved] (HDDS-1083) Improve error code when SCM fails to allocate block

2019-02-27 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1083.

Resolution: Duplicate

Agree. I think it's fixed in HDDS-1068.

[~msingh]: Please reopen it if you still see the issue.

> Improve error code when SCM fails to allocate block
> ---
>
> Key: HDDS-1083
> URL: https://issues.apache.org/jira/browse/HDDS-1083
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.4.0
>
>
> The following error, KEY_ALLOCATION_ERROR, doesn't display information about 
> the number of replicas. Also, the error doesn't detail whether no pipelines 
> were found or whether there wasn't enough space on the datanodes to create 
> the containers.
> {code}
> 019-02-11 14:56:12 ERROR RandomKeyGenerator:621 - Exception while adding key: 
> key-0-91322 in bucket: org.apache.hadoop.ozone.client.OzoneBucket@24ef95df of 
> volume: org.apache.hadoop.ozone.client.OzoneVolume@6473c338.
> java.io.IOException: Create key failed, error:KEY_ALLOCATION_ERROR
>   at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.openKey(OzoneManagerProtocolClientSideTranslatorPB.java:692)
>   at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.createKey(RpcClient.java:571)
>   at 
> org.apache.hadoop.ozone.client.OzoneBucket.createKey(OzoneBucket.java:274)
>   at 
> org.apache.hadoop.ozone.freon.RandomKeyGenerator$OfflineProcessor.run(RandomKeyGenerator.java:596)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[Ozone][status] 2019.02.27

2019-02-27 Thread Elek, Marton


Hi all,

I didn't write any summary about the recent community calls, mainly
because on the last few calls developers usually just explained the
current status/progress of ozone related developments (Security, HA,
Tracing, etc.). There was no big news; all the information is available
from the related Jiras, but the calls helped to share the current
status/latest findings.


The current summary from the last calls:

* The 0.4.0 (badlands) release is getting closer and closer. Security work
is almost done (HDDS-4) and stability is also highly improved. The release
is expected during March. Ajay Kumar is working as the release manager.

* HA work continues with adopting an Apache Ratis based state machine
in the OM (target release is 0.5.0).

* Work on the Recon service (~fsck server) has started. The design doc has
just been uploaded to HDDS-1084.

* There is serious reliability testing with jepsen-like tests (using
blockade) and with down-stream applications (e.g. Hive). Huge work has been
done on the reliability side.

* Thanks to our Hive friends, we adopted the async profiler endpoint from
Hive (HIVE-20202).

* The first version of distributed tracing is working (some tricky context
propagation inside Apache Ratis is still missing). First results showed some
client side problems with buffer allocations.

* A wiki page about the profiler/tracing will be created soon, and they may
be demonstrated at the next call.

* We have serious build problems. A new Jenkinsfile based job is enabled
for Hadoop without proper Ozone support, and all the ozone PRs are
commented with false positive findings. A fix is in progress (see comments
in HADOOP-16035).

Closing with the usual lines:

Ozone community calls are 100% open weekly calls [1]. Feel free to join
if you have any questions/suggestions/concerns.

Thanks,
Marton

[1]:
https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Community+Calls




[jira] [Created] (HDDS-1185) Optimize GetFileStatus in OzoneFileSystem by reducing the number of rpc call to OM.

2019-02-27 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1185:
---

 Summary: Optimize GetFileStatus in OzoneFileSystem by reducing the 
number of rpc call to OM.
 Key: HDDS-1185
 URL: https://issues.apache.org/jira/browse/HDDS-1185
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: Ozone Filesystem
Affects Versions: 0.4.0
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
 Fix For: 0.4.0
 Attachments: HDDS-1185.001.patch

GetFileStatus sends multiple rpc calls to the Ozone Manager to fetch the file 
status for a given file. This can be optimized by performing all the processing 
on the OzoneManager instead.
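
Roughly, the optimization moves the per-path probing from the client into a single OM call (pseudocode; method names are illustrative, not the actual OM API):
{code}
// Today (client side): up to several OM round trips per getFileStatus
//   lookupKey(path)            -> miss
//   lookupKey(path + "/")      -> miss
//   listKeys(prefix = path)    -> decide if it is a directory
//
// Proposed (server side): one RPC, OM performs the same checks locally
// FileStatus getFileStatus(path):
//   if keyTable.contains(path):        return file status
//   if keyTable.contains(path + "/"):  return directory status
//   if keyTable.hasPrefix(path + "/"): return inferred directory status
//   throw FileNotFound
{code}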






[jira] [Created] (HDDS-1184) Parallelization of write chunks in datanodes is broken

2019-02-27 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1184:
-

 Summary: Parallelization of write chunks in datanodes is broken 
 Key: HDDS-1184
 URL: https://issues.apache.org/jira/browse/HDDS-1184
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


After the HDDS-4 branch merge, parallelization in write chunks and 
applyTransaction is broken.






[jira] [Resolved] (HDDS-1124) java.lang.IllegalStateException exception in datanode log

2019-02-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1124.
---
   Resolution: Not A Problem
Fix Version/s: 0.5.0

> java.lang.IllegalStateException exception in datanode log
> -
>
> Key: HDDS-1124
> URL: https://issues.apache.org/jira/browse/HDDS-1124
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> steps taken :
> 
>  # created a 12-datanode cluster and ran a workload on all the nodes
> exception seen :
> ---
>  
> {noformat}
> 2019-02-15 10:15:53,355 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
>  to 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047
> 2019-02-15 10:15:53,367 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073: set configuration 3048: 
> [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048
> 2019-02-15 10:15:53,523 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
> 2019-02-15 10:15:53,580 ERROR org.apache.ratis.grpc.server.GrpcLogAppender: 
> Failed onNext serverReply {
>  requestorId: "943007c8-4fdd-4926-89e2-2c8c52c05073"
>  replyId: "a40a7b01-a30b-469c-b373-9fcb20a126ed"
>  raftGroupId {
>  id: "\001\323\357*\221,O\300\200\266\001#C\327j\333"
>  }
>  success: true
> }
> term: 3
> nextIndex: 3049
> followerCommit: 3047
> java.lang.IllegalStateException: reply's next index is 3049, request's 
> previous is term: 1
> index: 3047
> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.onSuccess(GrpcLogAppender.java:285)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:230)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:215)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:197)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
>  at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>  at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment 
> log-3048_3066 to index:3066
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolled log segment from 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
>  to 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3048-3066
> 2019-02-15 10:15:56,564 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: created new log segment 
> /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3067
> 2019-02-15 10:16:45,420 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker: Rolling segment 
> log-3067_3077 to index:3077
> {noformat}
>  
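The `IllegalStateException` in the log above is raised by a leader-side sanity check in `GrpcLogAppender.onSuccess`. A hedged reading of the invariant: after a leader appends entries starting right after `previousIndex` to a follower, the follower's reported `nextIndex` should equal `previousIndex + entriesAppended + 1`, and Ratis guards such invariants with `Preconditions.assertTrue`, which throws `IllegalStateException` on violation. The sketch below is illustrative arithmetic only, not the actual Ratis implementation (the real check also matches the reply against the pending request, which is presumably why a reply carrying `term: 1` against a term-3 leader trips it):

```java
public class NextIndexCheck {

    // Mirrors the behavior of org.apache.ratis.util.Preconditions.assertTrue:
    // throw IllegalStateException when the condition does not hold.
    static void assertTrue(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    // Expected follower nextIndex after appending `entriesAppended` entries
    // that start right after `previousIndex`.
    static long expectedNextIndex(long previousIndex, int entriesAppended) {
        return previousIndex + entriesAppended + 1;
    }

    public static void main(String[] args) {
        // Consistent reply: previous index 3047, one entry appended -> 3049.
        assertTrue(expectedNextIndex(3047, 1) == 3049,
                "reply's next index is inconsistent");

        // An inconsistent reply trips the precondition, producing the same
        // exception type seen in the log.
        try {
            assertTrue(expectedNextIndex(3047, 5) == 3049,
                    "reply's next index is 3049, request's previous is index: 3047");
            throw new AssertionError("expected IllegalStateException");
        } catch (IllegalStateException expected) {
            System.out.println("precondition violation detected");
        }
    }
}
```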



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Resolved] (HDDS-1125) java.lang.InterruptedException seen in datanode logs

2019-02-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1125.
---
   Resolution: Not A Problem
Fix Version/s: 0.4.0

> java.lang.InterruptedException seen in datanode logs
> 
>
> Key: HDDS-1125
> URL: https://issues.apache.org/jira/browse/HDDS-1125
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> Steps taken:
> 
>  # Created a 12-datanode cluster and ran workloads on all the nodes.
>  
> Exception seen:
> -
>  
> {noformat}
> 2019-02-15 10:16:48,713 ERROR org.apache.ratis.server.impl.LogAppender: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
> i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
> java.lang.InterruptedException
>  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433)
>  at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
>  at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>  at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:16:48,714 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(943007c8-4fdd-4926-89e2-2c8c52c05073 -> 
> 8c77b16b-8054-49e3-b669-1ff759cfd271) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
> i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
>  at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:440)
>  at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
>  at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>  at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.InterruptedException
>  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433)
>  ... 6 more
> 2019-02-15 10:16:48,715 ERROR org.apache.ratis.server.impl.LogAppender: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
> i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
> java.lang.InterruptedException
>  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:433)
>  at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
>  at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>  at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:101)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:16:48,715 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(943007c8-4fdd-4926-89e2-2c8c52c05073 -> 
> a40a7b01-a30b-469c-b373-9fcb20a126ed) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 943007c8-4fdd-4926-89e2-2c8c52c05073: Failed readStateMachineData for (t:3, 
> i:3084), STATEMACHINELOGENTRY, client-632E77ADA885, cid=6232
>  at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:440)
>  at org.apache.ratis.util.DataQueue.pollList(DataQueue.java:133)
>  at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:171)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>  at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>  at 
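The `InterruptedException` stack traces above all bottom out in `CompletableFuture.get()`, which `RaftLog$EntryWithData.getEntry` uses to wait for state-machine data. A minimal standalone demonstration (not Ratis code; names are illustrative) of how interrupting a thread blocked in `get()`, as happens when an appender thread is stopped, surfaces exactly this exception:

```java
import java.util.concurrent.CompletableFuture;

public class InterruptedGetDemo {

    // Blocks a helper thread on a future that is never completed, interrupts
    // it, and reports whether the thread observed InterruptedException.
    static boolean interruptWhileWaiting() throws InterruptedException {
        CompletableFuture<String> future = new CompletableFuture<>();
        final boolean[] sawInterrupt = {false};
        Thread waiter = new Thread(() -> {
            try {
                future.get(); // blocks, like RaftLog$EntryWithData.getEntry
            } catch (InterruptedException e) {
                sawInterrupt[0] = true; // the exception seen in the logs
            } catch (Exception other) {
                // ExecutionException cannot occur: the future never completes
            }
        });
        waiter.start();
        Thread.sleep(200);   // give the waiter time to block in get()
        waiter.interrupt();  // e.g. the appender thread being shut down
        waiter.join();
        return sawInterrupt[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupted: " + interruptWhileWaiting());
    }
}
```

The "Not A Problem" resolution is consistent with this reading: an interrupt delivered during shutdown is expected, and `CompletableFuture.get()` reports it as `InterruptedException` rather than an error in the data path.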

[jira] [Resolved] (HDDS-582) Remove ChunkOutputStreamEntry class from ChunkGroupOutputStream

2019-02-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-582.
--
   Resolution: Won't Do
Fix Version/s: 0.5.0

> Remove ChunkOutputStreamEntry class from ChunkGroupOutputStream
> ---
>
> Key: HDDS-582
> URL: https://issues.apache.org/jira/browse/HDDS-582
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> ChunkOutputStreamEntry holds the info for the blocks which need to be 
> written down to the Datanodes. This info can also be held in the 
> KeyLocationInfo list, which will be used to update the OM once the stream 
> closes. The class does not serve much purpose here and hence can be removed. 
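The design point in the description above can be sketched as follows. This is a hypothetical illustration, not actual Ozone client code: the names `KeyWriteSketch` and `BlockLocation` are invented stand-ins showing how per-block info can accumulate in a plain list that is handed to the OM on close, with no separate per-stream entry wrapper:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class KeyWriteSketch {

    // Illustrative stand-in for a block-location record (not an Ozone API).
    static class BlockLocation {
        final String blockId;
        final long length;
        BlockLocation(String blockId, long length) {
            this.blockId = blockId;
            this.length = length;
        }
    }

    private final List<BlockLocation> locations = new ArrayList<>();

    // Called as each block is written to the datanodes.
    void recordBlock(String blockId, long length) {
        locations.add(new BlockLocation(blockId, length));
    }

    // On close, the accumulated list carries everything the OM update needs,
    // so no separate per-stream entry wrapper is required.
    List<BlockLocation> close() {
        return Collections.unmodifiableList(locations);
    }
}
```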



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org