[jira] [Created] (HDDS-2178) Support Ozone insight tool in secure cluster

2019-09-25 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2178:
--

 Summary: Support Ozone insight tool in secure cluster
 Key: HDDS-2178
 URL: https://issues.apache.org/jira/browse/HDDS-2178
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton


SPNEGO should be initialized properly for the HTTP requests.






[jira] [Created] (HDDS-2167) Hadoop31-mr acceptance test is failing due to the shading

2019-09-23 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2167:
--

 Summary: Hadoop31-mr acceptance test is failing due to the shading
 Key: HDDS-2167
 URL: https://issues.apache.org/jira/browse/HDDS-2167
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


From the daily build:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/ozone/shaded/org/apache/http/client/utils/URIBuilder
at 
org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:138)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
at 
org.apache.hadoop.fs.shell.CommandWithDestination.getRemoteDestination(CommandWithDestination.java:195)
at 
org.apache.hadoop.fs.shell.CopyCommands$Put.processOptions(CopyCommands.java:259)
at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.ozone.shaded.org.apache.http.client.utils.URIBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 15 more
{code}

It can be reproduced locally by executing the tests:

{code}
cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone-mr/hadoop31
./test.sh
{code}






[jira] [Created] (HDDS-2166) Some RPC metrics are missing from SCM prometheus endpoint

2019-09-23 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2166:
--

 Summary: Some RPC metrics are missing from SCM prometheus endpoint
 Key: HDDS-2166
 URL: https://issues.apache.org/jira/browse/HDDS-2166
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


In Hadoop metrics it's possible to register multiple metrics with the same name 
but with different tags. For example each RpcServer has its own metrics 
instance in SCM.

{code}
"name" : 
"Hadoop:service=StorageContainerManager,name=RpcActivityForPort9860",
"name" : 
"Hadoop:service=StorageContainerManager,name=RpcActivityForPort9863",
{code}

They are converted by PrometheusSink to a prometheus metric line with proper 
name and tags. For example:

{code}
rpc_rpc_queue_time60s_num_ops{port="9860",servername="StorageContainerLocationProtocolService",context="rpc",hostname="72736061cbc5"}
 0
{code}

The PrometheusSink uses a Map to cache all the recent values, but unfortunately 
the key contains only the name (rpc_rpc_queue_time60s_num_ops in our example) 
and not the tags (port=...).

For this reason, if there are multiple metrics with the same name, only the 
first one is displayed.

As a result, in SCM only the metrics of the first RPC server can be exported to 
the Prometheus endpoint.
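
A minimal sketch of the direction of a fix (the class and method names here are 
illustrative, not the actual PrometheusSink internals): the cache key has to be 
derived from the metric name *and* the tag values, so that the two 
RpcActivityForPort entries above stay distinct.

{code}
import java.util.Map;
import java.util.TreeMap;

public final class PrometheusMetricKey {

  // Build a cache key from the metric name plus every tag name/value pair, so
  // rpc_rpc_queue_time60s_num_ops{port="9860"} and {port="9863"} are cached as
  // two separate entries instead of overwriting each other.
  public static String of(String metricName, Map<String, String> tags) {
    StringBuilder key = new StringBuilder(metricName);
    for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
      key.append('|').append(tag.getKey()).append('=').append(tag.getValue());
    }
    return key.toString();
  }
}
{code}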







[jira] [Resolved] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes

2019-09-19 Thread Elek, Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-2043.

Resolution: Duplicate

Tested and worked well. HDDS-1926 fixed the same problem IMHO.

> "VOLUME_NOT_FOUND" exception thrown while listing volumes
> -
>
> Key: HDDS-2043
> URL: https://issues.apache.org/jira/browse/HDDS-2043
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone CLI, Ozone Manager
>Reporter: Nilotpal Nandi
>Assignee: Bharat Viswanadham
>Priority: Blocker
>
> ozone list volume command throws OMException
> bin/ozone sh volume list --user root
>  VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume 
> info not found for vol-test-putfile-1566902803
>  
> On enabling DEBUG log, here is the console output:
>  
>  
> {noformat}
> bin/ozone sh volume create /n1 ; echo $?
> 2019-08-27 11:47:16 DEBUG ThriftSenderFactory:33 - Using the UDP Sender to 
> send spans to the agent.
> 2019-08-27 11:47:16 DEBUG SenderResolver:86 - Using sender UdpSender()
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
> always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate 
> of successful kerberos logins and latency (milliseconds)])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
> always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate 
> of failed kerberos logins and latency (milliseconds)])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field 
> org.apache.hadoop.metrics2.lib.MutableRate 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with 
> annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
> always=false, valueName=Time, about=, interval=10, type=DEFAULT, 
> value=[GetGroups])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private 
> org.apache.hadoop.metrics2.lib.MutableGaugeLong 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal
>  with annotation 
> @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, 
> valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures 
> since startup])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private 
> org.apache.hadoop.metrics2.lib.MutableGaugeInt 
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures 
> with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, 
> always=false, valueName=Time, about=, interval=10, type=DEFAULT, 
> value=[Renewal failures since last successful login])
> 2019-08-27 11:47:16 DEBUG MetricsSystemImpl:231 - UgiMetrics, User and group 
> related metrics
> 2019-08-27 11:47:16 DEBUG SecurityUtil:124 - Setting 
> hadoop.security.token.service.use_ip to true
> 2019-08-27 11:47:16 DEBUG Shell:821 - setsid exited with exit code 0
> 2019-08-27 11:47:16 DEBUG Groups:449 - Creating new Groups object
> 2019-08-27 11:47:16 DEBUG Groups:151 - Group mapping 
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; 
> cacheTimeout=30; warningDeltaMs=5000
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:254 - hadoop login
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:187 - hadoop login commit
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:215 - using local 
> user:UnixPrincipal: root
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:221 - Using user: 
> "UnixPrincipal: root" with name root
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:235 - User entry: "root"
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:766 - UGI loginUser:root 
> (auth:SIMPLE)
> 2019-08-27 11:47:16 DEBUG OzoneClientFactory:287 - Using 
> org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
> 2019-08-27 11:47:16 DEBUG Server:280 - rpcKind=RPC_PROTOCOL_BUFFER, 
> rpcRequestWrapperClass=class 
> org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, 
> rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@710f4dc7
> 2019-08-27 11:47:16 DEBUG Client:63 - getting client out of cache: 
> org.apache.hadoop.ipc.Client@24313fcc
> 2019-08-27 11:47:16 DEBUG Client:487 - The ping interval is 6 ms.
> 2019-08-27 11:47:16 DEBUG Client:785 - Connecting to 
> nnandi-1.gce.cloudera.com/172.31.117.213:9862
> 2019-08-27 11:47:16 DEBUG Client:1064 - IPC Client (580871917) connection to 
> 

[jira] [Created] (HDDS-2154) Fix Checkstyle issues

2019-09-19 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2154:
--

 Summary: Fix Checkstyle issues
 Key: HDDS-2154
 URL: https://issues.apache.org/jira/browse/HDDS-2154
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton


Unfortunately checkstyle checks didn't work well from HDDS-2106 to HDDS-2119. 

This patch fixes all the issues which were accidentally merged in the meantime. 






[jira] [Created] (HDDS-2131) Optimize replication type and creation time calculation in S3 MPU list call

2019-09-13 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2131:
--

 Summary: Optimize replication type and creation time calculation 
in S3 MPU list call
 Key: HDDS-2131
 URL: https://issues.apache.org/jira/browse/HDDS-2131
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton


Based on the review from [~bharatviswa]:

{code}
 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
  metadataManager.getOpenKeyTable();

  OmKeyInfo omKeyInfo =
  openKeyTable.get(upload.getDbKey());
{code}

{quote}Here we are reading openKeyTable only for getting creation time. If we 
can have this information in omMultipartKeyInfo, we could avoid DB calls for 
openKeyTable.

To do this, we can set creationTime in OmMultipartKeyInfo during 
initiateMultipartUpload. In this way, we can get all the required information 
from the MultipartKeyInfo table.

StorageClass is also missing from the returned OmMultipartUpload, as 
listMultipartUploads shows StorageClass information. For this, we can return 
replicationType and, depending on this value, set StorageClass in the 
listMultipartUploads response.
{quote}






[jira] [Created] (HDDS-2130) Add pagination support to the S3 ListMPU call

2019-09-13 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2130:
--

 Summary: Add pagination support to the S3 ListMPU call
 Key: HDDS-2130
 URL: https://issues.apache.org/jira/browse/HDDS-2130
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton


HDDS-1054 introduced a simple implementation for the AWS S3 
ListMultipartUploads REST call.

However, the pagination support (key-marker, max-uploads, upload-id-marker...) 
is missing. We should implement it in this jira.






[jira] [Created] (HDDS-2120) Remove hadoop classes from ozonefs-current jar

2019-09-12 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2120:
--

 Summary: Remove hadoop classes from ozonefs-current jar
 Key: HDDS-2120
 URL: https://issues.apache.org/jira/browse/HDDS-2120
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton


We have two kinds of ozone file system jars: current and legacy. The current jar 
is designed to work only with exactly the same hadoop version which is used for 
compilation (3.2 as of now).

But as of now the hadoop classes are included in the current jar, which is not 
necessary as the jar is expected to be used in an environment where the hadoop 
classes (exactly the same hadoop classes) are already present. They can be 
excluded.






[jira] [Created] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-09 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2106:
--

 Summary: Avoid usage of hadoop projects as parent of hdds/ozone
 Key: HDDS-2106
 URL: https://issues.apache.org/jira/browse/HDDS-2106
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton


Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:

 1. the hadoop artifacts are defined in the  sections
 2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
parent

As we already have a slightly different assembly process it could be more 
resilient to use a dedicated parent project instead of the hadoop one. With 
this approach it will be easier to upgrade the versions, as we don't need to be 
careful about the pom contents, only about the used dependencies.






[jira] [Created] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2075:
--

 Summary: Tracing in OzoneManager call is propagated with wrong 
parent
 Key: HDDS-2075
 URL: https://issues.apache.org/jira/browse/HDDS-2075
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


As you can see in the attached screenshot the OzoneManager.createBucket (server 
side) tracing information is the child of freon.createBucket instead of 
the freon-side OzoneManagerProtocolPB.submitRequest.

To avoid confusion the hierarchy should be fixed (most probably we generate the 
child span AFTER we have already serialized the parent one into the message).
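
A minimal sketch of the intended ordering with the OpenTracing API (the span 
name follows the description above; the carrier map and the surrounding code 
are illustrative, not the actual Ozone tracing helpers): the client-side RPC 
span must exist and be injected into the message before the message is 
serialized, otherwise the server-side span is attached to the wrong parent.

{code}
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMapAdapter;
import io.opentracing.util.GlobalTracer;

import java.util.HashMap;
import java.util.Map;

public class TracingOrderSketch {
  public static void main(String[] args) {
    Tracer tracer = GlobalTracer.get();

    // 1. Start the client-side span of the RPC call first.
    Span rpcSpan = tracer.buildSpan("OzoneManagerProtocolPB.submitRequest").start();

    // 2. Inject *this* span's context into the outgoing request, and only then
    //    serialize and send it. If the trace context is written into the
    //    message before this span is created, OzoneManager.createBucket on the
    //    server becomes a child of freon.createBucket instead.
    Map<String, String> carrier = new HashMap<>();
    tracer.inject(rpcSpan.context(), Format.Builtin.TEXT_MAP,
        new TextMapAdapter(carrier));

    // ... build the request message together with the carrier and send it ...

    rpcSpan.finish();
  }
}
{code}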






[jira] [Created] (HDDS-2074) Use annotations to define description/filter/required filters of an InsightPoint

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2074:
--

 Summary: Use annotations to define description/filter/required 
filters of an InsightPoint
 Key: HDDS-2074
 URL: https://issues.apache.org/jira/browse/HDDS-2074
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton


The InsightPoint interface defines the getDescription method to provide the 
human-readable description of the insight point.

To have better separation between the provided log/metrics/config information 
and the metadata, it would be better to use an annotation for this, which can 
also include the filters (HDDS-2071).

Something like this:

{code}
@InsightPoint(description = "Information from the async event queue of the SCM",
    supportedFilters = {"eventType"}, requiredFilters = {})
public class EventQueueInsight extends BaseInsightPoint {

...

}
{code}
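
The annotation itself could look like the following minimal sketch (the 
attribute names mirror the example above; the retention and target choices are 
assumptions):

{code}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of the proposed annotation: the metadata currently returned by
// InsightPoint.getDescription() becomes declarative, together with the
// supported/required filters from HDDS-2071.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface InsightPoint {
  String description() default "";
  String[] supportedFilters() default {};
  String[] requiredFilters() default {};
}
{code}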






[jira] [Created] (HDDS-2073) Make SCMSecurityProtocol message based

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2073:
--

 Summary: Make SCMSecurityProtocol message based
 Key: HDDS-2073
 URL: https://issues.apache.org/jira/browse/HDDS-2073
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Reporter: Elek, Marton


We started to use a generic pattern where we have only one method in the grpc 
service and the main message contains all the required common information (eg. 
tracing).

SCMSecurityProtocol is not yet migrated to this approach. 
To make our generic debug tool more powerful and to unify our protocols I 
suggest transforming this protocol as well.






[jira] [Created] (HDDS-2072) Make StorageContainerLocationProtocolService message based

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2072:
--

 Summary: Make StorageContainerLocationProtocolService message based
 Key: HDDS-2072
 URL: https://issues.apache.org/jira/browse/HDDS-2072
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Reporter: Elek, Marton


We started to use a generic pattern where we have only one method in the grpc 
service and the main message contains all the required common information (eg. 
tracing).

StorageContainerLocationProtocolService is not yet migrated to this approach. 
To make our generic debug tool more powerful and to unify our protocols I 
suggest transforming this protocol as well.






[jira] [Created] (HDDS-2071) Support filters in ozone insight point

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2071:
--

 Summary: Support filters in ozone insight point
 Key: HDDS-2071
 URL: https://issues.apache.org/jira/browse/HDDS-2071
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton


With Ozone insight we can print out all the logs / metrics of one specific 
component (eg. scm.node-manager).

It would be great to support additional filtering capabilities where the output 
is filtered based on specific keys.

For example to print out all of the logs related to one datanode or related to 
one type of RPC request.

The filter should be a key-value map (eg. --filter 
datanode=sjdhfhf,rpc=createChunk) which can be defined in the ozone insight CLI.

As we have no option to add additional tags to the logs (it may be supported by 
log4j2 but not by slf4j), the first version can be implemented with 
pattern matching.

For example, SCMNodeManager.processNodeReport contains trace/debug logs which 
include the "[datanode={}]" part. This formatting convention can be used to 
print out only the related information.
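
A minimal sketch of this pattern-matching approach (the class and method names 
are illustrative): a log line is kept only if it contains the "[key=value]" 
fragment for every requested filter.

{code}
import java.util.Map;

public final class LogLineFilter {

  // A line matches when it contains the "[key=value]" fragment for every
  // requested filter, relying on the "[datanode={}]" formatting convention.
  public static boolean matches(String logLine, Map<String, String> filters) {
    for (Map.Entry<String, String> filter : filters.entrySet()) {
      String fragment = "[" + filter.getKey() + "=" + filter.getValue() + "]";
      if (!logLine.contains(fragment)) {
        return false;
      }
    }
    return true;
  }
}
{code}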






[jira] [Created] (HDDS-2070) Create insight point to debug one specific pipeline

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2070:
--

 Summary: Create insight point to debug one specific pipeline
 Key: HDDS-2070
 URL: https://issues.apache.org/jira/browse/HDDS-2070
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton


During the first demo of the ozone insight tool we had a demo insight point to 
debug Ratis pipelines. It was not stable enough to be included in the first 
patch, but we can add it here.

The goal is to implement a new insight point (eg. datanode.pipeline) which can 
show information about one pipeline.

It can be done by retrieving the hosts of the pipeline and generating the 
loggers and metrics (InsightPoint.getRelatedLoggers and InsightPoint.getMetrics) 
based on the pipeline information (the same loggers should be displayed from 
all three datanodes).

The pipeline id can be defined as a filter parameter which (in this case) 
should be required.
 






[jira] [Created] (HDDS-2068) Make StorageContainerDatanodeProtocolService message based

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2068:
--

 Summary: Make StorageContainerDatanodeProtocolService message based
 Key: HDDS-2068
 URL: https://issues.apache.org/jira/browse/HDDS-2068
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Reporter: Elek, Marton


We started to use a generic pattern where we have only one method in the grpc 
service and the main message contains all the required common information (eg. 
tracing).

StorageContainerDatanodeProtocolService is not yet migrated to this approach. 
To make our generic debug tool more powerful and to unify our protocols I 
suggest transforming this protocol as well.






[jira] [Created] (HDDS-2067) Create generic service facade with tracing/metrics/logging support

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2067:
--

 Summary: Create generic service facade with 
tracing/metrics/logging support
 Key: HDDS-2067
 URL: https://issues.apache.org/jira/browse/HDDS-2067
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton


We started to use a message-based GRPC approach. We have only one method and 
the requests are routed based on a "type" field in the proto message. 

For example in OM protocol:

{code}
/**
 The OM service that takes care of Ozone namespace.
*/
service OzoneManagerService {
// A client-to-OM RPC to send client requests to OM Ratis server
rpc submitRequest(OMRequest)
  returns(OMResponse);
}
{code}

And 

{code}

message OMRequest {
  required Type cmdType = 1; // Type of the command

...
{code}

This approach makes it possible to use the same code to process incoming 
messages on the server side.

The ScmBlockLocationProtocolServerSideTranslatorPB.send method contains the 
logic of:

 * Logging the request/response message (can be displayed with ozone insight)
 * Updating metrics
 * Handling the OpenTracing context propagation.


These functions are generic. For example 
OzoneManagerProtocolServerSideTranslatorPB uses the same (or very similar) code.

The goal of this jira is to move the common code for tracing/request 
logging/response logging/metrics calculation into a generic utility which can 
be used from all the server-side translators.
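
A minimal sketch of what such a common utility could look like (all names here 
are hypothetical, not the real translator or metrics classes): every 
server-side translator would delegate to one method that wraps the per-message 
handler with logging, timing/metrics and tracing.

{code}
import java.util.function.Function;
import org.slf4j.Logger;

// Hypothetical sketch of the shared dispatch utility.
public final class RequestDispatcher {

  public static <REQ, RESP> RESP dispatch(REQ request,
      Function<REQ, RESP> handler, Logger log, String spanName) {
    // (tracing) a server-side span would be activated here from the
    // trace id carried in the request message
    log.trace("{} request: {}", spanName, request);
    long start = System.nanoTime();
    RESP response = handler.apply(request);
    long elapsedNs = System.nanoTime() - start;
    // (metrics) per-message-type counters/timers would be updated here
    log.trace("{} response ({} ns): {}", spanName, elapsedNs, response);
    return response;
  }
}
{code}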






[jira] [Created] (HDDS-2066) Improve the observability inside Ozone

2019-09-02 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2066:
--

 Summary: Improve the observability inside Ozone
 Key: HDDS-2066
 URL: https://issues.apache.org/jira/browse/HDDS-2066
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Tools
Reporter: Elek, Marton
Assignee: Elek, Marton


Improving the observability is a key requirement to achieve better correctness 
and performance with Ozone.

This jira collects some of the tasks which can provide better visibility into 
the ozone internals.

We have two main tools:

 * Distributed tracing (opentracing) can help to detect performance 
bottlenecks
 * The Ozone insight tool (a simple cli frontend for Hadoop metrics and log4j 
logging) can help to get a better understanding of the current state/behavior 
of specific components.

Both of them can be improved to make them more powerful.






[jira] [Created] (HDDS-2060) Create Ozone specific LICENSE file for the Ozone source package

2019-08-30 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2060:
--

 Summary: Create Ozone specific LICENSE file for the Ozone source 
package
 Key: HDDS-2060
 URL: https://issues.apache.org/jira/browse/HDDS-2060
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


With HDDS-2058 the Ozone (source) release package doesn't contain the hadoop 
sources any more. We need to create an adjusted LICENSE file for the Ozone 
source package (we already created a specific LICENSE file for the binary 
package, which is not changed).

In the new LICENSE file we should include entries only for the sources which 
are part of the Ozone release.






[jira] [Reopened] (HDDS-1596) Create service endpoint to download configuration from SCM

2019-08-29 Thread Elek, Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton reopened HDDS-1596:


> Create service endpoint to download configuration from SCM
> --
>
> Key: HDDS-1596
> URL: https://issues.apache.org/jira/browse/HDDS-1596
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> As written in the design doc (see the parent issue), it was proposed that the 
> other services download the configuration from the SCM.
> I propose to create a separate endpoint to provide the ozone configuration. 
> /conf can't be used as it contains *all* the configuration and we need only 
> the modified configuration.
> The easiest way to implement this feature is:
>  * Create a simple rest endpoint which publishes all the configuration
>  * Download the configurations to $HADOOP_CONF_DIR/ozone-global.xml during 
> the service startup.
>  * Add ozone-global.xml as an additional config source (before ozone-site.xml 
> but after ozone-default.xml)
>  * The download can be optional
> With this approach we keep the support of the existing manual configuration 
> (ozone-site.xml has higher priority) but we can download the configuration to 
> a separate file during startup, which will be loaded.
> There is no magic: the configuration file is saved and it's easy to debug 
> what's going on as the OzoneConfiguration is loaded from the $HADOOP_CONF_DIR 
> as before.
> Possible follow-up steps:
>  * Migrate all the other services (recon, s3g) to the new approach. (possible 
> newbie jiras)
>  * Improve the CLI to define the SCM address. (As of now we use 
> ozone.scm.names)
>  * Create a service/hostname registration mechanism and autofill some of the 
> configuration based on the topology information.
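
A minimal sketch of the proposed load order (the resource names come from the 
description above; the download step itself and the real OzoneConfiguration 
wiring are omitted):

{code}
import org.apache.hadoop.conf.Configuration;

public class OzoneGlobalConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // ozone-global.xml (downloaded from the SCM at startup) is added after
    // ozone-default.xml but before ozone-site.xml, so local manual settings
    // keep the highest priority.
    conf.addResource("ozone-default.xml");
    conf.addResource("ozone-global.xml");
    conf.addResource("ozone-site.xml");
    System.out.println(conf.get("ozone.scm.names"));
  }
}
{code}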






[jira] [Created] (HDDS-2030) Generate simplified reports by the dev-support/checks/*.sh scripts

2019-08-24 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2030:
--

 Summary: Generate simplified reports by the dev-support/checks/*.sh 
scripts
 Key: HDDS-2030
 URL: https://issues.apache.org/jira/browse/HDDS-2030
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: build
Reporter: Elek, Marton
Assignee: Elek, Marton


The hadoop-ozone/dev-support/checks directory contains shell scripts to execute 
different types of code checks (findbugs, checkstyle, etc.).

Currently the contract is very simple. Every shell script executes one (and 
only one) check and the exit code is set according to the result 
(non-zero code if failed).

To have better reporting in the github pr build, it would be great to improve 
the scripts to generate simple summary files and save the relevant files for 
archiving.






[jira] [Created] (HDDS-2025) Updated the Dockerfile of the official apache/ozone image

2019-08-23 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2025:
--

 Summary: Updated the Dockerfile of the official apache/ozone image
 Key: HDDS-2025
 URL: https://issues.apache.org/jira/browse/HDDS-2025
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


The hadoop-docker-ozone repository contains the definition of the apache/ozone 
image. 

https://github.com/apache/hadoop-docker-ozone/tree/ozone-latest

It creates a docker packaging for the voted and released artifact, therefore it 
can be released after the final vote.

Since the latest release we made some modifications to our Dockerfiles. We need 
to apply the changes to the official image as well. Especially:

 1. use ozone-runner as a base image instead of hadoop-runner
 2. rename ozoneManager service to om as we did everywhere
 3. Adjust the starter location (the script is moved to the released tar file)
 






[jira] [Resolved] (HDDS-2024) rat.sh: grep: warning: recursive search of stdin

2019-08-23 Thread Elek, Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-2024.

Fix Version/s: 0.5.0
   Resolution: Fixed

> rat.sh: grep: warning: recursive search of stdin
> 
>
> Key: HDDS-2024
> URL: https://issues.apache.org/jira/browse/HDDS-2024
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.4.1
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Running {{rat.sh}} locally fails with the following error message (after the 
> two Maven runs):
> {code:title=./hadoop-ozone/dev-support/checks/rat.sh}
> ...
> grep: warning: recursive search of stdin
> {code}
> This happens if {{grep}} is not the GNU one.
> Further, {{rat.sh}} runs into: {{cat: target/rat-aggregated.txt: No such file 
> or directory}} in subshell due to a typo, and so always exits with success:
> {code:title=./hadoop-ozone/dev-support/checks/rat.sh}
> ...
> cat: target/rat-aggregated.txt: No such file or directory
> $ echo $?
> 0
> {code}






[jira] [Created] (HDDS-2022) Add additional freon tests

2019-08-23 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2022:
--

 Summary: Add additional freon tests
 Key: HDDS-2022
 URL: https://issues.apache.org/jira/browse/HDDS-2022
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Tools
Reporter: Elek, Marton
Assignee: Elek, Marton


Freon is a generic load generator tool for ozone (ozone freon) which supports 
multiple generation patterns.

As of now only the random-key-generator is implemented, which uses the ozone 
rpc client.

It would be great to add additional tests:

 * Test key generation via s3 interface
 * Test key generation via the hadoop fs interface
 * Test key reads (validation)
 * Test OM with direct RPC calls







[jira] [Created] (HDDS-2000) Don't depend on bootstrap/jquery versions from hadoop-trunk snapshot

2019-08-21 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-2000:
--

 Summary: Don't depend on bootstrap/jquery versions from 
hadoop-trunk snapshot
 Key: HDDS-2000
 URL: https://issues.apache.org/jira/browse/HDDS-2000
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: om, SCM
Reporter: Elek, Marton
Assignee: Elek, Marton


The OM/SCM web pages are broken due to the upgrade in HDFS-14729 (which is a 
great patch on the Hadoop side). To have more stability I propose to use our 
own copy of jquery/bootstrap instead of copying the actual version from 
hadoop trunk, which is a SNAPSHOT build.






[jira] [Created] (HDDS-1997) Support copy-source-if-(un)modified-since headers for MPU key creation with copy

2019-08-21 Thread Elek, Marton (Jira)
Elek, Marton created HDDS-1997:
--

 Summary: Support copy-source-if-(un)modified-since headers for MPU 
key creation with copy
 Key: HDDS-1997
 URL: https://issues.apache.org/jira/browse/HDDS-1997
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: S3
Reporter: Elek, Marton


HDDS-1942 introduces the option to create an MPU key part with copy 
(https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html). 
But the x-amz-copy-source-if-(un)modified-since headers are not supported yet. 






[jira] [Created] (HDDS-1951) Wrong symbolic release name on 0.4.1 branch

2019-08-10 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1951:
--

 Summary: Wrong symbolic release name on 0.4.1 branch
 Key: HDDS-1951
 URL: https://issues.apache.org/jira/browse/HDDS-1951
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


It should be Biscayne instead of Crater Lake according to the Roadmap:

https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Road+Map







[jira] [Created] (HDDS-1950) S3 MPU part list can't be called if there are no parts

2019-08-10 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1950:
--

 Summary: S3 MPU part list can't be called if there are no parts
 Key: HDDS-1950
 URL: https://issues.apache.org/jira/browse/HDDS-1950
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: S3
Reporter: Elek, Marton


If an S3 multipart upload is created but no part is uploaded, the part list 
can't be called because it throws HTTP 500:

Create an MPU:

{code}
aws s3api --endpoint http://localhost: create-multipart-upload 
--bucket=docker --key=testkeu 
{
"Bucket": "docker",
"Key": "testkeu",
"UploadId": "85343e71-4c16-4a75-bb55-01f56a9339b2-102592678478217234"
}
{code}

List the parts:

{code}
aws s3api --endpoint http://localhost: list-parts  --bucket=docker 
--key=testkeu 
--upload-id=85343e71-4c16-4a75-bb55-01f56a9339b2-102592678478217234
{code}

It throws an exception on the server side, because in 
KeyManagerImpl.listParts the ReplicationType is retrieved from the first part:

{code}
HddsProtos.ReplicationType replicationType =
partKeyInfoMap.firstEntry().getValue().getPartKeyInfo().getType();
{code}

This is not yet available in this use case.
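
A hedged sketch of one possible guard (the fallback accessor on the multipart 
key info is an assumption, not the actual KeyManagerImpl code):

{code}
// Only read the type from the first part when at least one part exists;
// otherwise fall back to a value stored with the multipart key itself
// (getReplicationType() here is a hypothetical accessor).
HddsProtos.ReplicationType replicationType;
if (partKeyInfoMap.isEmpty()) {
  replicationType = omMultipartKeyInfo.getReplicationType();
} else {
  replicationType =
      partKeyInfoMap.firstEntry().getValue().getPartKeyInfo().getType();
}
{code}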






[jira] [Created] (HDDS-1948) S3 MPU can't be created with octet-stream content-type

2019-08-10 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1948:
--

 Summary: S3 MPU can't be created with octet-stream content-type 
 Key: HDDS-1948
 URL: https://issues.apache.org/jira/browse/HDDS-1948
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


This problem is reported offline by [~shaneku...@gmail.com].

When aws-sdk-go is used to access the s3 gateway of Ozone it sends the Multi 
Part Upload initialization message with the "application/octet-stream" 
Content-Type. 

This Content-Type is missing from the aws-cli requests, which were used as the 
reference while implementing the s3 endpoint.

The problem is that we use the same rest endpoint for the initialization and 
completion Multipart Upload requests. For the completion we need the 
CompleteMultipartUploadRequest parameter, which is parsed from the body.

For the initialization we have an empty body which can't be deserialized to 
CompleteMultipartUploadRequest.

The workaround is to set a specific content type from a filter, which helps us 
to create two different REST methods for the initialization and completion 
messages.
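
A hedged sketch of such a filter with the JAX-RS API (the class name, the 
marker content type and the use of the "uploads" query parameter to recognize 
the initialization request are assumptions):

{code}
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.container.PreMatching;
import javax.ws.rs.core.HttpHeaders;
import javax.ws.rs.ext.Provider;

// Rewrites the content type of the MPU initialization request so that a
// dedicated REST method (consuming the marker type) can be bound to it,
// separately from the completion request that carries an XML body.
@Provider
@PreMatching
public class MultipartUploadMarkerFilter implements ContainerRequestFilter {

  @Override
  public void filter(ContainerRequestContext requestContext) {
    if (requestContext.getUriInfo().getQueryParameters()
        .containsKey("uploads")) {
      requestContext.getHeaders()
          .putSingle(HttpHeaders.CONTENT_TYPE, "application/x-ozone-mpu-init");
    }
  }
}
{code}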






[jira] [Created] (HDDS-1942) Support copy during S3 multipart upload part creation

2019-08-09 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1942:
--

 Summary: Support copy during S3 multipart upload part creation
 Key: HDDS-1942
 URL: https://issues.apache.org/jira/browse/HDDS-1942
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: S3
Reporter: Elek, Marton


Uploads a part by copying data from an existing object as data source

Documented here:

https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html






[jira] [Created] (HDDS-1937) Acceptance tests fail if scm webui shows invalid json

2019-08-08 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1937:
--

 Summary: Acceptance tests fail if scm webui shows invalid json
 Key: HDDS-1937
 URL: https://issues.apache.org/jira/browse/HDDS-1937
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


The acceptance test of a nightly build failed with the following error:

{code}
Creating ozonesecure_datanode_3 ... 

Creating ozonesecure_kdc_1  ... done

Creating ozonesecure_om_1   ... done

Creating ozonesecure_scm_1  ... done

Creating ozonesecure_datanode_3 ... done

Creating ozonesecure_kms_1  ... done

Creating ozonesecure_s3g_1  ... done

Creating ozonesecure_datanode_2 ... done

Creating ozonesecure_datanode_1 ... done
parse error: Invalid numeric literal at line 2, column 0
{code}

https://raw.githubusercontent.com/elek/ozone-ci/master/byscane/byscane-nightly-5b87q/acceptance/output.log

The problem is in the script which checks the number of available datanodes.

If the HTTP endpoint of the SCM is already started BUT not ready yet, it may 
return a simple HTML error message instead of json, which cannot be 
parsed by jq:

In testlib.sh:

{code}
  if [[ "${SECURITY_ENABLED}" == 'true' ]]; then
    docker-compose -f "${compose_file}" exec -T scm bash -c "kinit -k HTTP/scm@EXAMPLE.COM -t /etc/security/keytabs/HTTP.keytab && curl --negotiate -u : -s '${jmx_url}'"
  else
    docker-compose -f "${compose_file}" exec -T scm curl -s "${jmx_url}"
  fi \
    | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
{code}

One possible fix is to adjust the error handling (set +x / set -x) per method 
instead of using a generic set -x at the beginning. It would provide more 
predictable behavior. In our case count_datanode should never fail (as the 
caller method, wait_for_datanodes, can retry anyway).






[jira] [Created] (HDDS-1935) Improve the visibility with Ozone Insight tool

2019-08-08 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1935:
--

 Summary: Improve the visibility with Ozone Insight tool
 Key: HDDS-1935
 URL: https://issues.apache.org/jira/browse/HDDS-1935
 Project: Hadoop Distributed Data Store
  Issue Type: New Feature
Reporter: Elek, Marton
Assignee: Elek, Marton




Visibility is a key aspect of the operation of any Ozone cluster. We need 
better visibility to improve correctness and performance. While distributed 
tracing is a good tool for improving the visibility of performance, we have no 
powerful tool which can be used to check the internal state of the Ozone 
cluster and debug certain correctness issues.

To improve the visibility of the internal components I propose to introduce a 
new command line application `ozone insight`.

The new tool will show the selected metrics / logs / configuration for any of 
the internal components (like replication-manager, pipeline, etc.).

For each insight point we can define the required logs and log levels, metrics 
and configuration, and the tool can display only the component-specific 
information during debugging.

h2. Usage

First we can check the available insight point:

{code}
bash-4.2$ ozone insight list
Available insight points:


  scm.node-manager SCM Datanode management related 
information.
  scm.replica-manager  SCM closed container replication manager
  scm.event-queue  Information about the internal async 
event delivery
  scm.protocol.block-location  SCM Block location protocol endpoint
  scm.protocol.container-location  Planned insight point which is not yet 
implemented.
  scm.protocol.datanodePlanned insight point which is not yet 
implemented.
  scm.protocol.securityPlanned insight point which is not yet 
implemented.
  scm.http Planned insight point which is not yet 
implemented.
  om.key-manager   OM Key Manager
  om.protocol.client   Ozone Manager RPC endpoint
  om.http  Planned insight point which is not yet 
implemented.
  datanode.pipeline[id]More information about one ratis 
datanode ring.
  datanode.rocksdb More information about one ratis 
datanode ring.
  s3g.http Planned insight point which is not yet 
implemented.
{code}

Insight points can define configuration, metrics and/or logs. Configuration can 
be displayed based on the configuration objects:

{code}
ozone insight config scm.protocol.block-location
Configuration for `scm.protocol.block-location` (SCM Block location protocol 
endpoint)

>>> ozone.scm.block.client.bind.host
   default: 0.0.0.0
   current: 0.0.0.0

The hostname or IP address used by the SCM block client  endpoint to bind


>>> ozone.scm.block.client.port
   default: 9863
   current: 9863

The port number of the Ozone SCM block client service.


>>> ozone.scm.block.client.address
   default: ${ozone.scm.client.address}
   current: scm

The address of the Ozone SCM block client service. If not defined value of 
ozone.scm.client.address is used

{code}

Metrics can be retrieved from the prometheus entrypoint:

{code}
ozone insight metrics scm.protocol.block-location
Metrics for `scm.protocol.block-location` (SCM Block location protocol endpoint)

RPC connections

  Open connections: 0
  Dropped connections: 0
  Received bytes: 0
  Sent bytes: 0


RPC queue

  RPC average queue time: 0.0
  RPC call queue length: 0


RPC performance

  RPC processing time average: 0.0
  Number of slow calls: 0


Message type counters

  Number of AllocateScmBlock: 0
  Number of DeleteScmKeyBlocks: 0
  Number of GetScmInfo: 2
  Number of SortDatanodes: 0
{code}

Log levels can be adjusted with the existing logLevel servlet and the logs can 
be collected / streamed via a simple logstream servlet:

{code}
ozone insight log scm.node-manager
[SCM] 2019-08-08 12:42:37,392 
[DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] 
Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:43:37,392 
[DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] 
Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:44:37,392 
[DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] 
Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:45:37,393 
[DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] 
Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:46:37,392 
[DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] 
Processing node report from [datanode=ozone_datanode_1.ozone_default]
{code}

The verbose mode can display the raw messages 

[jira] [Created] (HDDS-1926) The new caching layer is used for old OM requests but not updated

2019-08-07 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1926:
--

 Summary: The new caching layer is used for old OM requests but not 
updated
 Key: HDDS-1926
 URL: https://issues.apache.org/jira/browse/HDDS-1926
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: om
Reporter: Elek, Marton


HDDS-1499 introduced a new caching layer together with a double-buffer based db 
writer to support OM HA.

TLDR: I think the caching layer is not updated for new volume creation. And 
(slightly related to this problem) I suggest separating the TypedTable and 
the caching layer.

## How to reproduce the problem?

1. Start a docker compose cluster
2. Create one volume (let's say `/vol1`)
3. Restart the om (!)
4. Try to create an _other_ volume twice!

```
bash-4.2$ ozone sh volume create /vol2
2019-08-07 12:29:47 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop as 
owner.
bash-4.2$ ozone sh volume create /vol2
2019-08-07 12:29:50 INFO  RpcClient:288 - Creating Volume: vol2, with hadoop as 
owner.
```

Expected behavior is an error:

{code}
bash-4.2$ ozone sh volume create /vol1
2019-08-07 09:48:39 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop as 
owner.
bash-4.2$ ozone sh volume create /vol1
2019-08-07 09:48:42 INFO  RpcClient:288 - Creating Volume: vol1, with hadoop as 
owner.
VOLUME_ALREADY_EXISTS 
{code}

The problem is that the new cache is used even for the old code path 
(TypedTable):

{code}
 @Override
  public VALUE get(KEY key) throws IOException {
// Here the metadata lock will guarantee that cache is not updated for same
// key during get key.

CacheResult<CacheValue<VALUE>> cacheResult =
    cache.lookup(new CacheKey<>(key));

if (cacheResult.getCacheStatus() == EXISTS) {
  return cacheResult.getValue().getCacheValue();
} else if (cacheResult.getCacheStatus() == NOT_EXIST) {
  return null;
} else {
  return getFromTable(key);
}
  }
{code}

For volume table after the FIRST start it always returns with 
`getFromTable(key)` due to the condition in the `TableCacheImpl.lookup`:

{code}

  public CacheResult<CACHEVALUE> lookup(CACHEKEY cachekey) {

if (cache.size() == 0) {
  return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST,
  null);
}
{code}

But after a restart the cache is pre-loaded by the TypedTable constructor. 
After the restart, the real caching logic will be used (as cache.size()>0), 
which causes a problem as the cache is NOT updated from the old code path.

An additional problem is that the cache is turned on for all the metadata 
tables even if the cache is not required... 

## Proposed solution

As I commented at HDDS-1499, this caching layer is not a "traditional cache". 
It's not updated during the typedTable.put() call but by a separate 
component during the double-buffer flush.

I would suggest removing the cache-related methods from TypedTable (moving them 
to a separate implementation). I think this kind of caching can be independent 
of the TypedTable implementation. We can continue to use the simple TypedTable 
everywhere we don't need any kind of caching.

For caching we can use a separate object. It would make it more visible that 
the cache should always be updated manually. This separate caching utility may 
include a reference to the original TypedTable/Table. With this approach we can 
separate the different responsibilities but provide the same functionality.
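
A minimal sketch of the proposed separation (class and method names are 
illustrative, not the real Ozone types): the cache is an explicit wrapper that 
holds a reference to the table and is only updated through its own methods, 
never implicitly by a put() on the table.

{code}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CachedTable<K, V> {

  /** Minimal read-only view of the underlying table. */
  public interface Table<K, V> {
    V get(K key) throws IOException;
  }

  private final Table<K, V> table;
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

  public CachedTable(Table<K, V> table) {
    this.table = table;
  }

  // Called explicitly, e.g. by the double-buffer flush, never by a table put().
  public void addCacheEntry(K key, V value) {
    cache.put(key, value);
  }

  public V get(K key) throws IOException {
    V cached = cache.get(key);
    return cached != null ? cached : table.get(key);
  }
}
{code}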






[jira] [Created] (HDDS-1915) Remove hadoop script from ozone distribution

2019-08-06 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1915:
--

 Summary: Remove hadoop script from ozone distribution
 Key: HDDS-1915
 URL: https://issues.apache.org/jira/browse/HDDS-1915
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


The /bin/hadoop script is included in the ozone distribution even if we have a 
dedicated /bin/ozone script.

[~arp] reported that it can be confusing, for example "hadoop classpath" 
returns a bad classpath (ozone classpath  should be used 
instead).

To avoid such confusion I suggest removing the hadoop script from the 
distribution, as the ozone script already provides all the functionality.

It also helps us to reduce the dependencies between hadoop 3.2-SNAPSHOT and 
ozone, as we use the snapshot hadoop script as of now.






[jira] [Created] (HDDS-1914) Ozonescript example docker-compose cluster can't be started

2019-08-06 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1914:
--

 Summary: Ozonescript example docker-compose cluster can't be 
started
 Key: HDDS-1914
 URL: https://issues.apache.org/jira/browse/HDDS-1914
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


The compose/ozonescripts cluster provides an example environment to test the 
start-ozone.sh and stop-ozone.sh scripts.

It starts containers with an sshd daemon but without starting ozone, which 
makes it possible to start those scripts.

Unfortunately the docker files are broken since:
 * we switched from debian to centos as the base image
 * we started to use /etc/hadoop instead of /opt/hadoop/etc/hadoop for 
configuring hadoop (the workers file should be copied there)
 * we started to use jdk11 to execute ozone (instead of java8)

The configuration files should be updated according to these changes. 






[jira] [Reopened] (HDDS-1881) Design doc: decommissioning in Ozone

2019-07-31 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton reopened HDDS-1881:


> Design doc: decommissioning in Ozone
> 
>
> Key: HDDS-1881
> URL: https://issues.apache.org/jira/browse/HDDS-1881
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: design, pull-request-available
>  Time Spent: 33h 50m
>  Remaining Estimate: 0h
>
> Design doc can be attached to the documentation. In this jira the design doc 
> will be attached and merged to the documentation page.






[jira] [Created] (HDDS-1881) Design doc: decommissioning in Ozone

2019-07-31 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1881:
--

 Summary: Design doc: decommissioning in Ozone
 Key: HDDS-1881
 URL: https://issues.apache.org/jira/browse/HDDS-1881
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton
Assignee: Elek, Marton


Design doc can be attached to the documentation. In this jira the design doc 
will be attached and merged to the documentation page.






[jira] [Created] (HDDS-1880) Decommissioning and maintenance mode in Ozone

2019-07-31 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1880:
--

 Summary: Decommissioning and maintenance mode in Ozone 
 Key: HDDS-1880
 URL: https://issues.apache.org/jira/browse/HDDS-1880
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Elek, Marton


This is the umbrella jira for decommissioning support in Ozone. Design doc will 
be attached soon.






[jira] [Created] (HDDS-1871) Remove anti-affinity rules from k8s minikube example

2019-07-29 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1871:
--

 Summary: Remove anti-affinity rules from k8s minikube example
 Key: HDDS-1871
 URL: https://issues.apache.org/jira/browse/HDDS-1871
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: kubernetes
Reporter: Elek, Marton
Assignee: Elek, Marton


HDDS-1646 introduced real persistence for the k8s example deployment files, 
which means that we need anti-affinity scheduling rules: even if we use a 
statefulset instead of a daemonset we would like to start one datanode per 
real node.

With minikube we have only one node, therefore the scheduling rule should be 
removed to enable at least 3 datanodes on the same physical node.

How to test:

{code}
 mvn clean install -DskipTests -f pom.ozone.xml
cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/kubernetes/examples/minikube
minikube start
kubectl apply -f .
kc get pod
{code}

You should see 3 datanode instances.







[jira] [Created] (HDDS-1811) Prometheus metrics are broken for datanodes due to a wrong metric

2019-07-16 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1811:
--

 Summary: Prometheus metrics are broken for datanodes due to a 
wrong metric
 Key: HDDS-1811
 URL: https://issues.apache.org/jira/browse/HDDS-1811
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Elek, Marton


Datanodes can't be monitored with prometheus any more:

{code}
level=warn ts=2019-07-16T16:29:55.876Z caller=scrape.go:937 component="scrape 
manager" scrape_pool=pods target=http://192.168.69.76:9882/prom msg="append 
failed" err="invalid metric type 
\"apache.hadoop.ozone.container.common.transport.server.ratis._csm_metrics_delete_container_avg_time
 gauge\""
{code}
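
The quoted name contains dots, which are not allowed in Prometheus metric names 
(they must match [a-zA-Z_:][a-zA-Z0-9_:]*), so the exposition line becomes 
unparsable. A minimal sketch of the kind of normalization the sink has to apply 
(the helper name is illustrative):

{code}
public final class PrometheusNameSanitizer {

  // Replace every character that is not allowed in a Prometheus metric name
  // with an underscore.
  public static String sanitize(String metricName) {
    return metricName.replaceAll("[^a-zA-Z0-9_:]", "_").toLowerCase();
  }

  public static void main(String[] args) {
    System.out.println(sanitize(
        "apache.hadoop.ozone.container.common.transport.server.ratis"
            + "._csm_metrics_delete_container_avg_time"));
    // -> apache_hadoop_ozone_container_common_transport_server_ratis
    //    __csm_metrics_delete_container_avg_time
  }
}
{code}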









[jira] [Resolved] (HDDS-1673) smoketests are failing because an acl error

2019-07-16 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1673.

Resolution: Cannot Reproduce

Seems to be fixed by the latest ACL patches. I am closing this for now. Feel 
free to reopen if you see the problem again.

> smoketests are failing because an acl error
> ---
>
> Key: HDDS-1673
> URL: https://issues.apache.org/jira/browse/HDDS-1673
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Anu Engineer
>Priority: Blocker
>
> After executing this command:
> {code}
> yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 3 3 2
> {code}
> The result is:
> {code}
>   Number of Maps  = 3
> Samples per Map = 3
> 2019-06-12 03:16:20 ERROR OzoneClientFactory:294 - Couldn't create protocol 
> class org.apache.hadoop.ozone.client.rpc.RpcClient exception: 
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
>   at 
> org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.(BasicOzoneClientAdapterImpl.java:134)
>   at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:50)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:103)
>   at 
> org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:143)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:463)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>   at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:522)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:275)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> Caused by: org.apache.hadoop.hdds.conf.ConfigurationException: Can't inject 
> configuration to class 
> org.apache.hadoop.ozone.security.acl.OzoneAclConfig.setUserDefaultRights
>   at 
> org.apache.hadoop.hdds.conf.OzoneConfiguration.getObject(OzoneConfiguration.java:160)
>   at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:148)
>   ... 36 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Created] (HDDS-1800) Result of author check is inverted

2019-07-15 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1800:
--

 Summary: Result of author check is inverted
 Key: HDDS-1800
 URL: https://issues.apache.org/jira/browse/HDDS-1800
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


## What changes were proposed in this pull request?

Fix:

 1. author check fails when no violations are found
 2. author check violations are duplicated in the output

Eg. https://ci.anzix.net/job/ozone-nightly/173/consoleText says that:


{code:java}
The following tests are FAILED:

[author]: author check is failed 
(https://ci.anzix.net/job/ozone-nightly/173//artifact/build/author.out/*view*/){code}


but no actual `@author` tags were found:

{code}
$ curl -s 'https://ci.anzix.net/job/ozone-nightly/173//artifact/build/author.out/*view*/' | wc
   0   0   0
{code}

## How was this patch tested?

{code}
$ bash -o pipefail -c 'hadoop-ozone/dev-support/checks/author.sh | tee 
build/author.out'; echo $?
0

$ wc build/author.out
   0   0   0 build/author.out

$ echo '// @author Tolkien' >> 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManager.java

$ bash -o pipefail -c 'hadoop-ozone/dev-support/checks/author.sh | tee 
build/author.out'; echo $?
./hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManager.java://
 @author Tolkien
1

$ wc build/author.out
   1   3 108 build/author.out
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1799) Add goofys to the ozone-runner docker image

2019-07-15 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1799:
--

 Summary: Add goofys to the ozone-runner docker image
 Key: HDDS-1799
 URL: https://issues.apache.org/jira/browse/HDDS-1799
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


Goofys is an S3 FUSE driver which is required for the Ozone CSI setup.

As of now it's installed in hadoop-ozone/dist/src/main/docker/Dockerfile from a 
non-standard location (because it couldn't be part of hadoop-runner earlier, as 
it's Ozone specific).

It should be installed into the ozone-runner image from a canonical goofys release URL.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1793) Acceptance test of ozone-topology cluster is failing

2019-07-12 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1793:
--

 Summary: Acceptance test of ozone-topology cluster is failing
 Key: HDDS-1793
 URL: https://issues.apache.org/jira/browse/HDDS-1793
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


Since HDDS-1586 the smoketests of the ozone-topology compose file are broken:
{code:java}
Output:  
/tmp/smoketest/ozone-topology/result/robot-ozone-topology-ozone-topology-basic-scm.xml
must specify at least one container source
Stopping datanode_2 ... 
Stopping datanode_3 ... 
Stopping datanode_4 ... 
Stopping scm... 
Stopping om ... 
Stopping datanode_1 ... 

Stopping datanode_2 ... done

Stopping datanode_4 ... done

Stopping datanode_1 ... done

Stopping datanode_3 ... done

Stopping scm... done

Stopping om ... done
Removing datanode_2 ... 
Removing datanode_3 ... 
Removing datanode_4 ... 
Removing scm... 
Removing om ... 
Removing datanode_1 ... 

Removing datanode_1 ... done

Removing om ... done

Removing datanode_3 ... done

Removing datanode_4 ... done

Removing datanode_2 ... done

Removing scm... done
Removing network ozone-topology_net
[ ERROR ] Reading XML source 
'/var/jenkins_home/workspace/ozone/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone-topology/result/robot-*.xml'
 failed: No such file or directory

Try --help for usage information.
ERROR: Test execution of 
/var/jenkins_home/workspace/ozone/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone-topology
 is FAILED{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1785) OOM error in Freon due to the concurrency handling

2019-07-11 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1785:
--

 Summary: OOM error in Freon due to the concurrency handling
 Key: HDDS-1785
 URL: https://issues.apache.org/jira/browse/HDDS-1785
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


HDDS-1530 modified the concurrent framework usage of Freon (RandomKeyGenerator).

The new approach uses separate tasks (Runnable) to create the 
volumes/buckets/keys.

Unfortunately it doesn't work very well in some cases (a minimal sketch of the 
problematic submission pattern is shown after the list).
 # When Freon starts it creates an executor with a fixed number of threads (10)
 # The first loop submits numOfVolumes (10) VolumeProcessor tasks to the 
executor
 # The 10 threads start to execute the 10 VolumeProcessor tasks
 # Each VolumeProcessor task creates numOfBuckets (1000) BucketProcessor 
tasks. All together 10 000 tasks are submitted to the executor.
 # The 10 threads start to execute the first 10 BucketProcessor tasks, and they 
start to create the KeyProcessor tasks: 500 000 * 10 tasks are submitted.
 # At this point no keys have been generated yet, but the next 10 
BucketProcessor tasks start to execute.
 # To execute the first key creation we should process all the BucketProcessor 
tasks, which means that all the key creation tasks (10 * 1000 * 500 000) are 
created and added to the executor
 # Which requires a huge amount of time and memory
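
A minimal, self-contained sketch of this submission pattern (this is not the 
actual Freon code; the processor tasks are simplified to plain lambdas and 
createKey() is only a placeholder):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NestedSubmissionSketch {

  private static final int NUM_VOLUMES = 10;
  private static final int NUM_BUCKETS = 1000;
  private static final int NUM_KEYS = 500_000;

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(10);

    // Every volume task enqueues all of its bucket tasks and every bucket task
    // enqueues all of its key tasks before any key is actually created, so the
    // executor's unbounded queue has to hold up to
    // NUM_VOLUMES * NUM_BUCKETS * NUM_KEYS Runnable objects at once.
    for (int v = 0; v < NUM_VOLUMES; v++) {
      executor.submit(() -> {
        for (int b = 0; b < NUM_BUCKETS; b++) {
          executor.submit(() -> {
            for (int k = 0; k < NUM_KEYS; k++) {
              executor.submit(NestedSubmissionSketch::createKey);
            }
          });
        }
      });
    }
    executor.shutdown();
  }

  private static void createKey() {
    // placeholder for the real key creation
  }
}
{code}

One possible direction for a fix is to bound the number of in-flight tasks (for 
example by acquiring a Semaphore before each submit), so key creation can start 
before every task object has been materialized.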



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1459) Docker compose of ozonefs has older hadoop image for hadoop 3.2

2019-07-10 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1459.

Resolution: Duplicate

Thanks for the report [~vivekratnavel]

It's fixed with HDDS-1525:

The ozonefs compose files are removed because the ozone-mr tests are improved 
and they include the same functionality (ozone fs test with the hdfs cli AND 
with the mr client).

Versions are fixed (2.7, 3.1, 3.2)

> Docker compose of ozonefs has older hadoop image for hadoop 3.2
> ---
>
> Key: HDDS-1459
> URL: https://issues.apache.org/jira/browse/HDDS-1459
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.4.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1014) hadoop-ozone-filesystem is missing required jars

2019-07-09 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1014.

Resolution: Duplicate

Thanks for the report [~bharatviswa]

For mapreduce please use hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar 
or hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar instead of 
hadoop-ozone-filesystem-0.5.0-SNAPSHOT.jar.

The legacy and current jar files are the shaded jar files; the simple filesystem 
jar includes only the ozonefs files to make it work with the "ozone fs" command.

 

BTW, the legacy/current jar files were broken at the time of this report, which 
made it harder to find the right jars, but they will be fixed by HDDS-1525 and 
HDDS-1717 very soon...

 

> hadoop-ozone-filesystem is missing required jars
> 
>
> Key: HDDS-1014
> URL: https://issues.apache.org/jira/browse/HDDS-1014
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>
> https://hadoop.apache.org/ozone/docs/0.3.0-alpha/ozonefs.html
> After following the steps mentioned, I still get below error:
> {code:java}
> 19/01/25 17:15:28 ERROR client.OzoneClientFactory: Couldn't create protocol 
> class org.apache.hadoop.ozone.client.rpc.RpcClient exception: 
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:128)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:461)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>   at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
>   at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
>   at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
>   at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: 
> org/apache/ratis/thirdparty/com/google/protobuf/ByteString
>   at org.apache.ratis.protocol.RaftId.(RaftId.java:64)
>   at org.apache.ratis.protocol.ClientId.(ClientId.java:47)
>   at org.apache.ratis.protocol.ClientId.randomId(ClientId.java:31)
>   at org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:115)
>   ... 24 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/ratis/thirdparty/com/google/protobuf/ByteString
>   ... 28 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.ratis.thirdparty.com.google.protobuf.ByteString
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 28 more
> {code}
> So, I proceeded and added ratis-thirdparty-misc jar.
> After that I got error related to missing RatisProto, and then next missing 
> bouncy castle.
> After adding all of those jars I am able to run dfs and map red jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1431) Linkage error thrown if ozone-fs-legacy jar is on the classpath

2019-07-09 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1431.

Resolution: Duplicate
  Assignee: Elek, Marton

Thanks [~swagle] for reporting this issue.
 # Classpath issues related to the usage of the lib-legacy jar will be fixed by 
HDDS-1525 (I will close this as a duplicate)
 # The ozonefs-lib-legacy/current jar files are shaded all-in-one files. It's not 
supported to add them to the classpath together with any other ozone jar. It's 
enough to add just the fat jars to the HADOOP_CLASSPATH. (But if you do, you 
won't get this strange exception after HDDS-1525)

> Linkage error thrown if ozone-fs-legacy jar is on the classpath
> ---
>
> Key: HDDS-1431
> URL: https://issues.apache.org/jira/browse/HDDS-1431
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Siddharth Wagle
>Assignee: Elek, Marton
>Priority: Major
>
> With hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar on the classpath 
> along with current jar results in classloader throwing an error on fs write 
> operation as below:
> {code}
> 2019-04-11 16:06:54,127 ERROR [OzoneClientAdapterFactory] Can't initialize 
> the ozoneClientAdapter
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.lambda$createAdapter$1(OzoneClientAdapterFactory.java:66)
> at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:116)
> at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:62)
> at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:92)
> at 
> org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:146)
> at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
> at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
> at 
> org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3326)
> at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:532)
> at org.notmysock.repl.Works$CopyWorker.run(Works.java:252)
> at org.notmysock.repl.Works$CopyWorker.call(Works.java:287)
> at org.notmysock.repl.Works$CopyWorker.call(Works.java:207)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.LinkageError: loader constraint violation: loader 
> (instance of org/apache/hadoop/fs/ozone/FilteredClassLoader) previously 
> initiated loading for a different t
> ype with name "org/apache/hadoop/crypto/key/KeyProvider"
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at 
> org.apache.hadoop.fs.ozone.FilteredClassLoader.loadClass(FilteredClassLoader.java:72)
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.privateGetPublicMethods(Class.java:2902)
> at java.lang.Class.getMethods(Class.java:1615)
> at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451)
> at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339)
> at 

[jira] [Resolved] (HDDS-1338) ozone shell commands are throwing InvocationTargetException

2019-07-08 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1338.

Resolution: Duplicate

> ozone shell commands are throwing InvocationTargetException
> ---
>
> Key: HDDS-1338
> URL: https://issues.apache.org/jira/browse/HDDS-1338
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Priority: Major
>
> ozone version
> {noformat}
> Source code repository g...@github.com:hortonworks/ozone.git -r 
> 310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde
> Compiled by jenkins on 2019-03-21T22:06Z
> Compiled with protoc 2.5.0
> From source with checksum 9c367143ad43b81ca84bfdaafd1c3f
> Using HDDS 0.4.0.3.0.100.0-388
> Source code repository g...@github.com:hortonworks/ozone.git -r 
> 310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde
> Compiled by jenkins on 2019-03-21T22:06Z
> Compiled with protoc 2.5.0
> From source with checksum f3297cbd3a5f59fb4e5fd551afa05ba9
> {noformat}
> Here is the ozone volume create failure output :
> {noformat}
> hdfs@ctr-e139-1542663976389-91321-01-02 ~]$ ozone sh volume create 
> testvolume11
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.0.100.0-388/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.0.100.0-388/hadoop-ozone/share/ozone/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 19/03/26 17:31:37 ERROR client.OzoneClientFactory: Couldn't create protocol 
> class org.apache.hadoop.ozone.client.rpc.RpcClient exception:
> java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
>  at 
> org.apache.hadoop.ozone.web.ozShell.OzoneAddress.createClient(OzoneAddress.java:111)
>  at 
> org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:70)
>  at 
> org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:38)
>  at picocli.CommandLine.execute(CommandLine.java:919)
>  at picocli.CommandLine.access$700(CommandLine.java:104)
>  at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
>  at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
>  at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
>  at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
>  at picocli.CommandLine.parseWithHandler(CommandLine.java:1181)
>  at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:82)
>  at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52)
>  at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:93)
> Caused by: java.lang.VerifyError: Cannot inherit from final class
>  at java.lang.ClassLoader.defineClass1(Native Method)
>  at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.(OzoneManagerProtocolClientSideTranslatorPB.java:169)
>  at org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:142)
>  ... 20 more
> Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For 

[jira] [Resolved] (HDDS-1305) Robot test containers: hadoop client can't access o3fs

2019-07-08 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1305.

Resolution: Duplicate

Thanks for reporting this issue. It will be fixed in HDDS-1717.

 

(Based on the timeline that one is the duplicate, but we have a working patch 
there, so I am closing this one).

> Robot test containers: hadoop client can't access o3fs
> --
>
> Key: HDDS-1305
> URL: https://issues.apache.org/jira/browse/HDDS-1305
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Sandeep Nemuri
>Assignee: Anu Engineer
>Priority: Major
> Attachments: run.log
>
>
> Run the robot test using:
> {code:java}
> ./test.sh --keep --env ozonefs
> {code}
> login to OM container and check if we have desired volume/bucket/key got 
> created with robot tests.
> {code:java}
> [root@o3new ~]$ docker exec -it ozonefs_om_1 /bin/bash
> bash-4.2$ ozone fs -ls o3fs://bucket1.fstest/
> Found 3 items
> -rw-rw-rw-   1 hadoop hadoop  22990 2019-03-15 17:28 
> o3fs://bucket1.fstest/KEY.txt
> drwxrwxrwx   - hadoop hadoop  0 1970-01-01 00:00 
> o3fs://bucket1.fstest/testdir
> drwxrwxrwx   - hadoop hadoop  0 2019-03-15 17:27 
> o3fs://bucket1.fstest/testdir1
> {code}
> {code:java}
> [root@o3new ~]$ docker exec -it ozonefs_hadoop3_1 /bin/bash
> bash-4.4$ hadoop classpath
> /opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/yarn:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/ozone/share/ozone/lib/hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar
> bash-4.4$ hadoop fs -ls o3fs://bucket1.fstest/
> 2019-03-18 19:12:42 INFO  Configuration:3204 - Removed undeclared tags:
> 2019-03-18 19:12:42 ERROR OzoneClientFactory:294 - Couldn't create protocol 
> class org.apache.hadoop.ozone.client.rpc.RpcClient exception:
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
>   at 
> org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:127)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:189)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
>   at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
>   at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:249)
>   at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:232)
>   at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> Caused by: java.lang.VerifyError: Cannot inherit from final class
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at 

[jira] [Created] (HDDS-1764) Fix hidden errors in acceptance tests

2019-07-04 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1764:
--

 Summary: Fix hidden errors in acceptance tests
 Key: HDDS-1764
 URL: https://issues.apache.org/jira/browse/HDDS-1764
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Bharat Viswanadham
Assignee: Elek, Marton


[~bharatviswa] pinged me offline with the problem that in some cases the 
smoketest is failing even if the reports are green:
{code:java}
All smoke tests are passed, but CI is showing as Failed.

https://ci.anzix.net/job/ozone/17284/RobotTests/log.html
https://github.com/apache/hadoop/pull/1048{code}
The root cause is a few typos introduced after HDDS-1698, which can be fixed 
with the uploaded PR.

*What is the problem?*

In case of any error during the test execution the smoketest fails. In this 
case, because of the typos in two docker-compose.yaml files, two of the tests 
can't be started.

But there is no separate robot test report and the error is visible only in 
the console.

*How did it happen?*

The ACL work introduced some intermittency in the acceptance tests. HDDS-1698 
was committed because the acceptance tests failed with ACL errors which hid the 
real error (the test was red anyway).

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1763) Use vendor neutral s3 logo in ozone doc

2019-07-04 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1763:
--

 Summary: Use vendor neutral s3 logo in ozone doc
 Key: HDDS-1763
 URL: https://issues.apache.org/jira/browse/HDDS-1763
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: documentation
Reporter: Bharat Viswanadham
Assignee: Elek, Marton


In HDDS-1639 we restructured the Ozone documentation and a new overview page was 
added to the main page.

This page contains an official AWS logo. As [~bharatviswa] reported, we are not 
sure about the exact conditions for using logos / trademarks from Amazon. It's 
better to remain on the safe side and use a neutral S3 label.

In this patch the AWS logo is replaced with an orange cloud + S3 text.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1747) Support override of configuration annotations

2019-07-01 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1747:
--

 Summary: Support override of configuration annotations
 Key: HDDS-1747
 URL: https://issues.apache.org/jira/browse/HDDS-1747
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton
Assignee: Stephen O'Donnell


To support HDDS-1744 we need a way to override existing configuration defaults. 
For example given a main HttpConfiguration:
{code:java}

public class OzoneHttpServerConfig {

  private int httpBindPort;

  @Config(key = "http-bind-port",
  defaultValue = "9874",
  description =
  "The actual port the web server will listen on for HTTP "
  + "communication. If the "
  + "port is 0 then the server will start on a free port.",
  tags = {ConfigTag.OM, ConfigTag.MANAGEMENT})
  public void setHttpBindPort(int httpBindPort) {
    this.httpBindPort = httpBindPort;
  }
}
{code}
We need an option to extend  this class and override the default value:
{code:java}
  @ConfigGroup(prefix = "hdds.datanode")
  public static class HttpConfig extends OzoneHttpServerConfig {

    @Override
    @ConfigOverride(defaultValue = "9882")
    public void setHttpBindPort(int httpBindPort) {
  super.setHttpBindPort(httpBindPort);
    }


  }

{code}
The expected behavior is a generated hdds.datanode.http-bind-port configuration 
key where the default is 9882.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1744) Improve BaseHttpServer to use typesafe configuration.

2019-07-01 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1744:
--

 Summary: Improve BaseHttpServer to use typesafe configuration.
 Key: HDDS-1744
 URL: https://issues.apache.org/jira/browse/HDDS-1744
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton
Assignee: Stephen O'Donnell


As it's defined in the parent task, we have a new typesafe way to define 
configuration based on annotations instead of constants.

The next step is to replace existing code to use the new code.

In this Jira I propose to improve 
org.apache.hadoop.hdds.server.BaseHttpServer to use a configuration object 
instead of constants.

We need to create a generic configuration object with the right annotation:
{code:java}
public class OzoneHttpServerConfig {

  private String httpBindHost;

  @Config(key = "http-bind-host",
      defaultValue = "0.0.0.0",
      description = "The actual address the web server will bind to. If "
          + "this optional address is set, it overrides only the hostname"
          + " portion of http-address configuration value.",
      tags = {ConfigTag.OM, ConfigTag.MANAGEMENT})
  public void setHttpBindHost(String httpBindHost) {
    this.httpBindHost = httpBindHost;
  }

}
{code}
And we need to extend this basic configuration in all the HttpServer 
implementations:
{code:java}

public class OzoneManagerHttpServer extends BaseHttpServer {

  @ConfigGroup(prefix = "ozone.om")
  public static class HttpConfig extends OzoneHttpServerConfig {

    @Override
    @ConfigOverride(defaultValue = "9874")
    public void setHttpBindPort(int httpBindPort) {
      super.setHttpBindPort(httpBindPort);
    }

  }
}
{code}
Note: configuration keys used by HttpServer2 can't be replaced easily.
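
A hedged usage sketch (relying on the existing 
org.apache.hadoop.hdds.conf.OzoneConfiguration#getObject injection; the exact 
place where BaseHttpServer would read this object is part of this task, and 
HttpConfigUsageSketch is only an illustration):

{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class HttpConfigUsageSketch {

  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();

    // getObject() instantiates the annotated config class and injects the
    // values based on the @Config/@ConfigGroup annotations, so BaseHttpServer
    // could read e.g. ozone.om.http-bind-host / ozone.om.http-bind-port from
    // this object instead of the OZONE_* constants.
    OzoneManagerHttpServer.HttpConfig httpConfig =
        conf.getObject(OzoneManagerHttpServer.HttpConfig.class);
  }
}
{code}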



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1743) Create service catalog endpoint in the SCM

2019-07-01 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1743:
--

 Summary: Create service catalog endpoint in the SCM
 Key: HDDS-1743
 URL: https://issues.apache.org/jira/browse/HDDS-1743
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Reporter: Elek, Marton
Assignee: Stephen O'Donnell


Based on the design doc in the parent jira, we need a Service Catalog 
endpoint in the SCM.

 
{code:java}
public interface ServiceRegistry {

   void register(ServiceEndpoint endpoint) throws IOException;

   ServiceEndpoint findEndpoint(String serviceName, int instanceId);

   Collection<ServiceEndpoint> getAllServices();
}{code}
Where the ServiceEndpoint is something like this:
{code:java}
public class ServiceEndpoint {

  private String host;

  private String ip;

  private ServicePort port;

  private String serviceName;

  private int instanceId;

...

}


public class ServicePort {
   
   private ServiceProtocol protocol;

   private String name;

   private int port;

...

}

public enum ServiceProtocol {
   RPC, HTTP, GRPC
}{code}
The ServiceRegistry may have multiple implementations, but as a first step we 
need a simple implementation which calls a new endpoint on SCM via REST.

The endpoint should persist the data to a local RocksDB with the help of 
DBStore.

This task is about creating the server and client implementation. In a 
follow-up Jira we can start to use the client on the om/datanode/client side to 
mix the service discovery data with the existing configuration.
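
A hypothetical usage sketch of the proposed interface (RestServiceRegistry and 
the ServiceEndpoint/ServicePort setters and constructor below are placeholder 
names for illustration only, not part of the current codebase):

{code:java}
// A component (e.g. the OM) announces one of its ports at startup, and a
// client looks it up later by service name and instance id.
ServiceRegistry registry = new RestServiceRegistry("http://scm:9876");

ServiceEndpoint omRpc = new ServiceEndpoint();
omRpc.setServiceName("om");
omRpc.setInstanceId(0);
omRpc.setHost("om-0.om");
omRpc.setPort(new ServicePort(ServiceProtocol.RPC, "rpc", 9862));
registry.register(omRpc);

ServiceEndpoint found = registry.findEndpoint("om", 0);
{code}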



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1742) Merge ozone-perf and ozonetrace example clusters

2019-07-01 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1742:
--

 Summary: Merge ozone-perf and ozonetrace example clusters
 Key: HDDS-1742
 URL: https://issues.apache.org/jira/browse/HDDS-1742
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: docker
Reporter: Elek, Marton
Assignee: Istvan Fajth


We have multiple example clusters in hadoop-ozone/dist/src/main/compose to 
demonstrate how different types of configuration can be set with Ozone.

But some of them can be consolidated. I propose to merge ozonetrace into 
ozoneperf, so that a single ozoneperf cluster includes all the required 
components for local performance testing:
 # opentracing (jaeger component in docker-compose + environment variables)
 # monitoring (grafana + prometheus)
 # perf profile (as of now it's enabled only in the ozone cluster[1])

 

[1]
{code:java}
cat compose/ozone/docker-config | grep prof

OZONE-SITE.XML_hdds.profiler.endpoint.enabled=true
ASYNC_PROFILER_HOME=/opt/profiler
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1741) Fix prometheus configuration in ozoneperf example cluster

2019-07-01 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1741:
--

 Summary: Fix prometheus configuration in ozoneperf example cluster
 Key: HDDS-1741
 URL: https://issues.apache.org/jira/browse/HDDS-1741
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: docker
Affects Versions: 0.4.0
Reporter: Elek, Marton
Assignee: Elek, Marton


HDDS-1216 renamed the ozoneManager components to om in the docker-compose file. 
But the prometheus configuration of the compose/ozoneperf environment is not 
updated.

We need to update it to get meaningful metrics from om.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1735) Create separate unit and integration test executor dev-support script

2019-06-28 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1735:
--

 Summary: Create separate unit and integration test executor 
dev-support script
 Key: HDDS-1735
 URL: https://issues.apache.org/jira/browse/HDDS-1735
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


The hadoop-ozone/dev-support/checks directory contains multiple helper scripts 
to execute different types of testing (findbugs, rat, unit, build).

They easily define how tests should be executed, with the following contract:

 * The problems should be printed out to the console

 * in case of test failure a non zero exit code should be used

 

The tests are working well (in fact I have some experiments with executing 
these scripts on k8s and argo where all the shell scripts are executed in 
parallel) but we need some updates:

 1. Most important: the unit tests and integration tests can be separated. 
Integration tests are more flaky and it's better to have a way to run only the 
normal unit tests

 2. As HDDS-1115 introduced a pom.ozone.xml it's better to use it instead of 
the magical "-am -pl hadoop-ozone-dist" trick

 3. To make it possible to run the blockade tests in containers we should use 
the -T flag with docker-compose

 4. checkstyle violations should be printed out to the console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1725) pv-test example to test csi is not working

2019-06-24 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1725:
--

 Summary: pv-test example to test csi is not working
 Key: HDDS-1725
 URL: https://issues.apache.org/jira/browse/HDDS-1725
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Ratish Maruthiyodan
Assignee: Elek, Marton


[~rmaruthiyodan] reported two problems regarding the pv-test example in the csi 
examples folder.

The pv-test folder contains an example nginx deployment which can use an Ozone 
PVC/PV to publish the content of a folder via http.

Two problems are identified:
 * The label based matching filter of the service doesn't point to the nginx 
deployment
 * The configmap mounting is missing from the nginx deployment



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1716) Smoketest results are generated with an internal user

2019-06-21 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1716:
--

 Summary: Smoketest results are generated with an internal user
 Key: HDDS-1716
 URL: https://issues.apache.org/jira/browse/HDDS-1716
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


[~eyang] reported the problem in HDDS-1609 that the smoketest results are 
generated by a user (the user inside the docker container) which can be 
different from the host user.

There is a minimal risk that the test results can be deleted/corrupted by 
other users if the current user is different from uid=1000.

I opened this issue because [~eyang] told me during an offline discussion that 
HDDS-1609 is a more complex issue and not only about the ownership of the test 
results.

I suggest to handle the two problems in different ways. With this patch, the 
permissions of the test result files can be fixed easily.

In HDDS-1609 we can discuss the general security problems and try to find a 
generic solution for them.

Steps to reproduce _this_ problem:
 # Use a user which is different from uid=1000
 # Create a new ozone build (mvn clean install -f pom.ozone.xml -DskipTests)
 # Go to a compose directory (cd 
hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/)
 # Execute tests (./test.sh)
 # check the ownership of the results (ls -lah ./results)

Current result: the owner of the result files is the user with uid=1000

Expected result: the owner of the files should always be the current user (even 
if the current uid is different)

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1715) Update the Intellij runner definition of SCM to use the new class name

2019-06-21 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1715:
--

 Summary: Update the Intellij runner definition of SCM to use the 
new class name 
 Key: HDDS-1715
 URL: https://issues.apache.org/jira/browse/HDDS-1715
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Tools
Reporter: Elek, Marton
Assignee: Stephen O'Donnell


HDDS-1622 changed the CLI framework of SCM and, with a new additional class 
(StorageContainerManagerStarter), made it more testable.

But the intellij runner definitions are not (yet) updated to use the new class 
name for SCM/SCM-init (they are updated for OM in HDDS-1660).

We need to adjust the main class names in:
{code:java}
hadoop-ozone/dev-support/intellij/runConfigurations/StorageContainerManager.xml
hadoop-ozone/dev-support/intellij/runConfigurations/StorageContainerManagerInit.xml{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1710) Publish JVM metrics via Hadoop metrics

2019-06-20 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1710:
--

 Summary: Publish JVM metrics via Hadoop metrics
 Key: HDDS-1710
 URL: https://issues.apache.org/jira/browse/HDDS-1710
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: om, Ozone Datanode, SCM
Reporter: Elek, Marton
Assignee: Elek, Marton


In Ozone, metrics can be published with the help of the Hadoop metrics system 
(for example via PrometheusMetricsSink).

The basic JVM metrics are not published by the metrics system (only via JMX).

I am very interested in the basic JVM metrics (gc count, heap memory usage) 
to identify possible problems in the test environment.

Fortunately it's very easy to turn them on with the help of 
org.apache.hadoop.metrics2.source.JvmMetrics.
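
A minimal sketch of how this could be wired up (the class and process name 
below are just an illustration; the existing Hadoop JvmMetrics source is 
reused as-is):

{code:java}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

public class JvmMetricsBootstrapSketch {

  /**
   * Register the JVM metrics source with the process' metrics system so that
   * gc count, heap usage, thread counts etc. are published through the same
   * sinks (e.g. PrometheusMetricsSink) as the other Ozone metrics.
   */
  public static void registerJvmMetrics(String processName) {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    JvmMetrics.create(processName, null, ms);
  }
}
{code}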



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1709) TestScmSafeMode is flaky

2019-06-20 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1709:
--

 Summary: TestScmSafeMode is flaky
 Key: HDDS-1709
 URL: https://issues.apache.org/jira/browse/HDDS-1709
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM, test
Reporter: Elek, Marton
Assignee: Elek, Marton


org.apache.hadoop.ozone.om.TestScmSafeMode.testSCMSafeMode failed last night 
with the following error:
{code:java}
java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at 
org.junit.Assert.assertTrue(Assert.java:41) at 
org.junit.Assert.assertTrue(Assert.java:52) at 
org.apache.hadoop.ozone.om.TestScmSafeMode.testSCMSafeMode(TestScmSafeMode.java:285)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code}
Locally it can be tested: it's very easy to reproduce by adding an 
additional sleep to DataNodeSafeModeRule:
{code:java}
+++ 
b/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/DataNodeSafeModeRule.java
@@ -63,7 +63,11 @@ protected boolean validate() {
 
   @Override
   protected void process(NodeRegistrationContainerReport reportsProto) {
-
+    try {
+  Thread.sleep(3000);
+    } catch (InterruptedException e) {
+  e.printStackTrace();
+    }{code}
This is a clear race condition:

DataNodeSafeModeRule and ContainerSafeModeRule are processing the same events, 
but it is possible (in case of an accidental delay) that the container safe 
mode rule is done while DataNodeSafeModeRule hasn't processed the new event yet.

As a result the test execution will continue:
{code:java}
GenericTestUtils
.waitFor(() -> scm.getCurrentContainerThreshold() == 1.0, 100, 2);
{code}
(This line is waiting ONLY for the ContainerSafeModeRule).

The fix is easy: let's wait for the processing of all the async events:
{code:java}
EventQueue eventQueue =
(EventQueue) cluster.getStorageContainerManager().getEventQueue();
eventQueue.processAll(5000L);{code}
As we are sure that the events are already sent to the EventQueue (because we 
have the previous waitFor), it should be enough.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1694) TestNodeReportHandler is failing with NPE

2019-06-18 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1694.

Resolution: Fixed

> TestNodeReportHandler is failing with NPE
> -
>
> Key: HDDS-1694
> URL: https://issues.apache.org/jira/browse/HDDS-1694
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> FAILURE in 
> ozone-unit-076618677d39x4h9/unit/hadoop-hdds/server-scm/org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.txt
> ---
> Test set: org.apache.hadoop.hdds.scm.node.TestNodeReportHandler
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.43 s <<< 
> FAILURE! - in org.apache.hadoop.hdds.scm.node.TestNodeReportHandler
> testNodeReport(org.apache.hadoop.hdds.scm.node.TestNodeReportHandler)  Time 
> elapsed: 0.288 s  <<< ERROR!
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.(SCMNodeManager.java:122)
>     at 
> org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.resetEventCollector(TestNodeReportHandler.java:53)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> 2019-06-16 23:52:29,345 INFO  node.SCMNodeManager 
> (SCMNodeManager.java:(119)) - Entering startup safe mode.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1698) Switch to use apache/ozone-runner in the compose/Dockerfile

2019-06-17 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1698:
--

 Summary: Switch to use apache/ozone-runner in the 
compose/Dockerfile
 Key: HDDS-1698
 URL: https://issues.apache.org/jira/browse/HDDS-1698
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: docker
Reporter: Elek, Marton
Assignee: Elek, Marton


Since HDDS-1634 we have an Ozone-specific runner image to run Ozone with 
docker-compose based pseudo clusters.

As the new apache/ozone-runner image is uploaded to Docker Hub, we can switch 
our scripts to use the new image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1694) TestNodeReportHandler is failing with NPE

2019-06-17 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1694:
--

 Summary: TestNodeReportHandler is failing with NPE
 Key: HDDS-1694
 URL: https://issues.apache.org/jira/browse/HDDS-1694
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Elek, Marton
Assignee: Elek, Marton


{code:java}
FAILURE in 
ozone-unit-076618677d39x4h9/unit/hadoop-hdds/server-scm/org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.txt
---
Test set: org.apache.hadoop.hdds.scm.node.TestNodeReportHandler
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.43 s <<< 
FAILURE! - in org.apache.hadoop.hdds.scm.node.TestNodeReportHandler
testNodeReport(org.apache.hadoop.hdds.scm.node.TestNodeReportHandler)  Time 
elapsed: 0.288 s  <<< ERROR!
java.lang.NullPointerException
    at 
org.apache.hadoop.hdds.scm.node.SCMNodeManager.(SCMNodeManager.java:122)
    at 
org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.resetEventCollector(TestNodeReportHandler.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

2019-06-16 23:52:29,345 INFO  node.SCMNodeManager 
(SCMNodeManager.java:<init>(119)) - Entering startup safe mode.

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1682) TestEventWatcher.testMetrics is flaky

2019-06-13 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1682:
--

 Summary: TestEventWatcher.testMetrics is flaky
 Key: HDDS-1682
 URL: https://issues.apache.org/jira/browse/HDDS-1682
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Elek, Marton
Assignee: Elek, Marton


TestEventWatcher is intermittent. (Failed twice out of 44 executions).

Error is:

{code}
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.764 s <<< 
FAILURE! - in org.apache.hadoop.hdds.server.events.TestEventWatcher
testMetrics(org.apache.hadoop.hdds.server.events.TestEventWatcher)  Time 
elapsed: 2.384 s  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hdds.server.events.TestEventWatcher.testMetrics(TestEventWatcher.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}

In the test we do the following:

 1. fire start-event1
 2. fire start-event2
 3. fire start-event3
 4. fire end-event1
 5. wait

Usually event2 and event3 time out and event1 completes, but an accidental delay 
between steps 3 and 4 (in fact between 1 and 4) can cause event1 to time out as 
well.

I improved the unit test and fixed the metrics calculation (the completed counter 
should only be incremented if the event has not yet timed out); see the sketch below.
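
A minimal sketch of the intended guard (the class and field names below are illustrative, not the actual EventWatcher internals):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the race: a completion must only be counted while the event is
// still tracked, i.e. it has not already been removed by the timeout path.
public class WatcherMetricsSketch {
  private final Map<Long, Object> pending = new ConcurrentHashMap<>();
  private final AtomicLong completed = new AtomicLong();
  private final AtomicLong timedOut = new AtomicLong();

  public void track(long id, Object payload) {
    pending.put(id, payload);
  }

  public void onTimeout(long id) {
    if (pending.remove(id) != null) {
      timedOut.incrementAndGet();
    }
  }

  public void onCompletion(long id) {
    // Increment "completed" only if the event has not been timed out already.
    if (pending.remove(id) != null) {
      completed.incrementAndGet();
    }
  }

  public long completedCount() {
    return completed.get();
  }

  public long timedOutCount() {
    return timedOut.get();
  }
}
{code}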



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1680) Create missing parent directories during the creation of HddsVolume dirs

2019-06-13 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1680:
--

 Summary: Create missing parent directories during the creation of 
HddsVolume dirs
 Key: HDDS-1680
 URL: https://issues.apache.org/jira/browse/HDDS-1680
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


I started to execute all the unit tests continuously (in kubernetes with argo 
workflow).

Until now I got the following failures (number of failures / unit test name):

```
  1 org.apache.hadoop.fs.ozone.contract.ITestOzoneContractMkdir
  1 org.apache.hadoop.fs.ozone.contract.ITestOzoneContractRename
  3 
org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware
 31 org.apache.hadoop.ozone.container.common.TestDatanodeStateMachine
 31 org.apache.hadoop.ozone.container.common.volume.TestVolumeSet
  1 org.apache.hadoop.ozone.freon.TestDataValidateWithSafeByteOperations
```

TestVolumeSet is also failed locally:

{code}
2019-06-13 14:23:18,637 ERROR volume.VolumeSet 
(VolumeSet.java:initializeVolumeSet(184)) - Failed to parse the storage 
location: 
/home/elek/projects/hadoop/hadoop-hdds/container-service/target/test-dir/dfs
java.io.IOException: Cannot create directory 
/home/elek/projects/hadoop/hadoop-hdds/container-service/target/test-dir/dfs/hdds
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:208)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:179)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:72)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:156)
at 
org.apache.hadoop.ozone.container.common.volume.VolumeSet.createVolume(VolumeSet.java:311)
at 
org.apache.hadoop.ozone.container.common.volume.VolumeSet.initializeVolumeSet(VolumeSet.java:165)
at 
org.apache.hadoop.ozone.container.common.volume.VolumeSet.<init>(VolumeSet.java:130)
at 
org.apache.hadoop.ozone.container.common.volume.VolumeSet.<init>(VolumeSet.java:109)
at 
org.apache.hadoop.ozone.container.common.volume.TestVolumeSet.testFailVolumes(TestVolumeSet.java:232)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

The problem here is that the parent directory of the volume dir is missing. I 
propose to use hddsRootDir.mkdirs() instead of hddsRootDir.mkdir(), because 
mkdirs() also creates the missing parent directories.
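
A tiny self-contained illustration of the difference (the path is made up):

{code:java}
import java.io.File;

public class MkdirsSketch {
  public static void main(String[] args) {
    File hddsRoot = new File("/tmp/test-dir/dfs/hdds");

    // mkdir() only creates the last path element, so it fails (returns false)
    // when the parent /tmp/test-dir/dfs does not exist yet.
    boolean single = hddsRoot.mkdir();

    // mkdirs() also creates all missing parent directories.
    boolean recursive = hddsRoot.mkdirs();

    System.out.println("mkdir: " + single + ", mkdirs: " + recursive);
  }
}
{code}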




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1678) Default image name for kubernetes examples should be ozone and not hadoop

2019-06-13 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1678:
--

 Summary: Default image name for kubernetes examples should be 
ozone and not hadoop
 Key: HDDS-1678
 URL: https://issues.apache.org/jira/browse/HDDS-1678
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: docker
Reporter: Elek, Marton
Assignee: Elek, Marton


During the build the kubernetes example files are adjusted to use a specific 
docker image name.

By default it should be apache/ozone:${VERSION}, so that the examples in the 
release artifact can be used without any build: the user simply uses the latest 
released apache/ozone:${VERSION} from Docker Hub.

For development builds the image can be set with -Ddocker.image (or 
-Dozone.docker.image with HDDS-1667).

Unfortunately -- due to a small typo -- the apache/hadoop image is used by default 
instead of apache/ozone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1677) Auditparser robot test should use a world writable working directory

2019-06-13 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1677:
--

 Summary: Auditparser robot test should use a world writable working 
directory
 Key: HDDS-1677
 URL: https://issues.apache.org/jira/browse/HDDS-1677
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


When I tried to reproduce a problem that was reported by [~eyang], I found that 
the auditparser robot test uses the /opt/hadoop directory as the working 
directory to generate the audit.db export.

/opt/hadoop may or may not be writable; it's better to use /tmp instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1674) Make ScmBlockLocationProtocol message type based

2019-06-12 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1674:
--

 Summary: Make ScmBlockLocationProtocol message type based
 Key: HDDS-1674
 URL: https://issues.apache.org/jira/browse/HDDS-1674
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Shweta


Most of the Ozone protocols are "message type based" and not "method based".

For example in OzoneManagerProtocol.proto there is only one method:

{code}
service OzoneManagerService {
// A client-to-OM RPC to send client requests to OM Ratis server
rpc submitRequest(OMRequest)
  returns(OMResponse);
}
{code}

And the exact method is determined by the type of the message:

{code}

message OMResponse {
  required Type cmdType = 1; // Type of the command

  // A string that identifies this command, we generate  Trace ID in Ozone
  // frontend and this allows us to trace that command all over ozone.
  optional string traceID = 2;

  optional bool success = 3 [default=true];

  optional string message = 4;

  required Status status = 5;

  optional string leaderOMNodeId = 6;

  optional CreateVolumeResponse  createVolumeResponse  = 11;
  optional SetVolumePropertyResponse setVolumePropertyResponse = 12;
  optional CheckVolumeAccessResponse checkVolumeAccessResponse = 13;

}


enum Type {
  CreateVolume = 11;
  SetVolumeProperty = 12;
  CheckVolumeAccess = 13;
  InfoVolume = 14;
  DeleteVolume = 15;
  ListVolume = 16;

{code}

This is not the most natural way to use protobuf services but it has the 
additional benefit that we can propagate traceId / exception in a common way.

Earlier there was an agreement to modify all the protocols to use this "message 
type based" approach to make it possible to provide proper error handling.

In this issue  the ScmBlockLocationProtocol.proto should be modified to use 
only one message:

{code}
service ScmBlockLocationProtocolService {

  rpc send (SCMBlockLocationRequest) returns (SCMBlockLocationResponse);
}
{code}

It also requires creating the common request and response objects (with the 
common fields like type, traceId, success, message, status, as they are used in 
the OzoneManagerProtocol.proto).

To make it work, the ScmBlockLocationProtocolClientSideTranslatorPB and the 
ScmBlockLocationProtocolServerSideTranslatorPB should be improved to 
wrap/unwrap the original message to/from the generic message. 
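
A hedged sketch of the envelope pattern on the client side. To keep it self-contained, plain Java classes stand in for the generated protobuf messages; the field names (cmdType, traceID, success, message) mirror the OMResponse example above, and the payload type names are illustrative:

{code:java}
// Stand-ins for the generated protobuf messages, only to show the shape.
enum Type { AllocateScmBlock, DeleteScmKeyBlocks }

class AllocateScmBlockRequest { long size; }

class AllocateScmBlockResponse { String containerId; }

class SCMBlockLocationRequest {
  Type cmdType;
  String traceID;
  AllocateScmBlockRequest allocateScmBlockRequest;
}

class SCMBlockLocationResponse {
  Type cmdType;
  boolean success = true;
  String message;
  AllocateScmBlockResponse allocateScmBlockResponse;
}

public class EnvelopeSketch {

  // Client-side translator: wrap the concrete request into the generic envelope.
  static SCMBlockLocationRequest wrap(AllocateScmBlockRequest payload, String traceId) {
    SCMBlockLocationRequest req = new SCMBlockLocationRequest();
    req.cmdType = Type.AllocateScmBlock;
    req.traceID = traceId;
    req.allocateScmBlockRequest = payload;
    return req;
  }

  // ...and unwrap the envelope on the way back, turning failures into exceptions.
  static AllocateScmBlockResponse unwrap(SCMBlockLocationResponse resp) {
    if (!resp.success) {
      throw new IllegalStateException("SCM returned error: " + resp.message);
    }
    return resp.allocateScmBlockResponse;
  }
}
{code}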

I propose to do only the protocol change here; if possible we can keep the 
message/status fields empty and fix the error propagation in HDDS-1258.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-1659) Define the process to add proposal/design docs to the Ozone subproject

2019-06-12 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton reopened HDDS-1659:


> Define the process to add proposal/design docs to the Ozone subproject
> --
>
> Key: HDDS-1659
> URL: https://issues.apache.org/jira/browse/HDDS-1659
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We think that it would be more effective to collect all the design docs in 
> one place and make it easier to review them by the community.
> We propose to follow an approach where the proposals are committed to the 
> hadoop-hdds/docs project and the review can be the same as a review of a PR



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1673) mapreduce smoketests are failing because of an acl error

2019-06-12 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1673:
--

 Summary: mapreduce smoketests are failing because of an acl error
 Key: HDDS-1673
 URL: https://issues.apache.org/jira/browse/HDDS-1673
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


After executing this command:

{code}
yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 3 3 2
{code}

The result is:

{code}
Number of Maps  = 3
Samples per Map = 3
2019-06-12 03:16:20 ERROR OzoneClientFactory:294 - Couldn't create protocol 
class org.apache.hadoop.ozone.client.rpc.RpcClient exception: 
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291)
at 
org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169)
at 
org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.<init>(BasicOzoneClientAdapterImpl.java:134)
at 
org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.<init>(OzoneClientAdapterImpl.java:50)
at 
org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:103)
at 
org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:143)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:463)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:522)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:275)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: org.apache.hadoop.hdds.conf.ConfigurationException: Can't inject 
configuration to class 
org.apache.hadoop.ozone.security.acl.OzoneAclConfig.setUserDefaultRights
at 
org.apache.hadoop.hdds.conf.OzoneConfiguration.getObject(OzoneConfiguration.java:160)
at 
org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:148)
... 36 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hdds.conf.OzoneConfiguration.getObject(OzoneConfiguration.java:137)
... 37 more
Caused by: java.lang.NullPointerException: Name is null
at java.lang.Enum.valueOf(Enum.java:236)
at 
org.apache.hadoop.ozone.security.acl.IAccessAuthorizer$ACLType.valueOf(IAccessAuthorizer.java:48)
at 
org.apache.hadoop.ozone.security.acl.OzoneAclConfig.setUserDefaultRights(OzoneAclConfig.java:43)
... 42 more
java.io.IOException: 

[jira] [Created] (HDDS-1669) SCM startup is failing if network-topology-default.xml is part of a jar

2019-06-11 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1669:
--

 Summary: SCM startup is failing if network-topology-default.xml is 
part of a jar
 Key: HDDS-1669
 URL: https://issues.apache.org/jira/browse/HDDS-1669
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


network-topology-default.xml can be loaded from a file or from the classpath. But 
the NodeSchemaLoader assumes that files on the classpath can be opened as regular 
files. That's true if the file is in etc/hadoop (which is part of the classpath) 
but not if the file is packaged into a jar file:

{code}
scm_1 | 2019-06-11 13:18:03 INFO  NodeSchemaLoader:118 - Loading file 
from 
jar:file:/opt/hadoop/share/ozone/lib/hadoop-hdds-common-0.5.0-SNAPSHOT.jar!/network-topology-default.xml
scm_1 | 2019-06-11 13:18:03 ERROR NodeSchemaManager:74 - Failed to load 
schema file:network-topology-default.xml, error:
scm_1 | java.lang.IllegalArgumentException: URI is not hierarchical
scm_1 | at java.io.File.<init>(File.java:418)
scm_1 | at 
org.apache.hadoop.hdds.scm.net.NodeSchemaLoader.loadSchemaFromFile(NodeSchemaLoader.java:119)
scm_1 | at 
org.apache.hadoop.hdds.scm.net.NodeSchemaManager.init(NodeSchemaManager.java:67)
scm_1 | at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.<init>(NetworkTopologyImpl.java:63)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:382)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:275)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:208)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:586)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:139)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:115)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:67)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42)
scm_1 | at picocli.CommandLine.execute(CommandLine.java:1173)
scm_1 | at picocli.CommandLine.access$800(CommandLine.java:141)
scm_1 | at 
picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
scm_1 | at 
picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
scm_1 | at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
scm_1 | at 
picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
scm_1 | at 
picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
scm_1 | at 
org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
scm_1 | at 
org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
scm_1 | at 
org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:56)
scm_1 | Failed to load schema file:network-topology-default.xml, error:
{code}

The quick fix is to keep the current behaviour but read the file from 
classloader.getResourceAsStream() instead of classloader.getResource().toURI()
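
A minimal sketch of the proposed loading logic (the resource name comes from this issue, the rest is illustrative):

{code:java}
import java.io.IOException;
import java.io.InputStream;

public class SchemaLoadingSketch {
  public static void main(String[] args) throws IOException {
    String schemaFile = "network-topology-default.xml";

    // getResourceAsStream works both when the file sits in a directory on the
    // classpath (e.g. etc/hadoop) and when it is packaged inside a jar, while
    // getResource().toURI() + new File(...) only works for the directory case.
    try (InputStream in = Thread.currentThread()
        .getContextClassLoader()
        .getResourceAsStream(schemaFile)) {
      if (in == null) {
        throw new IOException("Schema file not found on the classpath: " + schemaFile);
      }
      // In the real loader this stream would be handed over to the XML parser.
      System.out.println("Schema stream opened, ~" + in.available() + " bytes buffered");
    }
  }
}
{code}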



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1668) Add liveness probe to the example k8s resources files

2019-06-11 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1668:
--

 Summary: Add liveness probe to the example k8s resources files
 Key: HDDS-1668
 URL: https://issues.apache.org/jira/browse/HDDS-1668
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton
Assignee: Elek, Marton


In kubernetes resources we can define liveness probes which can help to detect 
any failure. If the defined port is not available the pod will be rescheduled.

We need to add the liveness probes to our k8s resource files.

Note: We shouldn't add readiness probes. A readiness probe is about service 
availability: the service/dns can be available only after the service is 
started. This is not good for us as:

 * We need DNS resolution during the startup (See OzoneManager.loadOMHAConfigs)
 * We already implemented retry in case of missing DNS entries



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1622) Use picocli for StorageContainerManager

2019-06-07 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1622.

Resolution: Fixed

> Use picocli for StorageContainerManager
> ---
>
> Key: HDDS-1622
> URL: https://issues.apache.org/jira/browse/HDDS-1622
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Recently we switched to use PicoCli with (almost) all of our daemons (eg. s3 
> Gateway, Freon, etc.)
> PicoCli has better output, it can generate nice help, and easier to use as 
> it's enough to put a few annotations and we don't need to add all the 
> boilerplate code to print out help, etc.
> StorageContainerManager and OzoneManager are not yet supported. The previous 
> issue (HDDS-453) was closed, but since then we improved the GenericCli parser 
> (eg. in HDDS-1192), so I think we are ready to move.
> The main idea is to create a starter java similar to 
> org.apache.hadoop.ozone.s3.Gateway and we can start StorageContainerManager 
> from there.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1659) Define the process to add proposal/design docs to the Ozone subproject

2019-06-07 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1659:
--

 Summary: Define the process to add proposal/design docs to the 
Ozone subproject
 Key: HDDS-1659
 URL: https://issues.apache.org/jira/browse/HDDS-1659
 Project: Hadoop Distributed Data Store
  Issue Type: Task
Reporter: Elek, Marton
Assignee: Elek, Marton


We think that it would be more effective to collect all the design docs in one 
place and make it easier to review them by the community.

We propose to follow an approach where the proposals are committed to the 
hadoop-hdds/docs project and the review can be the same as a review of a PR



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1646) Support real persistent in the k8s example files

2019-06-05 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1646:
--

 Summary: Support real persistent in the k8s example files 
 Key: HDDS-1646
 URL: https://issues.apache.org/jira/browse/HDDS-1646
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


The Ozone release contains example k8s deployment files to make it easier to 
deploy Ozone to kubernetes. As of now we use emptyDir everywhere; we should 
support the configuration of host volumes (hostPath or Local Persistent Volumes).

The big question here is the default:

 * Make the examples easy to start and ephemeral
 * Make the examples safer by default (but then they couldn't be started without 
additional administration)

(Note: this conversation was started in the review of HDDS-1508)

Xiaoyu:  Can we support mount hostVolume for datanode daemons?

Marton: Yes, we can.

AFAIK there are two options:
 * using 
[hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)
 * or with [Local 
PersistentVolumes](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/)

The first one requires the knowledge of directory names on the host.
The second one is recommended but it requires the creation of PersistentVolumes 
or the installation of a PersistentVolume provider.

I am not sure what is the best approach, my current proposal is:

 * Use empty dir everywhere to make it easier to start simple ozone cluster
 * Provide simple option to turn on any of theses persistence (the kubernetes 
files are generated and the generation can be parametrized)
 * Document how to customize the kubernetes resources files

Summary: it's a question of the defaults:

  1. Use a complex, but persistent solution, which may not work out of the box

  2. Use a simple, but ephemeral solution (as default)

I started with (2) but I am open to change.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1640) Reduce the size of recon jar file

2019-06-04 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1640.

Resolution: Fixed

> Reduce the size of recon jar file
> -
>
> Key: HDDS-1640
> URL: https://issues.apache.org/jira/browse/HDDS-1640
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Recon
>Reporter: Elek, Marton
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> hadoop-ozone-recon-0.5.0-SNAPSHOT.jar is 73 MB, mainly because the 
> node_modules are included (full typescript source, eslint, babel, etc.):
> {code}
> unzip -l hadoop-ozone-recon-0.5.0-SNAPSHOT.jar | grep node_modules | wc
> {code}
> Fix me if I am wrong, but I think node_modules is not required in the 
> distribution as the dependencies are already included in the compiled 
> javascript files.
> I propose to remove the node_modules from the jar file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1642) Avoid shell references relative to the current script path

2019-06-04 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1642:
--

 Summary: Avoid shell references relative to the current script path
 Key: HDDS-1642
 URL: https://issues.apache.org/jira/browse/HDDS-1642
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Eric Yang


This is based on the review comment from [~eyang]:

bq. You might need pwd -P to resolve symlinks. I don't recommend to use script 
location to make decision of where binaries are supposed to be because someone 
else can make newbie mistake and refactor your script to invalid the original 
coding intend. See this blog to explain the right way to get the directory of a 
bash script. This is the reason that I used OZONE_HOME as base reference 
frequently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1641) Csi server fails because transitive Netty dependency

2019-06-04 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1641:
--

 Summary: Csi server fails because transitive Netty dependency
 Key: HDDS-1641
 URL: https://issues.apache.org/jira/browse/HDDS-1641
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


The CSI server can't be started because of a ClassNotFoundException.

It turned out that by using the new configuration api we got old netty jar 
files as transitive dependencies (hdds-configuration depends on hadoop-common, 
and hadoop-common depends on the world).

We should exclude all the old netty versions from the classpath of the CSI 
server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-1628) Fix the execution and return code of smoketest executor shell script

2019-06-04 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton reopened HDDS-1628:


> Fix the execution and return code of smoketest executor shell script
> 
>
> Key: HDDS-1628
> URL: https://issues.apache.org/jira/browse/HDDS-1628
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Problem: Some of the smoketest executions were reported as green even if they 
> contained failed tests.
> Root cause: the legacy test executor 
> (hadoop-ozone/dist/src/main/smoketest/test.sh) which just calls the new 
> executor script (hadoop-ozone/dist/src/main/compose/test-all.sh) didn't 
> handle the return code well (the failure of the smoketests should be 
> signalled by the bash return code)
> This patch:
>  * Fixes the error code handling in smoketest/test.sh
>  * Fixes the test execution in compose/test-all.sh (should work from any 
> other directories)
>  * Updates hadoop-ozone/dev-support/checks/acceptance.sh to use the newer 
> test-all.sh executor instead of the old one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1639) Restructure documentation pages for better understanding

2019-06-04 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1639:
--

 Summary: Restructure documentation pages for better understanding
 Key: HDDS-1639
 URL: https://issues.apache.org/jira/browse/HDDS-1639
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


The documentation pages should be updated according to the recent changes:

In the uploaded PR I modified the following:

 # Pages are restructured to use a structure similar to what is introduced on the 
wiki by [~anu]. (Getting started guides are separated for different 
environments)
 # The width of the menu is increased (to make it more readable)
 # The logo is moved out of the menu to the main page (to get more space for 
the menu items)
 # A 'Requirements' section is added to each 'Getting started' page
 # Test tools / docker image / kubernetes pages are imported from the wiki.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1636) Tracing id is not propagated via async datanode grpc call

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1636:
--

 Summary: Tracing id is not propagated via async datanode grpc call
 Key: HDDS-1636
 URL: https://issues.apache.org/jira/browse/HDDS-1636
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


Recently a new exception became visible in the datanode logs when using standard 
freon (STANDALONE):

{code}
datanode_2  | 2019-06-03 12:18:21 WARN  
PropagationRegistry$ExceptionCatchingExtractorDecorator:60 - Error when 
extracting SpanContext from carrier. Handling gracefully.
datanode_2  | 
io.jaegertracing.internal.exceptions.MalformedTracerStateStringException: 
String does not match tracer state format: 
7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1
datanode_2  |   at 
org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:49)
datanode_2  |   at 
org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:34)
datanode_2  |   at 
io.jaegertracing.internal.PropagationRegistry$ExceptionCatchingExtractorDecorator.extract(PropagationRegistry.java:57)
datanode_2  |   at 
io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:208)
datanode_2  |   at 
io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:61)
datanode_2  |   at 
io.opentracing.util.GlobalTracer.extract(GlobalTracer.java:143)
datanode_2  |   at 
org.apache.hadoop.hdds.tracing.TracingUtil.importAndCreateScope(TracingUtil.java:102)
datanode_2  |   at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
datanode_2  |   at 
org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
datanode_2  |   at 
org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
datanode_2  |   at 
org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:46)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
datanode_2  |   at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
datanode_2  |   at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
datanode_2  |   at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
{code}

It turned out that the tracingId propagation between XceiverClient and Server 
doesn't work very well (in case of Standalone and async commands):

 1. There are many places (on the client side) where the traceId is filled with 
UUID.randomUUID().toString();
 2. This random id is propagated between the Output/InputStream and different 
parts of the clients
 3. It is unnecessary, because in XceiverClientGrpc the traceId field is 
overridden with the real opentracing id anyway (sendCommand/sendCommandAsync)
 4. Except in XceiverClientGrpc.sendCommandAsync, where this part is 
accidentally missing.

Things to fix:

 1. Fix XceiverClientGrpc.sendCommandAsync (replace any existing traceId with 
the good one); see the sketch below
 2. Remove the usage of the UUID based traceId (it's not used)
 3. Improve the error logging in case of an invalid traceId on the server side.
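
A hedged sketch of fix 1, using only the plain OpenTracing API (Ozone's real code goes through its TracingUtil/StringCodec helpers, which are not reproduced here): export the currently active span context and use that as the traceId instead of a random UUID.

{code:java}
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMap;
import io.opentracing.util.GlobalTracer;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class TraceIdSketch {

  /**
   * Serialize the active span context into a map. With Jaeger's default
   * text-map propagation the context ends up under the "uber-trace-id" key;
   * an empty map means that tracing is disabled or no span is active.
   */
  static Map<String, String> exportActiveSpan() {
    Tracer tracer = GlobalTracer.get();
    Span span = tracer.activeSpan();
    Map<String, String> carrier = new HashMap<>();
    if (span == null) {
      return carrier;
    }
    tracer.inject(span.context(), Format.Builtin.TEXT_MAP, new TextMap() {
      @Override
      public Iterator<Map.Entry<String, String>> iterator() {
        throw new UnsupportedOperationException("inject-only carrier");
      }

      @Override
      public void put(String key, String value) {
        carrier.put(key, value);
      }
    });
    return carrier;
  }
}
{code}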



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1635) Maintain docker entrypoint and envtoconf inside ozone project

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1635:
--

 Summary: Maintain docker entrypoint and envtoconf inside ozone 
project
 Key: HDDS-1635
 URL: https://issues.apache.org/jira/browse/HDDS-1635
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


During an offline discussion with [~eyang] and [~arp], Eric suggested 
maintaining the source of the docker-specific startup scripts inside the main 
ozone branch (trunk) instead of in the branch of the docker image.

With this approach the ozone-runner image can be a very lightweight image and 
the entrypoint logic can be versioned together with the ozone itself.

Another use case is a container creation script. Recently we 
[documented|https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Docker+images]
 that the hadoop-runner/ozone-runner/ozone images are not for production (for 
example because they contain development tools).

We can create a helper tool (similar to what Spark provides) to create Ozone 
container images from any production-ready base image. But this tool requires 
the existence of the scripts inside the distribution.

(ps: I think sooner or later the functionality of envtoconf.py can be added to 
the OzoneConfiguration java class and we can parse the configuration values 
directly from environment variables.)
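
A hedged sketch of that idea (the OZONE-SITE.XML_ prefix convention is an assumption modelled on the envtoconf-style naming; a plain Hadoop Configuration stands in for OzoneConfiguration):

{code:java}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class EnvToConfSketch {

  /** Copy environment variables of the form OZONE-SITE.XML_<key>=<value> into the config. */
  static void applyEnvOverrides(Configuration conf, Map<String, String> env) {
    String prefix = "OZONE-SITE.XML_";
    for (Map.Entry<String, String> entry : env.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        String key = entry.getKey().substring(prefix.length());
        conf.set(key, entry.getValue());
      }
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    applyEnvOverrides(conf, System.getenv());
    System.out.println("ozone.om.address = " + conf.get("ozone.om.address"));
  }
}
{code}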

In this patch I copied the required scripts to the ozone source tree and the 
new ozone-runner image (HDDS-1634) is designed to use them from this specific 
location.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1634) Introduce a new ozone specific runner image

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1634:
--

 Summary: Introduce a new ozone specific runner image
 Key: HDDS-1634
 URL: https://issues.apache.org/jira/browse/HDDS-1634
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


Ozone compose files use apache/hadoop-runner to provide a fixed environment to 
run any Ozone distribution.

It would be better to use separate hadoop-runner and ozone-runner images:

 1. To make it easier to include Ozone-specific behaviour (for example goofys 
install, scm/om initialization)
 2. To make it clear which feature is required by all the subprojects of Hadoop 
and which one is Ozone-specific (based on the comment from [~eyang] in 
HADOOP-16092)
 3. For hadoop-runner we maintain multiple tags (jdk11/jdk8/latest), and it seems 
to be hard to maintain all of them. jdk8 is required only for hadoop, and by 
separating hadoop-runner/ozone-runner we can use only one simple branch for 
ozone-runner development (and we can create incremental fixed tags very easily)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1633) Update rat from 0.12 to 0.13 in hadoop-runner build script

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1633:
--

 Summary: Update rat from 0.12 to 0.13 in hadoop-runner build script
 Key: HDDS-1633
 URL: https://issues.apache.org/jira/browse/HDDS-1633
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


We have a new rat release; the old one is not available anymore. The URL should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1632) Make the hadoop home world readable and avoid sudo in hadoop-runner

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1632:
--

 Summary: Make the hadoop home world readable and avoid sudo in 
hadoop-runner
 Key: HDDS-1632
 URL: https://issues.apache.org/jira/browse/HDDS-1632
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


[~eyang] reported in HDDS-1609 that the hadoop-runner image can be started 
*without* mounting a real hadoop (usually, it's mounted) AND using a different 
uid:

{code}
docker run -it  -u $(id -u):$(id -g) apache/hadoop-runner bash
docker: Error response from daemon: OCI runtime create failed: 
container_linux.go:345: starting container process caused "chdir to cwd 
(\"/opt/hadoop\") set in config.json failed: permission denied": unknown.
{code}

There are two blocking problems here:

 * the /opt/hadoop directory (which is the CWD inside the container) is 700 
instead of 755
 * The usage of sudo in the startup scripts (sudo is not possible if the real user 
is not added to /etc/passwd)

Both of them are addressed by this patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1631) Fix auditparser smoketests

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1631:
--

 Summary: Fix auditparser smoketests
 Key: HDDS-1631
 URL: https://issues.apache.org/jira/browse/HDDS-1631
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


In HDDS-1518 we modified the location of the var and config files inside the 
container.

There are three problems with the current auditparser smoketest:

 1. The default audit log4j files are not part of the new config directory 
(fixed with HDDS-1630)
 2. The smoketest is executed in the scm container instead of the om container
 3. The log directory is hard coded

Problems 2 and 3 will be fixed in this patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1630) Copy default configuration files to the writeable directory

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1630:
--

 Summary: Copy default configuration files to the writeable 
directory
 Key: HDDS-1630
 URL: https://issues.apache.org/jira/browse/HDDS-1630
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


HDDS-1518 separates the read-only directories (/opt/ozone, /opt/hadoop) from 
the read-write directories (/etc/hadoop, /var/log/hadoop). 

The configuration directory and log directory should be writeable, and to make 
it easier to run the docker-compose based pseudo clusters with a *different* host 
uid we started to use a different config dir.

But we need all the defaults in the configuration dir. In this patch I add a 
small fragment to the hadoop-runner image to copy the default files (if 
available).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-1597) Remove hdds-server-scm dependency from ozone-common

2019-06-03 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton reopened HDDS-1597:


> Remove hdds-server-scm dependency from ozone-common
> ---
>
> Key: HDDS-1597
> URL: https://issues.apache.org/jira/browse/HDDS-1597
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
> Attachments: ozone-dependency.png
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> I noticed that the hadoop-ozone/common project depends on 
> hadoop-hdds-server-scm project.
> The common projects are designed to be shared artifacts between the client and 
> server side. Adding an additional dependency to the common pom means that the 
> dependency will be available for all the clients as well.
> (See the attached artifact about the current, desired structure).
> We definitely don't need scm server dependency on the client side.
> The code dependency is just one class (ScmUtils) and the shared code can be 
> easily moved to the common.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1629) Tar file creation can be option for non-dist builds

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1629:
--

 Summary: Tar file creation can be option for non-dist builds
 Key: HDDS-1629
 URL: https://issues.apache.org/jira/browse/HDDS-1629
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


Ozone tar file creation is a very time-consuming step. I propose to make it 
optional and create the tar file only if the dist profile is enabled (-Pdist).

The tar file is not required to test ozone, as the same content is available 
from hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT, which is enough to run 
docker-compose pseudo clusters and smoketests.

If it's required, the tar file creation can be requested with the dist profile.
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1628) Fix the execution and return code of smoketest executor shell script

2019-06-03 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1628:
--

 Summary: Fix the execution and return code of smoketest executor 
shell script
 Key: HDDS-1628
 URL: https://issues.apache.org/jira/browse/HDDS-1628
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton


Problem: Some of the smoketest executions were reported as green even if they 
contained failed tests.

Root cause: the legacy test executor 
(hadoop-ozone/dist/src/main/smoketest/test.sh) which just calls the new 
executor script (hadoop-ozone/dist/src/main/compose/test-all.sh) didn't handle 
the return code well (the failure of the smoketests should be signalled by the 
bash return code)

This patch:
 * Fixes the error code handling in smoketest/test.sh
 * Fixes the test execution in compose/test-all.sh (should work from any other 
directories)
 * Updates hadoop-ozone/dev-support/checks/acceptance.sh to use the newer 
test-all.sh executor instead of the old one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1627) Make the version of the used hadoop-runner configurable

2019-06-02 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1627:
--

 Summary: Make the version of the used hadoop-runner configurable
 Key: HDDS-1627
 URL: https://issues.apache.org/jira/browse/HDDS-1627
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton


During an offline discussion with [~arp] and [~eyang] we agreed that it would 
be safer to pin the tag of the used hadoop-runner images during the releases.

It also requires fixed tags from hadoop-runner, but after that it's possible to 
use the fixed tags.

This patch makes it possible to define the required version/tag in pom.xml:

 1. The default hadoop-runner.version is added to all .env files during the 
build
 2. If a variable is added to the .env, it can be used from docker-compose 
files AND can be overridden by environment variables (this makes it possible to 
define a custom version during a local run)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1622) Use picocli for StorageContainerManager

2019-05-31 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1622:
--

 Summary: Use picocli for StorageContainerManager
 Key: HDDS-1622
 URL: https://issues.apache.org/jira/browse/HDDS-1622
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Stephen O'Donnell


Recently we switched to PicoCli with (almost) all of our daemons (eg. S3 
Gateway, Freon, etc.)

PicoCli has better output, it can generate nice help, and it is easier to use: 
it's enough to put a few annotations and we don't need to add all the boilerplate 
code to print out help, etc.

StorageContainerManager and OzoneManager are not yet supported. The previous 
issue (HDDS-453) was closed, but since then we improved the GenericCli parser (eg. 
in HDDS-1192), so I think we are ready to move.

The main idea is to create a starter java class similar to 
org.apache.hadoop.ozone.s3.Gateway and start StorageContainerManager 
from there.
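
A hedged sketch of such a starter with plain picocli (without Ozone's GenericCli base class); the command name and the --init option are made up for illustration, and the entry point uses the picocli 3.x style API seen in the stack traces elsewhere in this thread:

{code:java}
import java.util.concurrent.Callable;
import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Option;

@Command(name = "ozone scm",
    description = "Start or initialize the Storage Container Manager",
    mixinStandardHelpOptions = true)
public class ScmStarterSketch implements Callable<Void> {

  @Option(names = {"--init"}, description = "Initialize the SCM storage if not done yet")
  private boolean init;

  @Override
  public Void call() {
    if (init) {
      System.out.println("Would initialize the SCM storage here.");
    } else {
      System.out.println("Would start the SCM daemon here.");
    }
    return null;
  }

  public static void main(String[] args) {
    CommandLine.call(new ScmStarterSketch(), args);
  }
}
{code}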


 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1607) Create smoketest for non-secure mapreduce example

2019-05-29 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1607:
--

 Summary: Create smoketest for non-secure mapreduce example
 Key: HDDS-1607
 URL: https://issues.apache.org/jira/browse/HDDS-1607
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton


We had multiple problems earlier with the classpath separation and the internal 
ozonefs classloader. Before fixing all the issues I propose to create a 
smoketest to detect if the classpath separation is broken again.

As a first step I created a smoketest/ozone-mr environment (based on the work 
of [~xyao], which is secure) and a smoketest.

Possible follow-up work:

 * Adapt the test.sh for the ozonesecure-mr environment
 * Include test runs with older hadoop versions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1598) Fix Ozone checkstyle issues on trunk

2019-05-27 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1598:
--

 Summary: Fix Ozone checkstyle issues on trunk
 Key: HDDS-1598
 URL: https://issues.apache.org/jira/browse/HDDS-1598
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


Some small checkstyle issues were accidentally committed with HDDS-700.

Trivial fixes are coming here...




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1597) Remove hdds-server-scm dependency from ozone-common

2019-05-27 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1597:
--

 Summary: Remove hdds-server-scm dependency from ozone-common
 Key: HDDS-1597
 URL: https://issues.apache.org/jira/browse/HDDS-1597
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton


I noticed that the hadoop-ozone/common project depends on the 
hadoop-hdds-server-scm project.

The common projects are designed to be shared artifacts between the client and 
server side. Adding an additional dependency to the common pom means that the 
dependency will be available for all the clients as well.

We definitely don't need the scm server dependency on the client side.

The code dependency is just one class (ScmUtils) and the shared code can easily 
be moved to common.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1596) Create service endpoint to download configuration from SCM

2019-05-27 Thread Elek, Marton (JIRA)
Elek, Marton created HDDS-1596:
--

 Summary: Create service endpoint to download configuration from SCM
 Key: HDDS-1596
 URL: https://issues.apache.org/jira/browse/HDDS-1596
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Elek, Marton
Assignee: Elek, Marton


As written in the design doc (see the parent issue), it was proposed that the 
other services download the configuration from the SCM.

I propose to create a separate endpoint to provide the ozone configuration. 
/conf can't be used as it contains *all* the configuration and we need only the 
modified configuration.

The easiest way to implement this feature is:

 * Create a simple rest endpoint which publishes all the configuration
 * Download the configurations to $HADOOP_CONF_DIR/ozone-global.xml during the 
service startup.
 * Add ozone-global.xml as an additional config source (before ozone-site.xml 
but after ozone-default.xml)
 * The download can be optional

With this approach we keep the support of the existing manual configuration 
(ozone-site.xml has higher priority) but we can download the configuration to a 
separate file during startup, which will then be loaded.

There is no magic: the configuration file is saved and it's easy to debug 
what's going on as the OzoneConfiguration is loaded from the $HADOOP_CONF_DIR 
as before.
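
A hedged sketch of the client-side part of this flow (the endpoint URL is hypothetical since the issue only proposes "a separate endpoint"; ozone-global.xml is the file name proposed above, and a plain Hadoop Configuration stands in for OzoneConfiguration):

{code:java}
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.apache.hadoop.conf.Configuration;

public class ConfigDownloadSketch {

  /** Downloads the SCM-published config and wires it in with the proposed priority. */
  static Configuration loadWithDownloadedDefaults(String scmConfigUrl, String confDir)
      throws Exception {
    Path target = Paths.get(confDir, "ozone-global.xml");
    try (InputStream in = new URL(scmConfigUrl).openStream()) {
      Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    Configuration conf = new Configuration(false);
    // In Hadoop's Configuration, resources added later override earlier ones,
    // so this gives: ozone-default.xml < ozone-global.xml < ozone-site.xml.
    conf.addResource("ozone-default.xml");
    conf.addResource(target.toUri().toURL());
    conf.addResource("ozone-site.xml");
    return conf;
  }
}
{code}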

Possible follow-up steps:

 * Migrate all the other services (recon, s3g) to the new approach. (possible 
newbie jiras)
 * Improve the CLI to define the SCM address. (As of now we use ozone.scm.names)
 * Create a service/hostname registration mechanism and autofill some of the 
configuration based on the topology information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1565) Rename k8s-dev and k8s-dev-push profiles to docker-build and docker-push

2019-05-22 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton resolved HDDS-1565.

Resolution: Fixed

> Rename k8s-dev and k8s-dev-push profiles to docker-build and docker-push
> 
>
> Key: HDDS-1565
> URL: https://issues.apache.org/jira/browse/HDDS-1565
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Based on the feedback from [~eyang] I realized that the names of the k8s-dev 
> and k8s-dev-push profiles are not expressive enough as the created containers 
> can be used not only for kubernetes but can be used together with any other 
> container orchestrator.
> I propose to rename them to docker-build/docker-push.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org


