[jira] [Created] (HDDS-2178) Support Ozone insight tool in secure cluster
Elek, Marton created HDDS-2178:
--
Summary: Support Ozone insight tool in secure cluster
Key: HDDS-2178
URL: https://issues.apache.org/jira/browse/HDDS-2178
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Elek, Marton

SPNEGO should be initialized properly for the HTTP requests.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-2167) Hadoop31-mr acceptance test is failing due to the shading
Elek, Marton created HDDS-2167:
--
Summary: Hadoop31-mr acceptance test is failing due to the shading
Key: HDDS-2167
URL: https://issues.apache.org/jira/browse/HDDS-2167
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton

From the daily build:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/ozone/shaded/org/apache/http/client/utils/URIBuilder
	at org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:138)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
	at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
	at org.apache.hadoop.fs.shell.CommandWithDestination.getRemoteDestination(CommandWithDestination.java:195)
	at org.apache.hadoop.fs.shell.CopyCommands$Put.processOptions(CopyCommands.java:259)
	at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.ozone.shaded.org.apache.http.client.utils.URIBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 15 more
{code}

It can be reproduced locally by executing the tests:

{code}
cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone-mr/hadoop31
./test.sh
{code}
[jira] [Created] (HDDS-2166) Some RPC metrics are missing from SCM prometheus endpoint
Elek, Marton created HDDS-2166:
--
Summary: Some RPC metrics are missing from SCM prometheus endpoint
Key: HDDS-2166
URL: https://issues.apache.org/jira/browse/HDDS-2166
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Elek, Marton
Assignee: Elek, Marton

In Hadoop metrics it's possible to register multiple metrics with the same name but with different tags. For example each RpcServer has its own metrics instance in SCM:

{code}
"name" : "Hadoop:service=StorageContainerManager,name=RpcActivityForPort9860",
"name" : "Hadoop:service=StorageContainerManager,name=RpcActivityForPort9863",
{code}

They are converted by PrometheusSink to a prometheus metric line with a proper name and tags. For example:

{code}
rpc_rpc_queue_time60s_num_ops{port="9860",servername="StorageContainerLocationProtocolService",context="rpc",hostname="72736061cbc5"} 0
{code}

The PrometheusSink uses a Map to cache all the recent values, but unfortunately the key contains only the name (rpc_rpc_queue_time60s_num_ops in our example) and not the tags (port=...). For this reason, if there are multiple metrics with the same name, only the first one is displayed. As a result, only the metrics of the first RPC server in SCM are exported to the prometheus endpoint.
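The fix amounts to making the cache key unique per (name, tags) pair instead of per name. A minimal sketch of the idea (hypothetical `MetricKey` class, not the actual PrometheusSink code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build a cache key from the metric name plus its tags,
// so two metrics that share a name but differ in tags (e.g. port) get
// distinct cache entries instead of overwriting each other.
public class MetricKey {
    static String cacheKey(String name, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder(name);
        // Append every tag so the key is unique per (name, tags) pair.
        for (Map.Entry<String, String> e : tags.entrySet()) {
            sb.append('#').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> t9860 = new LinkedHashMap<>();
        t9860.put("port", "9860");
        Map<String, String> t9863 = new LinkedHashMap<>();
        t9863.put("port", "9863");
        String k1 = cacheKey("rpc_rpc_queue_time60s_num_ops", t9860);
        String k2 = cacheKey("rpc_rpc_queue_time60s_num_ops", t9863);
        System.out.println(k1.equals(k2)); // false: keys now differ by tag
    }
}
```

With such a key, both RpcActivityForPort9860 and RpcActivityForPort9863 survive in the cache and can be emitted as separate prometheus lines.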
[jira] [Resolved] (HDDS-2043) "VOLUME_NOT_FOUND" exception thrown while listing volumes
[ https://issues.apache.org/jira/browse/HDDS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton resolved HDDS-2043.
Resolution: Duplicate

Tested and worked well. HDDS-1926 fixed the same problem IMHO.

> "VOLUME_NOT_FOUND" exception thrown while listing volumes
> --
>
> Key: HDDS-2043
> URL: https://issues.apache.org/jira/browse/HDDS-2043
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone CLI, Ozone Manager
> Reporter: Nilotpal Nandi
> Assignee: Bharat Viswanadham
> Priority: Blocker
>
> ozone list volume command throws OMException
> bin/ozone sh volume list --user root
> VOLUME_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Volume info not found for vol-test-putfile-1566902803
>
> On enabling DEBUG log, here is the console output:
>
> {noformat}
> bin/ozone sh volume create /n1 ; echo $?
> 2019-08-27 11:47:16 DEBUG ThriftSenderFactory:33 - Using the UDP Sender to send spans to the agent.
> 2019-08-27 11:47:16 DEBUG SenderResolver:86 - Using sender UdpSender()
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[GetGroups])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since startup])
> 2019-08-27 11:47:16 DEBUG MutableMetricsFactory:43 - field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(sampleName=Ops, always=false, valueName=Time, about=, interval=10, type=DEFAULT, value=[Renewal failures since last successful login])
> 2019-08-27 11:47:16 DEBUG MetricsSystemImpl:231 - UgiMetrics, User and group related metrics
> 2019-08-27 11:47:16 DEBUG SecurityUtil:124 - Setting hadoop.security.token.service.use_ip to true
> 2019-08-27 11:47:16 DEBUG Shell:821 - setsid exited with exit code 0
> 2019-08-27 11:47:16 DEBUG Groups:449 - Creating new Groups object
> 2019-08-27 11:47:16 DEBUG Groups:151 - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30; warningDeltaMs=5000
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:254 - hadoop login
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:187 - hadoop login commit
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:215 - using local user:UnixPrincipal: root
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:221 - Using user: "UnixPrincipal: root" with name root
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:235 - User entry: "root"
> 2019-08-27 11:47:16 DEBUG UserGroupInformation:766 - UGI loginUser:root (auth:SIMPLE)
> 2019-08-27 11:47:16 DEBUG OzoneClientFactory:287 - Using org.apache.hadoop.ozone.client.rpc.RpcClient as client protocol.
> 2019-08-27 11:47:16 DEBUG Server:280 - rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@710f4dc7
> 2019-08-27 11:47:16 DEBUG Client:63 - getting client out of cache: org.apache.hadoop.ipc.Client@24313fcc
> 2019-08-27 11:47:16 DEBUG Client:487 - The ping interval is 6 ms.
> 2019-08-27 11:47:16 DEBUG Client:785 - Connecting to nnandi-1.gce.cloudera.com/172.31.117.213:9862
> 2019-08-27 11:47:16 DEBUG Client:1064 - IPC Client (580871917) connection to
[jira] [Created] (HDDS-2154) Fix Checkstyle issues
Elek, Marton created HDDS-2154:
--
Summary: Fix Checkstyle issues
Key: HDDS-2154
URL: https://issues.apache.org/jira/browse/HDDS-2154
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Elek, Marton

Unfortunately the checkstyle checks didn't work well from HDDS-2106 to HDDS-2119. This patch fixes all the issues which were accidentally merged in the meantime.
[jira] [Created] (HDDS-2131) Optimize replication type and creation time calculation in S3 MPU list call
Elek, Marton created HDDS-2131:
--
Summary: Optimize replication type and creation time calculation in S3 MPU list call
Key: HDDS-2131
URL: https://issues.apache.org/jira/browse/HDDS-2131
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Elek, Marton

Based on the review from [~bharatviswa]:

{code}
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java

metadataManager.getOpenKeyTable();
OmKeyInfo omKeyInfo = openKeyTable.get(upload.getDbKey());
{code}

{quote}Here we are reading openKeyTable only for getting the creation time. If we can have this information in omMultipartKeyInfo, we could avoid DB calls for openKeyTable. To do this, we can set creationTime in OmMultipartKeyInfo during initiateMultipartUpload. In this way, we can get all the required information from the MultipartKeyInfo table.

And also StorageClass is missing from the returned OmMultipartUpload, as listMultipartUploads shows StorageClass information. For this, if we can return replicationType, then depending on this value we can set StorageClass in the listMultipartUploads response.{quote}
[jira] [Created] (HDDS-2130) Add pagination support to the S3 ListMPU call
Elek, Marton created HDDS-2130:
--
Summary: Add pagination support to the S3 ListMPU call
Key: HDDS-2130
URL: https://issues.apache.org/jira/browse/HDDS-2130
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Elek, Marton

HDDS-1054 introduced a simple implementation for the AWS S3 ListMultipartUploads REST call. However, pagination support (key-marker, max-uploads, upload-id-marker...) is missing. We should implement it in this jira.
[jira] [Created] (HDDS-2120) Remove hadoop classes from ozonefs-current jar
Elek, Marton created HDDS-2120:
--
Summary: Remove hadoop classes from ozonefs-current jar
Key: HDDS-2120
URL: https://issues.apache.org/jira/browse/HDDS-2120
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Elek, Marton

We have two kinds of ozone file system jars: current and legacy. current is designed to work only with exactly the same hadoop version which is used for compilation (3.2 as of now).

As of now the hadoop classes are included in the current jar, which is not necessary: the jar is expected to be used in an environment where exactly the same hadoop classes are already available. They can be excluded.
[jira] [Created] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone
Elek, Marton created HDDS-2106:
--
Summary: Avoid usage of hadoop projects as parent of hdds/ozone
Key: HDDS-2106
URL: https://issues.apache.org/jira/browse/HDDS-2106
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Elek, Marton

Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:

1. the hadoop artifacts are defined in the sections
2. both the hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the parent

As we already have a slightly different assembly process, it could be more resilient to use a dedicated parent project instead of the hadoop one. With this approach it will be easier to upgrade the versions, as we don't need to be careful about the pom contents, only about the used dependencies.
[jira] [Created] (HDDS-2075) Tracing in OzoneManager call is propagated with wrong parent
Elek, Marton created HDDS-2075:
--
Summary: Tracing in OzoneManager call is propagated with wrong parent
Key: HDDS-2075
URL: https://issues.apache.org/jira/browse/HDDS-2075
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Elek, Marton

As you can see in the attached screenshot, the OzoneManager.createBucket (server side) tracing information is a child of freon.createBucket instead of the freon OzoneManagerProtocolPB.submitRequest. To avoid confusion the hierarchy should be fixed. (Most probably we generate the child span AFTER we have already serialized the parent one into the message.)
[jira] [Created] (HDDS-2074) Use annotations to define description/filter/required filters of an InsightPoint
Elek, Marton created HDDS-2074:
--
Summary: Use annotations to define description/filter/required filters of an InsightPoint
Key: HDDS-2074
URL: https://issues.apache.org/jira/browse/HDDS-2074
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Elek, Marton

The InsightPoint interface defines the getDescription method to provide the human readable description of the insight point. To have a better separation between the provided log/metrics/config information and the metadata, it would be better to use an annotation for this, which also can include the filters (HDDS-2071). Something like this:

{code}
@InsightPoint(description="Information from the async event queue of the SCM",
    supportedFilters=["eventType"], requiredFilters="")
public class EventQueueInsight extends BaseInsightPoint {
...
}
{code}
[jira] [Created] (HDDS-2073) Make SCMSecurityProtocol message based
Elek, Marton created HDDS-2073:
--
Summary: Make SCMSecurityProtocol message based
Key: HDDS-2073
URL: https://issues.apache.org/jira/browse/HDDS-2073
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Components: SCM
Reporter: Elek, Marton

We started to use a generic pattern where we have only one method in the grpc service and the main message contains all the required common information (eg. tracing). SCMSecurityProtocol is not yet migrated to this approach. To make our generic debug tool more powerful and to unify our protocols, I suggest to transform this protocol as well.
[jira] [Created] (HDDS-2072) Make StorageContainerLocationProtocolService message based
Elek, Marton created HDDS-2072:
--
Summary: Make StorageContainerLocationProtocolService message based
Key: HDDS-2072
URL: https://issues.apache.org/jira/browse/HDDS-2072
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Components: SCM
Reporter: Elek, Marton

We started to use a generic pattern where we have only one method in the grpc service and the main message contains all the required common information (eg. tracing). StorageContainerLocationProtocolService is not yet migrated to this approach. To make our generic debug tool more powerful and to unify our protocols, I suggest to transform this protocol as well.
[jira] [Created] (HDDS-2071) Support filters in ozone insight point
Elek, Marton created HDDS-2071:
--
Summary: Support filters in ozone insight point
Key: HDDS-2071
URL: https://issues.apache.org/jira/browse/HDDS-2071
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Elek, Marton

With Ozone insight we can print out all the logs/metrics of one specific component (eg. scm.node-manager). It would be great to support additional filtering capabilities where the output is filtered based on specific keys, for example to print out all of the logs related to one datanode or to one type of RPC request.

The filter should be a key-value map (eg. --filter datanode=sjdhfhf,rpc=createChunk) which can be defined in the ozone insight CLI.

As we have no option to add additional tags to the logs (it may be supported by log4j2 but not with slf4j), the first implementation can be based on pattern matching. For example SCMNodeManager.processNodeReport contains trace/debug logs which include the " [datanode={}]" part. This formatting convention can be used to print out only the related information.
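The pattern-matching approach described above can be sketched in a few lines (hypothetical `LogFilter` class, assuming the " [key=value]" formatting convention, not the actual insight tool code):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the pattern-matching fallback: since the log lines
// carry no structured tags, filter on the "[key=value]" convention embedded
// in the message text itself.
public class LogFilter {
    static List<String> filter(List<String> lines, String key, String value) {
        // Build the marker the logging convention is assumed to emit.
        String needle = "[" + key + "=" + value + "]";
        return lines.stream()
            .filter(line -> line.contains(needle))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Processing node report [datanode=dn1]",
            "Processing node report [datanode=dn2]");
        // Keep only the lines for datanode dn1.
        System.out.println(filter(lines, "datanode", "dn1"));
    }
}
```

A real implementation would apply this filter to the live log stream per insight point; the sketch only shows the matching rule itself.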
[jira] [Created] (HDDS-2070) Create insight point to debug one specific pipeline
Elek, Marton created HDDS-2070:
--
Summary: Create insight point to debug one specific pipeline
Key: HDDS-2070
URL: https://issues.apache.org/jira/browse/HDDS-2070
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Elek, Marton

During the first demo of the ozone insight tool we had a demo insight point to debug Ratis pipelines. It was not stable enough to include in the first patch, but here we can add it.

The goal is to implement a new insight point (eg. datanode.pipeline) which can show information about one pipeline. It can be done by retrieving the hosts of the pipeline and generating the loggers/metrics (InsightPoint.getRelatedLoggers and InsightPoint.getMetrics) based on the pipeline information (the same loggers should be displayed from all the three datanodes).

The pipeline id can be defined as a filter parameter which (in this case) should be required.
[jira] [Created] (HDDS-2068) Make StorageContainerDatanodeProtocolService message based
Elek, Marton created HDDS-2068:
--
Summary: Make StorageContainerDatanodeProtocolService message based
Key: HDDS-2068
URL: https://issues.apache.org/jira/browse/HDDS-2068
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Components: SCM
Reporter: Elek, Marton

We started to use a generic pattern where we have only one method in the grpc service and the main message contains all the required common information (eg. tracing). StorageContainerDatanodeProtocolService is not yet migrated to this approach. To make our generic debug tool more powerful and to unify our protocols, I suggest to transform this protocol as well.
[jira] [Created] (HDDS-2067) Create generic service facade with tracing/metrics/logging support
Elek, Marton created HDDS-2067:
--
Summary: Create generic service facade with tracing/metrics/logging support
Key: HDDS-2067
URL: https://issues.apache.org/jira/browse/HDDS-2067
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Elek, Marton

We started to use a message based GRPC approach: we have only one method and the requests are routed based on a "type" field in the proto message. For example in the OM protocol:

{code}
/** The OM service that takes care of Ozone namespace. */
service OzoneManagerService {
    // A client-to-OM RPC to send client requests to OM Ratis server
    rpc submitRequest(OMRequest) returns(OMResponse);
}
{code}

And:

{code}
message OMRequest {
  required Type cmdType = 1; // Type of the command
  ...
{code}

This approach makes it possible to use the same code to process incoming messages on the server side. The ScmBlockLocationProtocolServerSideTranslatorPB.send method contains the logic of:

 * Logging the request/response message (can be displayed with ozone insight)
 * Updating metrics
 * Handling open tracing context propagation

These functions are generic. For example OzoneManagerProtocolServerSideTranslatorPB uses the same (=similar) code.

The goal in this jira is to provide a generic utility and move the common code for tracing/request logging/response logging/metrics calculation into a common utility which can be used from all the server side translators.
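The shape of such a facade can be sketched as follows (hypothetical `RequestFacade` class; the real translators work on protobuf messages, and the tracing span is only indicated by a comment):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical sketch of the proposed common facade: every protocol has a
// single entry point, and the facade wraps each typed handler with the shared
// concerns (request/response logging, metrics, tracing) in one place.
public class RequestFacade<REQ, RESP> {
    private final Map<String, Function<REQ, RESP>> handlers = new HashMap<>();
    final AtomicLong requestCount = new AtomicLong();

    // Each cmdType gets its own handler, mirroring the "type" field routing.
    public void register(String cmdType, Function<REQ, RESP> handler) {
        handlers.put(cmdType, handler);
    }

    public RESP submitRequest(String cmdType, REQ request) {
        requestCount.incrementAndGet();              // shared metrics update
        System.out.println("request: " + cmdType);   // shared request logging
        // (a real facade would also open a tracing span around the call)
        RESP response = handlers.get(cmdType).apply(request);
        System.out.println("response: " + cmdType);  // shared response logging
        return response;
    }

    public static void main(String[] args) {
        RequestFacade<String, String> facade = new RequestFacade<>();
        facade.register("CreateVolume", name -> "created:" + name);
        System.out.println(facade.submitRequest("CreateVolume", "vol1"));
    }
}
```

The point of the sketch is only the separation: the per-type handlers stay protocol-specific, while logging/metrics/tracing live once in the facade.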
[jira] [Created] (HDDS-2066) Improve the observability inside Ozone
Elek, Marton created HDDS-2066:
--
Summary: Improve the observability inside Ozone
Key: HDDS-2066
URL: https://issues.apache.org/jira/browse/HDDS-2066
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Tools
Reporter: Elek, Marton
Assignee: Elek, Marton

Improving the observability is a key requirement to achieve better correctness and performance with Ozone. This jira collects some of the tasks which can provide better visibility into the ozone internals.

We have two main tools:

 * Distributed tracing (opentracing) can help to detect performance bottlenecks
 * The Ozone insight tool (a simple cli frontend for Hadoop metrics and log4j logging) can help to get a better understanding of the current state/behavior of specific components

Both of them can be improved to be more powerful.
[jira] [Created] (HDDS-2060) Create Ozone specific LICENSE file for the Ozone source package
Elek, Marton created HDDS-2060:
--
Summary: Create Ozone specific LICENSE file for the Ozone source package
Key: HDDS-2060
URL: https://issues.apache.org/jira/browse/HDDS-2060
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton

With HDDS-2058 the Ozone (source) release package doesn't contain the hadoop sources any more. We need to create an adjusted LICENSE file for the Ozone source package. (We already created a specific LICENSE file for the binary package, which is not changed.) In the new LICENSE file we should include entries only for the sources which are part of the Ozone release.
[jira] [Reopened] (HDDS-1596) Create service endpoint to download configuration from SCM
[ https://issues.apache.org/jira/browse/HDDS-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton reopened HDDS-1596:

> Create service endpoint to download configuration from SCM
> --
>
> Key: HDDS-1596
> URL: https://issues.apache.org/jira/browse/HDDS-1596
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> As written in the design doc (see the parent issue), it was proposed that the other services download the configuration from the scm.
> I propose to create a separate endpoint to provide the ozone configuration. /conf can't be used as it contains *all* the configuration and we need only the modified configuration.
> The easiest way to implement this feature is:
> * Create a simple rest endpoint which publishes all the configuration
> * Download the configurations to $HADOOP_CONF_DIR/ozone-global.xml during the service startup.
> * Add ozone-global.xml as an additional config source (before ozone-site.xml but after ozone-default.xml)
> * The download can be optional
> With this approach we keep the support of the existing manual configuration (ozone-site.xml has higher priority) but we can download the configuration to a separate file during the startup, which will be loaded.
> There is no magic: the configuration file is saved and it's easy to debug what's going on, as the OzoneConfiguration is loaded from the $HADOOP_CONF_DIR as before.
> Possible follow-up steps:
> * Migrate all the other services (recon, s3g) to the new approach. (possible newbie jiras)
> * Improve the CLI to define the SCM address. (As of now we use ozone.scm.names)
> * Create a service/hostname registration mechanism and autofill some of the configuration based on the topology information.
[jira] [Created] (HDDS-2030) Generate simplified reports by the dev-support/checks/*.sh scripts
Elek, Marton created HDDS-2030:
--
Summary: Generate simplified reports by the dev-support/checks/*.sh scripts
Key: HDDS-2030
URL: https://issues.apache.org/jira/browse/HDDS-2030
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: build
Reporter: Elek, Marton
Assignee: Elek, Marton

The hadoop-ozone/dev-support/checks directory contains shell scripts to execute different types of code checks (findbugs, checkstyle, etc.)

Currently the contract is very simple: every shell script executes one (and only one) check and the shell response code is set according to the result (non-zero code if failed).

To have better reporting in the github pr build, it would be great to improve the scripts to generate simple summary files and to save the relevant files for archiving.
[jira] [Created] (HDDS-2025) Updated the Dockerfile of the official apache/ozone image
Elek, Marton created HDDS-2025:
--
Summary: Update the Dockerfile of the official apache/ozone image
Key: HDDS-2025
URL: https://issues.apache.org/jira/browse/HDDS-2025
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Reporter: Elek, Marton
Assignee: Elek, Marton

The hadoop-docker-ozone repository contains the definition of the apache/ozone image: https://github.com/apache/hadoop-docker-ozone/tree/ozone-latest

It creates a docker packaging for the voted and released artifact, therefore it can be released after the final vote. Since the latest release we made some modifications in our Dockerfiles. We need to apply the changes to the official image as well, especially:

1. use ozone-runner as the base image instead of hadoop-runner
2. rename the ozoneManager service to om as we did everywhere
3. adjust the starter location (the script is moved to the released tar file)
[jira] [Resolved] (HDDS-2024) rat.sh: grep: warning: recursive search of stdin
[ https://issues.apache.org/jira/browse/HDDS-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton resolved HDDS-2024.
Fix Version/s: 0.5.0
Resolution: Fixed

> rat.sh: grep: warning: recursive search of stdin
> --
>
> Key: HDDS-2024
> URL: https://issues.apache.org/jira/browse/HDDS-2024
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: build
> Affects Versions: 0.4.1
> Reporter: Doroszlai, Attila
> Assignee: Doroszlai, Attila
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Running {{rat.sh}} locally fails with the following error message (after the two Maven runs):
> {code:title=./hadoop-ozone/dev-support/checks/rat.sh}
> ...
> grep: warning: recursive search of stdin
> {code}
> This happens if {{grep}} is not the GNU one.
> Further, {{rat.sh}} runs into {{cat: target/rat-aggregated.txt: No such file or directory}} in a subshell due to a typo, and so always exits with success:
> {code:title=./hadoop-ozone/dev-support/checks/rat.sh}
> ...
> cat: target/rat-aggregated.txt: No such file or directory
> $ echo $?
> 0
> {code}
[jira] [Created] (HDDS-2022) Add additional freon tests
Elek, Marton created HDDS-2022:
--
Summary: Add additional freon tests
Key: HDDS-2022
URL: https://issues.apache.org/jira/browse/HDDS-2022
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Tools
Reporter: Elek, Marton
Assignee: Elek, Marton

Freon is a generic load generator tool for ozone (ozone freon) which supports multiple generation patterns. As of now only the random-key-generator is implemented, which uses the ozone rpc client.

It would be great to add additional tests:

 * Test key generation via the s3 interface
 * Test key generation via the hadoop fs interface
 * Test key reads (validation)
 * Test OM with direct RPC calls
[jira] [Created] (HDDS-2000) Don't depend on bootstrap/jquery versions from hadoop-trunk snapshot
Elek, Marton created HDDS-2000:
--
Summary: Don't depend on bootstrap/jquery versions from hadoop-trunk snapshot
Key: HDDS-2000
URL: https://issues.apache.org/jira/browse/HDDS-2000
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: om, SCM
Reporter: Elek, Marton
Assignee: Elek, Marton

The OM/SCM web pages are broken due to the upgrade in HDFS-14729 (which is a great patch on the Hadoop side). To have more stability I propose to use our own copies of jquery/bootstrap instead of copying the actual version from hadoop trunk, which is a SNAPSHOT build.
[jira] [Created] (HDDS-1997) Support copy-source-if-(un)modified-since headers for MPU key creation with copy
Elek, Marton created HDDS-1997: -- Summary: Support copy-source-if-(un)modified-since headers for MPU key creation with copy Key: HDDS-1997 URL: https://issues.apache.org/jira/browse/HDDS-1997 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: S3 Reporter: Elek, Marton HDDS-1942 introduces the option to create an MPU key part with copy (https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html), but the x-amz-copy-source-if-(un)modified-since headers are not supported yet. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1951) Wrong symbolic release name on 0.4.1 branch
Elek, Marton created HDDS-1951: -- Summary: Wrong symbolic release name on 0.4.1 branch Key: HDDS-1951 URL: https://issues.apache.org/jira/browse/HDDS-1951 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Should be Biscayne instead of Crater Lake according to the Roadmap: https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Road+Map -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1950) S3 MPU part list can't be called if there are no parts
Elek, Marton created HDDS-1950: -- Summary: S3 MPU part list can't be called if there are no parts Key: HDDS-1950 URL: https://issues.apache.org/jira/browse/HDDS-1950 Project: Hadoop Distributed Data Store Issue Type: Bug Components: S3 Reporter: Elek, Marton

If an S3 multipart upload is created but no part has been uploaded, the part list can't be retrieved because the server throws HTTP 500.

Create an MPU:

{code}
aws s3api --endpoint http://localhost: create-multipart-upload --bucket=docker --key=testkeu
{
    "Bucket": "docker",
    "Key": "testkeu",
    "UploadId": "85343e71-4c16-4a75-bb55-01f56a9339b2-102592678478217234"
}
{code}

List the parts:

{code}
aws s3api --endpoint http://localhost: list-parts --bucket=docker --key=testkeu --upload-id=85343e71-4c16-4a75-bb55-01f56a9339b2-102592678478217234
{code}

It throws an exception on the server side, because in KeyManagerImpl.listParts the ReplicationType is retrieved from the first part:

{code}
HddsProtos.ReplicationType replicationType =
    partKeyInfoMap.firstEntry().getValue().getPartKeyInfo().getType();
{code}

which is not yet available in this case. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
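The missing guard can be sketched as follows (a minimal illustration with simplified types and a hypothetical helper name, not the actual KeyManagerImpl code): check the part map before dereferencing its first entry, and fall back to a default replication type when the upload has no parts yet.

```java
import java.util.TreeMap;

// Hypothetical sketch of the guard listParts would need: the real code
// derives the replication type from the first part, which fails when no
// part has been uploaded yet. Types are simplified to plain strings here.
class PartListGuard {

    static String replicationTypeOf(TreeMap<Integer, String> partKeyInfoMap,
                                    String fallbackType) {
        if (partKeyInfoMap.isEmpty()) {
            // No parts uploaded yet: return the fallback instead of
            // dereferencing firstEntry() on an empty map (which is null).
            return fallbackType;
        }
        return partKeyInfoMap.firstEntry().getValue();
    }
}
```

With such a guard, list-parts on a fresh upload returns an empty part list instead of an HTTP 500.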
[jira] [Created] (HDDS-1948) S3 MPU can't be created with octet-stream content-type
Elek, Marton created HDDS-1948: -- Summary: S3 MPU can't be created with octet-stream content-type Key: HDDS-1948 URL: https://issues.apache.org/jira/browse/HDDS-1948 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton This problem was reported offline by [~shaneku...@gmail.com]. When aws-sdk-go is used to access the S3 gateway of Ozone, it sends the Multipart Upload initialize message with an "application/octet-stream" Content-Type. This Content-Type is not sent by the aws-cli, which was used as the reference when implementing the s3 endpoint. The problem is that we use the same REST endpoint for the initialize and complete Multipart Upload requests. For the completion we need the CompleteMultipartUploadRequest parameter, which is parsed from the body. For the initialization we have an empty body, which can't be deserialized to a CompleteMultipartUploadRequest. The workaround is to set a specific content type from a filter, which lets us create two different REST methods for the initialize and complete messages. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
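The filter-based workaround could look roughly like this (an illustrative sketch with made-up names, not the actual Ozone s3gateway classes): detect the shape of the MPU-initiate request and rewrite the ambiguous octet-stream Content-Type to a synthetic value, so that the two operations can be routed to separate REST methods.

```java
// Hypothetical sketch of the workaround: a request filter inspects the query
// string and rewrites "application/octet-stream" on the MPU initiate call
// (the "?uploads" request with an empty body) to a synthetic content type,
// so the framework can dispatch initiate and complete to different methods.
class MpuContentTypeRewriter {

    static final String MPU_INITIATE_TYPE = "application/x-ozone-mpu-initiate";

    static String rewrite(String queryString, String contentType) {
        // The initiate request carries "uploads" but no "uploadId" parameter.
        boolean isInitiate = queryString != null
                && queryString.contains("uploads")
                && !queryString.contains("uploadId");
        if (isInitiate && "application/octet-stream".equals(contentType)) {
            return MPU_INITIATE_TYPE;
        }
        return contentType;
    }
}
```

The real implementation would live in a servlet/JAX-RS request filter; the routing condition shown here is an assumption based on the S3 API shape.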
[jira] [Created] (HDDS-1942) Support copy during S3 multipart upload part creation
Elek, Marton created HDDS-1942: -- Summary: Support copy during S3 multipart upload part creation Key: HDDS-1942 URL: https://issues.apache.org/jira/browse/HDDS-1942 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: S3 Reporter: Elek, Marton Uploads a part by copying data from an existing object as the data source. Documented here: https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1937) Acceptance tests fail if scm webui shows invalid json
Elek, Marton created HDDS-1937: -- Summary: Acceptance tests fail if scm webui shows invalid json Key: HDDS-1937 URL: https://issues.apache.org/jira/browse/HDDS-1937 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton

The acceptance test of a nightly build failed with the following error:

{code}
Creating ozonesecure_kdc_1      ... done
Creating ozonesecure_om_1       ... done
Creating ozonesecure_scm_1      ... done
Creating ozonesecure_datanode_3 ... done
Creating ozonesecure_kms_1      ... done
Creating ozonesecure_s3g_1      ... done
Creating ozonesecure_datanode_2 ... done
Creating ozonesecure_datanode_1 ... done
parse error: Invalid numeric literal at line 2, column 0
{code}

https://raw.githubusercontent.com/elek/ozone-ci/master/byscane/byscane-nightly-5b87q/acceptance/output.log

The problem is in the script which checks the number of available datanodes. If the HTTP endpoint of the SCM is already started BUT not ready yet, it may return a simple HTML error message instead of json, which cannot be parsed by jq. In testlib.sh:

{code}
if [[ "${SECURITY_ENABLED}" == 'true' ]]; then
  docker-compose -f "${compose_file}" exec -T scm bash -c "kinit -k HTTP/scm@EXAMPLE.COM -t /etc/security/keytabs/HTTP.keytab && curl --negotiate -u : -s '${jmx_url}'"
else
  docker-compose -f "${compose_file}" exec -T scm curl -s "${jmx_url}"
fi \
  | jq -r '.beans[0].NodeCount[] | select(.key=="HEALTHY") | .value'
{code}

One possible fix is to adjust the error handling (set +e / set -e) per method instead of using a generic set -e at the beginning. It would provide a more predictable behavior. In our case count_datanode should not fail ever (as the caller method, wait_for_datanodes, can retry anyway).
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1935) Improve the visibility with Ozone Insight tool
Elek, Marton created HDDS-1935: -- Summary: Improve the visibility with Ozone Insight tool Key: HDDS-1935 URL: https://issues.apache.org/jira/browse/HDDS-1935 Project: Hadoop Distributed Data Store Issue Type: New Feature Reporter: Elek, Marton Assignee: Elek, Marton

Visibility is a key aspect of operating any Ozone cluster. We need better visibility to improve correctness and performance. While distributed tracing is a good tool for improving the visibility of performance, we have no powerful tool which can be used to check the internal state of the Ozone cluster and debug certain correctness issues.

To improve the visibility of the internal components I propose to introduce a new command line application, `ozone insight`. The new tool will show the selected metrics / logs / configuration for any of the internal components (like replication-manager, pipeline, etc.). For each insight point we can define the required logs and log levels, metrics and configuration, and the tool can display only the component-specific information during debugging.

h2. Usage

First we can check the available insight points:

{code}
bash-4.2$ ozone insight list

Available insight points:

  scm.node-manager                  SCM Datanode management related information.
  scm.replica-manager               SCM closed container replication manager
  scm.event-queue                   Information about the internal async event delivery
  scm.protocol.block-location       SCM Block location protocol endpoint
  scm.protocol.container-location   Planned insight point which is not yet implemented.
  scm.protocol.datanode             Planned insight point which is not yet implemented.
  scm.protocol.security             Planned insight point which is not yet implemented.
  scm.http                          Planned insight point which is not yet implemented.
  om.key-manager                    OM Key Manager
  om.protocol.client                Ozone Manager RPC endpoint
  om.http                           Planned insight point which is not yet implemented.
  datanode.pipeline[id]             More information about one ratis datanode ring.
  datanode.rocksdb                  More information about one ratis datanode ring.
  s3g.http                          Planned insight point which is not yet implemented.
{code}

Insight points can define configuration, metrics and/or logs. Configuration can be displayed based on the configuration objects:

{code}
ozone insight config scm.protocol.block-location

Configuration for `scm.protocol.block-location` (SCM Block location protocol endpoint)

>>> ozone.scm.block.client.bind.host
       default: 0.0.0.0
       current: 0.0.0.0

The hostname or IP address used by the SCM block client endpoint to bind

>>> ozone.scm.block.client.port
       default: 9863
       current: 9863

The port number of the Ozone SCM block client service.

>>> ozone.scm.block.client.address
       default: ${ozone.scm.client.address}
       current: scm

The address of the Ozone SCM block client service. If not defined value of ozone.scm.client.address is used
{code}

Metrics can be retrieved from the prometheus entrypoint:

{code}
ozone insight metrics scm.protocol.block-location

Metrics for `scm.protocol.block-location` (SCM Block location protocol endpoint)

RPC connections

  Open connections: 0
  Dropped connections: 0
  Received bytes: 0
  Sent bytes: 0

RPC queue

  RPC average queue time: 0.0
  RPC call queue length: 0

RPC performance

  RPC processing time average: 0.0
  Number of slow calls: 0

Message type counters

  Number of AllocateScmBlock: 0
  Number of DeleteScmKeyBlocks: 0
  Number of GetScmInfo: 2
  Number of SortDatanodes: 0
{code}

Log levels can be adjusted with the existing logLevel servlet and the logs can be collected / streamed via a simple logstream servlet:

{code}
ozone insight log scm.node-manager

[SCM] 2019-08-08 12:42:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:43:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:44:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:45:37,393 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=ozone_datanode_1.ozone_default]
[SCM] 2019-08-08 12:46:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager] Processing node report from [datanode=ozone_datanode_1.ozone_default]
{code}

The verbose mode can display the raw messages.
[jira] [Created] (HDDS-1926) The new caching layer is used for old OM requests but not updated
Elek, Marton created HDDS-1926: -- Summary: The new caching layer is used for old OM requests but not updated Key: HDDS-1926 URL: https://issues.apache.org/jira/browse/HDDS-1926 Project: Hadoop Distributed Data Store Issue Type: Bug Components: om Reporter: Elek, Marton

HDDS-1499 introduced a new caching layer together with a double-buffer based db writer to support OM HA.

TLDR: I think the caching layer is not updated for new volume creation. And (slightly related to this problem) I suggest to separate the TypedTable and the caching layer.

## How to reproduce the problem?

1. Start a docker compose cluster
2. Create one volume (let's say `/vol1`)
3. Restart the om (!)
4. Try to create an _other_ volume twice!

```
bash-4.2$ ozone sh volume create /vol2
2019-08-07 12:29:47 INFO RpcClient:288 - Creating Volume: vol2, with hadoop as owner.
bash-4.2$ ozone sh volume create /vol2
2019-08-07 12:29:50 INFO RpcClient:288 - Creating Volume: vol2, with hadoop as owner.
```

Expected behavior is an error:

{code}
bash-4.2$ ozone sh volume create /vol1
2019-08-07 09:48:39 INFO RpcClient:288 - Creating Volume: vol1, with hadoop as owner.
bash-4.2$ ozone sh volume create /vol1
2019-08-07 09:48:42 INFO RpcClient:288 - Creating Volume: vol1, with hadoop as owner.
VOLUME_ALREADY_EXISTS
{code}

The problem is that the new cache is used even for the old code path (TypedTable):

{code}
@Override
public VALUE get(KEY key) throws IOException {
  // Here the metadata lock will guarantee that cache is not updated for same
  // key during get key.
  CacheResult<CacheValue<VALUE>> cacheResult =
      cache.lookup(new CacheKey<>(key));
  if (cacheResult.getCacheStatus() == EXISTS) {
    return cacheResult.getValue().getCacheValue();
  } else if (cacheResult.getCacheStatus() == NOT_EXIST) {
    return null;
  } else {
    return getFromTable(key);
  }
}
{code}

For the volume table after the FIRST start it always returns with `getFromTable(key)` due to the condition in `TableCacheImpl.lookup`:

{code}
public CacheResult<CACHEVALUE> lookup(CACHEKEY cachekey) {
  if (cache.size() == 0) {
    return new CacheResult<>(CacheResult.CacheStatus.MAY_EXIST, null);
  }
  ...
{code}

But after a restart the cache is pre-loaded by the TypedTable constructor. After the restart, the real caching logic will be used (as cache.size() > 0), which causes a problem as the cache is NOT updated from the old code path.

An additional problem is that the cache is turned on for all the metadata tables even if the cache is not required...

## Proposed solution

As I commented at HDDS-1499, this caching layer is not a "traditional cache". It's not updated during the typedTable.put() call but updated by a separate component during the double-buffer flush.

I would suggest to remove the cache related methods from TypedTable (move them to a separate implementation). I think this kind of caching can be independent from the TypedTable implementation. We can continue to use the simple TypedTable everywhere where we don't need any kind of caching. For caching we can use a separate object. It would make it more visible that the cache must always be updated manually. This separate caching utility may include a reference to the original TypedTable/Table.

With this approach we can separate the different responsibilities but provide the same functionality. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
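The proposed separation could look roughly like this (a sketch with illustrative names, not the actual Ozone API): the plain table keeps simple get/put semantics, while a separate wrapper owns the cache, and that cache is only ever updated through the wrapper's explicit method (e.g. by the double-buffer flush component), never implicitly by a table put.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a cache layered on top of (not inside) the table. The cache is
// updated only via cachePut(), mirroring the proposal that the double-buffer
// flush component updates it manually; get() falls through to the table.
class CachedTable<K, V> {

    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> table; // stands in for the underlying TypedTable

    CachedTable(Function<K, V> table) {
        this.table = table;
    }

    // Called explicitly by the flush component, never by table.put().
    void cachePut(K key, V value) {
        cache.put(key, value);
    }

    V get(K key) {
        V cached = cache.get(key);
        return cached != null ? cached : table.apply(key);
    }
}
```

Keeping the cache in its own object makes the manual-update contract visible at the call site, instead of hiding it inside TypedTable.get().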
[jira] [Created] (HDDS-1915) Remove hadoop script from ozone distribution
Elek, Marton created HDDS-1915: -- Summary: Remove hadoop script from ozone distribution Key: HDDS-1915 URL: https://issues.apache.org/jira/browse/HDDS-1915 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton The /bin/hadoop script is included in the ozone distribution even though we have a dedicated /bin/ozone script. [~arp] reported that it can be confusing; for example, "hadoop classpath" returns a bad classpath ("ozone classpath" should be used instead). To avoid such confusion I suggest removing the hadoop script from the distribution, as the ozone script already provides all the functionality. It also helps us reduce the dependencies between hadoop 3.2-SNAPSHOT and ozone, as we use the snapshot hadoop script as of now. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1914) Ozonescript example docker-compose cluster can't be started
Elek, Marton created HDDS-1914: -- Summary: Ozonescript example docker-compose cluster can't be started Key: HDDS-1914 URL: https://issues.apache.org/jira/browse/HDDS-1914 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton

The compose/ozonescripts cluster provides an example environment to test the start-ozone.sh and stop-ozone.sh scripts. It starts containers with an sshd daemon but without starting ozone, which makes it possible to exercise those scripts. Unfortunately the docker files have been broken since:

 * we switched the base image from debian to centos
 * we started to use /etc/hadoop instead of /opt/hadoop/etc/hadoop for configuring hadoop (the workers file should be copied there)
 * we started to use jdk11 to execute ozone (instead of java8)

The configuration files should be updated according to these changes. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDDS-1881) Design doc: decommissioning in Ozone
[ https://issues.apache.org/jira/browse/HDDS-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton reopened HDDS-1881: > Design doc: decommissioning in Ozone > > > Key: HDDS-1881 > URL: https://issues.apache.org/jira/browse/HDDS-1881 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: design, pull-request-available > Time Spent: 33h 50m > Remaining Estimate: 0h > > Design doc can be attached to the documentation. In this jira the design doc > will be attached and merged to the documentation page. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1881) Design doc: decommissioning in Ozone
Elek, Marton created HDDS-1881: -- Summary: Design doc: decommissioning in Ozone Key: HDDS-1881 URL: https://issues.apache.org/jira/browse/HDDS-1881 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Elek, Marton Assignee: Elek, Marton Design doc can be attached to the documentation. In this jira the design doc will be attached and merged to the documentation page. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1880) Decommissioning and maintenance mode in Ozone
Elek, Marton created HDDS-1880: -- Summary: Decommissioning and maintenance mode in Ozone Key: HDDS-1880 URL: https://issues.apache.org/jira/browse/HDDS-1880 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Elek, Marton This is the umbrella jira for decommissioning support in Ozone. Design doc will be attached soon. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1871) Remove anti-affinity rules from k8s minikube example
Elek, Marton created HDDS-1871: -- Summary: Remove anti-affinity rules from k8s minikube example Key: HDDS-1871 URL: https://issues.apache.org/jira/browse/HDDS-1871 Project: Hadoop Distributed Data Store Issue Type: Bug Components: kubernetes Reporter: Elek, Marton Assignee: Elek, Marton

HDDS-1646 introduced real persistence for the k8s example deployment files, which means that we need anti-affinity scheduling rules: even if we use a statefulset instead of a daemonset, we would like to start one datanode per real node. With minikube we have only one node, therefore the scheduling rule should be removed to enable at least 3 datanodes on the same physical node.

How to test:

{code}
mvn clean install -DskipTests -f pom.ozone.xml
cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/kubernetes/examples/minikube
minikube start
kubectl apply -f .
kubectl get pod
{code}

You should see 3 datanode instances. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1811) Prometheus metrics are broken for datanodes due to a wrong metric
Elek, Marton created HDDS-1811: -- Summary: Prometheus metrics are broken for datanodes due to a wrong metric Key: HDDS-1811 URL: https://issues.apache.org/jira/browse/HDDS-1811 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Elek, Marton

Datanodes can't be monitored with prometheus any more:

{code}
level=warn ts=2019-07-16T16:29:55.876Z caller=scrape.go:937 component="scrape manager" scrape_pool=pods target=http://192.168.69.76:9882/prom msg="append failed" err="invalid metric type \"apache.hadoop.ozone.container.common.transport.server.ratis._csm_metrics_delete_container_avg_time gauge\""
{code}

-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1673) smoketests are failing because of an acl error
[ https://issues.apache.org/jira/browse/HDDS-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1673. Resolution: Cannot Reproduce Seems to be fixed by the latest ACL patches. I am closing this for now. Feel free to reopen if you see the problem again. > smoketests are failing because an acl error > --- > > Key: HDDS-1673 > URL: https://issues.apache.org/jira/browse/HDDS-1673 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Elek, Marton >Assignee: Anu Engineer >Priority: Blocker > > After executing this command: > {code} > yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 3 3 2 > {code} > The result is: > {code} > Number of Maps = 3 > Samples per Map = 3 > 2019-06-12 03:16:20 ERROR OzoneClientFactory:294 - Couldn't create protocol > class org.apache.hadoop.ozone.client.rpc.RpcClient exception: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169) > at > org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.(BasicOzoneClientAdapterImpl.java:134) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:50) > at > org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:103) > at > org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:143) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) > at 
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:463) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:522) > at > org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:275) > at > org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at > org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:323) > at org.apache.hadoop.util.RunJar.main(RunJar.java:236) > Caused by: org.apache.hadoop.hdds.conf.ConfigurationException: Can't inject > configuration to class > org.apache.hadoop.ozone.security.acl.OzoneAclConfig.setUserDefaultRights > at > 
org.apache.hadoop.hdds.conf.OzoneConfiguration.getObject(OzoneConfiguration.java:160) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:148) > ... 36 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
[jira] [Created] (HDDS-1800) Result of author check is inverted
Elek, Marton created HDDS-1800: -- Summary: Result of author check is inverted Key: HDDS-1800 URL: https://issues.apache.org/jira/browse/HDDS-1800 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton ## What changes were proposed in this pull request? Fix: 1. author check fails when no violations are found 2. author check violations are duplicated in the output Eg. https://ci.anzix.net/job/ozone-nightly/173/consoleText says that: {code:java} The following tests are FAILED: [author]: author check is failed (https://ci.anzix.net/job/ozone-nightly/173//artifact/build/author.out/*view*/){code} but no actual `@author` tags were found: ``` $ curl -s 'https://ci.anzix.net/job/ozone-nightly/173//artifact/build/author.out/*view*/' | wc 0 0 0 ``` ## How was this patch tested? {code} $ bash -o pipefail -c 'hadoop-ozone/dev-support/checks/author.sh | tee build/author.out'; echo $? 0 $ wc build/author.out 0 0 0 build/author.out $ echo '// @author Tolkien' >> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManager.java $ bash -o pipefail -c 'hadoop-ozone/dev-support/checks/author.sh | tee build/author.out'; echo $? ./hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/BucketManager.java:// @author Tolkien 1 $ wc build/author.out 1 3 108 build/author.out {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1799) Add goofyfs to the ozone-runner docker image
Elek, Marton created HDDS-1799: -- Summary: Add goofyfs to the ozone-runner docker image Key: HDDS-1799 URL: https://issues.apache.org/jira/browse/HDDS-1799 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton Goofys is an S3 FUSE driver which is required for the ozone CSI setup. As of now it's installed in hadoop-ozone/dist/src/main/docker/Dockerfile from a non-standard location (because it couldn't be part of hadoop-runner earlier, as it's ozone specific). It should be installed in the ozone-runner image from a canonical goofys release URL. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1793) Acceptance test of ozone-topology cluster is failing
Elek, Marton created HDDS-1793: -- Summary: Acceptance test of ozone-topology cluster is failing Key: HDDS-1793 URL: https://issues.apache.org/jira/browse/HDDS-1793 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton

Since HDDS-1586 the smoketests of the ozone-topology compose file are broken:

{code:java}
Output: /tmp/smoketest/ozone-topology/result/robot-ozone-topology-ozone-topology-basic-scm.xml
must specify at least one container source
Stopping datanode_2 ... done
Stopping datanode_3 ... done
Stopping datanode_4 ... done
Stopping scm        ... done
Stopping om         ... done
Stopping datanode_1 ... done
Removing datanode_2 ... done
Removing datanode_3 ... done
Removing datanode_4 ... done
Removing scm        ... done
Removing om         ... done
Removing datanode_1 ... done
Removing network ozone-topology_net
[ ERROR ] Reading XML source '/var/jenkins_home/workspace/ozone/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone-topology/result/robot-*.xml' failed: No such file or directory
Try --help for usage information.
ERROR: Test execution of /var/jenkins_home/workspace/ozone/hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone-topology is FAILED{code}

-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1785) OOM error in Freon due to the concurrency handling
Elek, Marton created HDDS-1785: -- Summary: OOM error in Freon due to the concurrency handling Key: HDDS-1785 URL: https://issues.apache.org/jira/browse/HDDS-1785 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton

HDDS-1530 modified the concurrent framework usage of Freon (RandomKeyGenerator). The new approach uses separate tasks (Runnable) to create the volumes/buckets/keys. Unfortunately it doesn't work very well in some cases.

# When Freon starts, it creates an executor with a fixed number of threads (10)
# The first loop submits numOfVolumes (10) VolumeProcessor tasks to the executor
# The 10 threads start to execute the 10 VolumeProcessor tasks
# Each VolumeProcessor task creates numOfBuckets (1000) BucketProcessor tasks. All together 10 000 tasks are submitted to the executor.
# The 10 threads start to execute the first 10 BucketProcessor tasks, and they start to create the KeyProcessor tasks: 500 000 * 10 tasks are submitted.
# At this point no keys have been generated yet, but the next 10 BucketProcessor tasks start to execute...
# To execute the first key creation we must process all the BucketProcessor tasks, which means that all the key creation tasks (10 * 1000 * 500 000) are created and added to the executor
# Which requires a huge amount of time and memory

-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
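One way to avoid materializing every task up front (a sketch of the general technique, not the actual Freon patch) is to bound the number of in-flight tasks with a semaphore, so producer threads block on submit instead of queueing millions of Runnables:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Sketch: a submitter that caps queued work. Producers calling submit()
// block once (threads + maxQueued) tasks are in flight, so the full
// 10 * 1000 * 500 000 task tree is never materialized at once.
class BoundedSubmitter {

    private final ExecutorService pool;
    private final Semaphore permits;

    BoundedSubmitter(int threads, int maxQueued) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(threads + maxQueued);
    }

    void submit(Runnable task) throws InterruptedException {
        permits.acquire();              // blocks when too many tasks are queued
        pool.execute(() -> {
            try {
                task.run();
            } finally {
                permits.release();      // free a slot for the next producer
            }
        });
    }

    void shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

An equivalent effect can be had with a bounded work queue plus a blocking rejection policy; the semaphore version keeps the back-pressure explicit.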
[jira] [Resolved] (HDDS-1459) Docker compose of ozonefs has older hadoop image for hadoop 3.2
[ https://issues.apache.org/jira/browse/HDDS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1459. Resolution: Duplicate Thanks for the report, [~vivekratnavel]. It's fixed with HDDS-1525: the ozonefs compose files are removed because the ozone-mr tests are improved and they include the same functionality (ozone fs test with the hdfs CLI AND with the MR client). The versions are fixed (2.7, 3.1, 3.2) > Docker compose of ozonefs has older hadoop image for hadoop 3.2 > --- > > Key: HDDS-1459 > URL: https://issues.apache.org/jira/browse/HDDS-1459 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Affects Versions: 0.4.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major >
[jira] [Resolved] (HDDS-1014) hadoop-ozone-filesystem is missing required jars
[ https://issues.apache.org/jira/browse/HDDS-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1014. Resolution: Duplicate Thanks for the report, [~bharatviswa]. For mapreduce, please use hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar or hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar instead of hadoop-ozone-filesystem-0.5.0-SNAPSHOT.jar. The legacy and current jars are the shaded jar files; the simple filesystem jar includes only the ozonefs classes, to make it work with the "ozone fs" command. BTW, the legacy/current jar files were broken at the time of this report, which made it harder to find the right jars, but they will be fixed by HDDS-1525 and HDDS-1717 very soon... > hadoop-ozone-filesystem is missing required jars > > > Key: HDDS-1014 > URL: https://issues.apache.org/jira/browse/HDDS-1014 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > > https://hadoop.apache.org/ozone/docs/0.3.0-alpha/ozonefs.html > After following the steps mentioned, I still get below error: > {code:java} > 19/01/25 17:15:28 ERROR client.OzoneClientFactory: Couldn't create protocol > class org.apache.hadoop.ozone.client.rpc.RpcClient exception: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169) > at > org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:128) > at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:461) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) > at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352) > at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250) > at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104) > at org.apache.hadoop.fs.shell.Command.run(Command.java:177) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:328) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:391) > Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: > org/apache/ratis/thirdparty/com/google/protobuf/ByteString > at org.apache.ratis.protocol.RaftId.(RaftId.java:64) > at org.apache.ratis.protocol.ClientId.(ClientId.java:47) > at org.apache.ratis.protocol.ClientId.randomId(ClientId.java:31) > at org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:115) > ... 24 more > Caused by: java.lang.NoClassDefFoundError: > org/apache/ratis/thirdparty/com/google/protobuf/ByteString > ... 
28 more > Caused by: java.lang.ClassNotFoundException: > org.apache.ratis.thirdparty.com.google.protobuf.ByteString > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 28 more > {code} > So, I proceeded and added ratis-thirdparty-misc jar. > After that I got error related to missing RatisProto, and then next missing > bouncy castle. > After adding all of those jars I am able to run dfs and map red jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1431) Linkage error thrown if ozone-fs-legacy jar is on the classpath
[ https://issues.apache.org/jira/browse/HDDS-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1431. Resolution: Duplicate Assignee: Elek, Marton Thanks, [~swagle], for reporting this issue. # classpath issues related to the usage of the lib-legacy jar will be fixed by HDDS-1525 (I will close this as a duplicate) # ozonefs-lib-legacy/current jar files are shaded all-in-one files. It's not supported to add them to the classpath together with any other ozone jar. It's enough to add just the fat jar to the HADOOP_CLASSPATH. (But even if you do, you won't get this strange exception after HDDS-1525) > Linkage error thrown if ozone-fs-legacy jar is on the classpath > --- > > Key: HDDS-1431 > URL: https://issues.apache.org/jira/browse/HDDS-1431 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Siddharth Wagle >Assignee: Elek, Marton >Priority: Major > > With hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar on the classpath > along with current jar results in classloader throwing an error on fs write > operation as below: > {code} > 2019-04-11 16:06:54,127 ERROR [OzoneClientAdapterFactory] Can't initialize > the ozoneClientAdapter > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.lambda$createAdapter$1(OzoneClientAdapterFactory.java:66) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:116) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterFactory.createAdapter(OzoneClientAdapterFactory.java:62) > at > 
org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:92) > at > org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:146) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) > at > org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3326) > at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:532) > at org.notmysock.repl.Works$CopyWorker.run(Works.java:252) > at org.notmysock.repl.Works$CopyWorker.call(Works.java:287) > at org.notmysock.repl.Works$CopyWorker.call(Works.java:207) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.LinkageError: loader constraint violation: loader > (instance of org/apache/hadoop/fs/ozone/FilteredClassLoader) previously > initiated loading for a different t > ype with name "org/apache/hadoop/crypto/key/KeyProvider" > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:763) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) > at java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:368) > at java.net.URLClassLoader$1.run(URLClassLoader.java:362) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:361) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > 
org.apache.hadoop.fs.ozone.FilteredClassLoader.loadClass(FilteredClassLoader.java:72) > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetPublicMethods(Class.java:2902) > at java.lang.Class.getMethods(Class.java:1615) > at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451) > at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339) > at
[jira] [Resolved] (HDDS-1338) ozone shell commands are throwing InvocationTargetException
[ https://issues.apache.org/jira/browse/HDDS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1338. Resolution: Duplicate > ozone shell commands are throwing InvocationTargetException > --- > > Key: HDDS-1338 > URL: https://issues.apache.org/jira/browse/HDDS-1338 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Priority: Major > > ozone version > {noformat} > Source code repository g...@github.com:hortonworks/ozone.git -r > 310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde > Compiled by jenkins on 2019-03-21T22:06Z > Compiled with protoc 2.5.0 > From source with checksum 9c367143ad43b81ca84bfdaafd1c3f > Using HDDS 0.4.0.3.0.100.0-388 > Source code repository g...@github.com:hortonworks/ozone.git -r > 310ebf5dc83b6c9e68d09246ed6c6f7cf6370fde > Compiled by jenkins on 2019-03-21T22:06Z > Compiled with protoc 2.5.0 > From source with checksum f3297cbd3a5f59fb4e5fd551afa05ba9 > {noformat} > Here is the ozone volume create failure output : > {noformat} > hdfs@ctr-e139-1542663976389-91321-01-02 ~]$ ozone sh volume create > testvolume11 > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.0.100.0-388/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.0.100.0-388/hadoop-ozone/share/ozone/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > 19/03/26 17:31:37 ERROR client.OzoneClientFactory: Couldn't create protocol > class org.apache.hadoop.ozone.client.rpc.RpcClient exception: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169) > at > org.apache.hadoop.ozone.web.ozShell.OzoneAddress.createClient(OzoneAddress.java:111) > at > org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:70) > at > org.apache.hadoop.ozone.web.ozShell.volume.CreateVolumeHandler.call(CreateVolumeHandler.java:38) > at picocli.CommandLine.execute(CommandLine.java:919) > at picocli.CommandLine.access$700(CommandLine.java:104) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1083) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1051) > at > picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959) > at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242) > at picocli.CommandLine.parseWithHandler(CommandLine.java:1181) > at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:61) > at org.apache.hadoop.ozone.web.ozShell.Shell.execute(Shell.java:82) > at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:52) > at org.apache.hadoop.ozone.web.ozShell.Shell.main(Shell.java:93) > Caused by: java.lang.VerifyError: Cannot inherit from final class > at java.lang.ClassLoader.defineClass1(Native Method) > at 
java.lang.ClassLoader.defineClass(ClassLoader.java:763) > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369) > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:362) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.(OzoneManagerProtocolClientSideTranslatorPB.java:169) > at org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:142) > ... 20 more > Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For
[jira] [Resolved] (HDDS-1305) Robot test containers: hadoop client can't access o3fs
[ https://issues.apache.org/jira/browse/HDDS-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1305. Resolution: Duplicate Thanks for reporting this issue. It will be fixed in HDDS-1717. (Based on the timeline, that one is the duplicate, but we have a working patch there, so I am closing this one.) > Robot test containers: hadoop client can't access o3fs > -- > > Key: HDDS-1305 > URL: https://issues.apache.org/jira/browse/HDDS-1305 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Sandeep Nemuri >Assignee: Anu Engineer >Priority: Major > Attachments: run.log > > > Run the robot test using: > {code:java} > ./test.sh --keep --env ozonefs > {code} > login to OM container and check if we have desired volume/bucket/key got > created with robot tests. > {code:java} > [root@o3new ~]$ docker exec -it ozonefs_om_1 /bin/bash > bash-4.2$ ozone fs -ls o3fs://bucket1.fstest/ > Found 3 items > -rw-rw-rw- 1 hadoop hadoop 22990 2019-03-15 17:28 > o3fs://bucket1.fstest/KEY.txt > drwxrwxrwx - hadoop hadoop 0 1970-01-01 00:00 > o3fs://bucket1.fstest/testdir > drwxrwxrwx - hadoop hadoop 0 2019-03-15 17:27 > o3fs://bucket1.fstest/testdir1 > {code} > {code:java} > [root@o3new ~]$ docker exec -it ozonefs_hadoop3_1 /bin/bash > bash-4.4$ hadoop classpath > /opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/yarn:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/ozone/share/ozone/lib/hadoop-ozone-filesystem-lib-current-0.5.0-SNAPSHOT.jar > bash-4.4$ hadoop fs -ls o3fs://bucket1.fstest/ > 2019-03-18 19:12:42 INFO Configuration:3204 - Removed undeclared tags: > 2019-03-18 19:12:42 ERROR OzoneClientFactory:294 - Couldn't create protocol > class 
org.apache.hadoop.ozone.client.rpc.RpcClient exception: > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169) > at > org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:127) > at > org.apache.hadoop.fs.ozone.OzoneFileSystem.initialize(OzoneFileSystem.java:189) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) > at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325) > at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:249) > at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:232) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104) > at org.apache.hadoop.fs.shell.Command.run(Command.java:176) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:328) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:391) > Caused by: java.lang.VerifyError: Cannot inherit from final class > at java.lang.ClassLoader.defineClass1(Native Method) > at 
java.lang.ClassLoader.defineClass(ClassLoader.java:763) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) > at java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:368) > at java.net.URLClassLoader$1.run(URLClassLoader.java:362) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:361) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at
[jira] [Created] (HDDS-1764) Fix hidden errors in acceptance tests
Elek, Marton created HDDS-1764: -- Summary: Fix hidden errors in acceptance tests Key: HDDS-1764 URL: https://issues.apache.org/jira/browse/HDDS-1764 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham Assignee: Elek, Marton [~bharatviswa] pinged me offline with the problem that in some cases the smoketest fails even though the reports are green:

{code:java}
All smoke tests are passed, but CI is showing as Failed.
https://ci.anzix.net/job/ozone/17284/RobotTests/log.html
https://github.com/apache/hadoop/pull/1048{code}

The root cause is a few typos introduced by HDDS-1698, which can be fixed with the uploaded PR.

*What is the problem?*

If any error occurs during the test execution, the smoketest fails. In this case, because of the typos in two docker-compose.yaml files, two of the tests can't be started. But there is no separate robot test report, and the error is visible only in the console.

*How did it happen?*

The ACL work introduced some intermittency in the acceptance tests. HDDS-1698 was committed because the acceptance tests were failing with ACL errors, which hid the real error (the test was red anyway).
[jira] [Created] (HDDS-1763) Use vendor neutral s3 logo in ozone doc
Elek, Marton created HDDS-1763: -- Summary: Use vendor neutral s3 logo in ozone doc Key: HDDS-1763 URL: https://issues.apache.org/jira/browse/HDDS-1763 Project: Hadoop Distributed Data Store Issue Type: Bug Components: documentation Reporter: Bharat Viswanadham Assignee: Elek, Marton In HDDS-1639 we restructured the ozone documentation and a new overview page was added to the main page. This page contains an official AWS logo. As [~bharatviswa] reported, we are not sure about the exact conditions for using logos/trademarks from Amazon. It's better to remain on the safe side and use a neutral S3 label. In this patch the AWS logo is replaced with an orange cloud + s3 text.
[jira] [Created] (HDDS-1747) Support override of configuration annotations
Elek, Marton created HDDS-1747: -- Summary: Support override of configuration annotations Key: HDDS-1747 URL: https://issues.apache.org/jira/browse/HDDS-1747 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Elek, Marton Assignee: Stephen O'Donnell To support HDDS-1744 we need a way to override existing configuration defaults. For example, given a main HttpConfiguration:

{code:java}
public class OzoneHttpServerConfig {

  private int httpBindPort;

  @Config(key = "http-bind-port",
      defaultValue = "9874",
      description = "The actual port the web server will listen on for HTTP "
          + "communication. If the port is 0 then the server will start on a "
          + "free port.",
      tags = {ConfigTag.OM, ConfigTag.MANAGEMENT})
  public void setHttpBindPort(int httpBindPort) {
    this.httpBindPort = httpBindPort;
  }
}
{code}

We need an option to extend this class and override the default value:

{code:java}
@ConfigGroup(prefix = "hdds.datanode")
public static class HttpConfig extends OzoneHttpServerConfig {

  @Override
  @ConfigOverride(defaultValue = "9882")
  public void setHttpBindPort(int httpBindPort) {
    super.setHttpBindPort(httpBindPort);
  }
}
{code}

The expected behavior is a generated hdds.datanode.http-bind-port property where the default is 9882.
[jira] [Created] (HDDS-1744) Improve BaseHttpServer to use typesafe configuration.
Elek, Marton created HDDS-1744: -- Summary: Improve BaseHttpServer to use typesafe configuration. Key: HDDS-1744 URL: https://issues.apache.org/jira/browse/HDDS-1744 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Elek, Marton Assignee: Stephen O'Donnell As it's defined in the parent task, we have a new typesafe way to define configuration based on annotations instead of constants. The next step is to replace the existing code with the new one. In this Jira I propose to improve org.apache.hadoop.hdds.server.BaseHttpServer to use a configuration object instead of constants. We need to create a generic configuration object with the right annotation:

{code:java}
public class OzoneHttpServerConfig {

  private String httpBindHost;

  @Config(key = "http-bind-host",
      defaultValue = "0.0.0.0",
      description = "The actual address the web server will bind to. If "
          + "this optional address is set, it overrides only the hostname"
          + " portion of http-address configuration value.",
      tags = {ConfigTag.OM, ConfigTag.MANAGEMENT})
  public void setHttpBindHost(String httpBindHost) {
    this.httpBindHost = httpBindHost;
  }
}{code}

And we need to extend this basic configuration in all the HttpServer implementations:

{code:java}
public class OzoneManagerHttpServer extends BaseHttpServer {

  @ConfigGroup(prefix = "ozone.om")
  public static class HttpConfig extends OzoneHttpServerConfig {

    @Override
    @ConfigOverride(defaultValue = "9874")
    public void setHttpBindPort(int httpBindPort) {
      super.setHttpBindPort(httpBindPort);
    }
  }
}{code}

Note: configuration keys used by HttpServer2 can't be replaced easily.
[jira] [Created] (HDDS-1743) Create service catalog endpoint in the SCM
Elek, Marton created HDDS-1743: -- Summary: Create service catalog endpoint in the SCM Key: HDDS-1743 URL: https://issues.apache.org/jira/browse/HDDS-1743 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: SCM Reporter: Elek, Marton Assignee: Stephen O'Donnell Based on the design doc in the parent task, we need a Service Catalog endpoint in the SCM.

{code:java}
public interface ServiceRegistry {

  void register(ServiceEndpoint endpoint) throws IOException;

  ServiceEndpoint findEndpoint(String serviceName, int instanceId);

  Collection getAllServices();
}{code}

Where the ServiceEndpoint is something like this:

{code:java}
public class ServiceEndpoint {
  private String host;
  private String ip;
  private ServicePort port;
  private String serviceName;
  private int instanceId;
  ...
}

public class ServicePort {
  private ServiceProtocol protocol;
  private String name;
  private int port;
  ...
}

public enum ServiceProtocol {
  RPC, HTTP, GRPC
}{code}

The ServiceRegistry may have multiple implementations, but as a first step we need a simple implementation which calls a new endpoint on SCM via REST. The endpoint should persist the data to a local RocksDB with the help of DBStore. This task is about creating the server and client implementations. In a follow-up Jira we can start to use the client on the om/datanode/client side to mix the service discovery data with the existing configuration.
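To make the contract above concrete, here is a minimal in-memory sketch of the registry. This is an illustration only: the real implementation would sit behind a REST endpoint on SCM and persist via DBStore/RocksDB, and ServiceEndpoint is reduced here to the fields needed for lookup (the checked IOException of register is also omitted for brevity).

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory ServiceRegistry sketch; names simplified.
public class InMemoryServiceRegistry {

  public static final class ServiceEndpoint {
    final String serviceName;
    final int instanceId;
    final String host;

    public ServiceEndpoint(String serviceName, int instanceId, String host) {
      this.serviceName = serviceName;
      this.instanceId = instanceId;
      this.host = host;
    }
  }

  // Keyed by "serviceName/instanceId" so lookups match the interface contract.
  private final Map<String, ServiceEndpoint> endpoints =
      new ConcurrentHashMap<>();

  public void register(ServiceEndpoint endpoint) {
    endpoints.put(endpoint.serviceName + "/" + endpoint.instanceId, endpoint);
  }

  public ServiceEndpoint findEndpoint(String serviceName, int instanceId) {
    return endpoints.get(serviceName + "/" + instanceId);
  }

  public Collection<ServiceEndpoint> getAllServices() {
    return endpoints.values();
  }
}
```

A REST-backed client implementation would keep the same three methods and simply translate them into HTTP calls against the SCM endpoint.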
[jira] [Created] (HDDS-1742) Merge ozone-perf and ozonetrace example clusters
Elek, Marton created HDDS-1742: -- Summary: Merge ozone-perf and ozonetrace example clusters Key: HDDS-1742 URL: https://issues.apache.org/jira/browse/HDDS-1742 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: docker Reporter: Elek, Marton Assignee: Istvan Fajth We have multiple example clusters in hadoop-ozone/dist/src/main/compose to demonstrate how different types of configuration can be set with ozone, but some of them can be consolidated. I propose to merge ozonetrace into ozoneperf, so that one ozoneperf cluster includes all the required components for local performance testing:

# opentracing (jaeger component in docker-compose + environment variables)
# monitoring (grafana + prometheus)
# perf profile (as of now it's enabled only in the ozone cluster [1])

[1]
{code:java}
cat compose/ozone/docker-config | grep prof
OZONE-SITE.XML_hdds.profiler.endpoint.enabled=true
ASYNC_PROFILER_HOME=/opt/profiler
{code}
[jira] [Created] (HDDS-1741) Fix prometheus configuration in ozoneperf example cluster
Elek, Marton created HDDS-1741: -- Summary: Fix prometheus configuration in ozoneperf example cluster Key: HDDS-1741 URL: https://issues.apache.org/jira/browse/HDDS-1741 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: docker Affects Versions: 0.4.0 Reporter: Elek, Marton Assignee: Elek, Marton HDDS-1216 renamed the ozoneManager components to om in the docker-compose file, but the prometheus configuration of the compose/ozoneperf environment was not updated. We need to update it to get meaningful metrics from om.
[jira] [Created] (HDDS-1735) Create separate unit and integration test executor dev-support script
Elek, Marton created HDDS-1735: -- Summary: Create separate unit and integration test executor dev-support script Key: HDDS-1735 URL: https://issues.apache.org/jira/browse/HDDS-1735 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton The hadoop-ozone/dev-support/checks directory contains multiple helper scripts to execute different types of checks (findbugs, rat, unit, build). They define how tests should be executed, with the following contract:

* The problems should be printed out to the console
* In case of a test failure, a non-zero exit code should be used

The tests are working well (in fact I have some experiments with executing these scripts on k8s and argo where all the shell scripts are executed in parallel) but we need some updates:

1. Most important: the unit tests and the integration tests should be separated. Integration tests are more flaky and it's better to have a way to run only the normal unit tests
2. As HDDS-1115 introduced a pom.ozone.xml, it's better to use it instead of the magical "-am -pl hadoop-ozone-dist" trick
3. To make it possible to run the blockade tests in containers we should use the -T flag with docker-compose
4. The checkstyle violations should be printed out to the console
[jira] [Created] (HDDS-1725) pv-test example to test csi is not working
Elek, Marton created HDDS-1725: -- Summary: pv-test example to test csi is not working Key: HDDS-1725 URL: https://issues.apache.org/jira/browse/HDDS-1725 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Ratish Maruthiyodan Assignee: Elek, Marton [~rmaruthiyodan] reported two problems regarding the pv-test example in the csi examples folder. The pv-test folder contains an example nginx deployment which can use an Ozone PVC/PV to publish the content of a folder via HTTP. Two problems were identified:

* The label-based selector of the service doesn't match the nginx deployment
* The configmap mount is missing from the nginx deployment
[jira] [Created] (HDDS-1716) Smoketest results are generated with an internal user
Elek, Marton created HDDS-1716: -- Summary: Smoketest results are generated with an internal user Key: HDDS-1716 URL: https://issues.apache.org/jira/browse/HDDS-1716 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton [~eyang] reported in HDDS-1609 the problem that the smoketest results are generated by a user (the user inside the docker container) which can be different from the host user. There is a minimal risk that the test results can be deleted/corrupted by other users if the current user is different from uid=1000.

I opened this issue because [~eyang] told me during an offline discussion that HDDS-1609 is a more complex issue and not only about the ownership of the test results. I suggest handling the two problems separately. With this patch, the permissions of the test result files can be fixed easily. In HDDS-1609 we can discuss the general security problems and try to find a generic solution for them.

Steps to reproduce _this_ problem:

# Use a user which is different from uid=1000
# Create a new ozone build (mvn clean install -f pom.ozone.xml -DskipTests)
# Go to a compose directory (cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/)
# Execute the tests (./test.sh)
# Check the ownership of the results (ls -lah ./results)

Current result: the owner of the result files is the user with uid=1000
Expected result: the owner of the files should always be the current user (even if the current uid is different)
[jira] [Created] (HDDS-1715) Update the Intellij runner definition of SCM to use the new class name
Elek, Marton created HDDS-1715: -- Summary: Update the Intellij runner definition of SCM to use the new class name Key: HDDS-1715 URL: https://issues.apache.org/jira/browse/HDDS-1715 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Tools Reporter: Elek, Marton Assignee: Stephen O'Donnell HDDS-1622 changed the CLI framework of SCM and, with a new additional class (StorageContainerManagerStarter), made it more testable. But the IntelliJ runner definitions are not (yet) updated to use the new class name for SCM/SCM-init (they were updated for OM in HDDS-1660). We need to adjust the main class names in: {code:java} hadoop-ozone/dev-support/intellij/runConfigurations/StorageContainerManager.xml hadoop-ozone/dev-support/intellij/runConfigurations/StorageContainerManagerInit.xml{code}
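For reference, the change in each run configuration should be a single attribute; a sketch (the exact surrounding XML of the checked-in files may differ):

```xml
<!-- Sketch: inside hadoop-ozone/dev-support/intellij/runConfigurations/
     StorageContainerManager.xml, point the runner at the new starter class. -->
<option name="MAIN_CLASS_NAME"
        value="org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter" />
```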
[jira] [Created] (HDDS-1710) Publish JVM metrics via Hadoop metrics
Elek, Marton created HDDS-1710: -- Summary: Publish JVM metrics via Hadoop metrics Key: HDDS-1710 URL: https://issues.apache.org/jira/browse/HDDS-1710 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: om, Ozone Datanode, SCM Reporter: Elek, Marton Assignee: Elek, Marton In Ozone, metrics can be published with the help of Hadoop metrics (for example via PrometheusMetricsSink). The basic JVM metrics are not published by the metrics system (only via JMX). I am very interested in the basic JVM metrics (GC count, heap memory usage) to identify possible problems in the test environment. Fortunately it's very easy to turn them on with the help of org.apache.hadoop.metrics2.source.JvmMetrics.
[jira] [Created] (HDDS-1709) TestScmSafeNode is flaky
Elek, Marton created HDDS-1709: -- Summary: TestScmSafeNode is flaky Key: HDDS-1709 URL: https://issues.apache.org/jira/browse/HDDS-1709 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM, test Reporter: Elek, Marton Assignee: Elek, Marton org.apache.hadoop.ozone.om.TestScmSafeMode.testSCMSafeMode failed last night with the following error: {code:java} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.ozone.om.TestScmSafeMode.testSCMSafeMode(TestScmSafeMode.java:285) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code} Locally it usually passes, but it's very easy to reproduce by adding an additional sleep to DataNodeSafeModeRule: {code:java} +++ b/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/DataNodeSafeModeRule.java @@ -63,7 +63,11 @@ protected boolean validate() { @Override protected void process(NodeRegistrationContainerReport reportsProto) { - + try { + Thread.sleep(3000); + } catch (InterruptedException e) { + e.printStackTrace(); + }{code} This is a clear race condition: DatanodeSafeModeRule and ContainerSafeModeRule process the same events, but it is possible (in case of an accidental delay) that the
container safe mode rule is done while DatanodeSafeModeRule hasn't processed the new event (yet). As a result the test execution will continue: {code:java} GenericTestUtils .waitFor(() -> scm.getCurrentContainerThreshold() == 1.0, 100, 2); {code} (This line waits ONLY for the ContainerSafeModeRule.) The fix is easy: let's wait for the processing of all the async events: {code:java} EventQueue eventQueue = (EventQueue) cluster.getStorageContainerManager().getEventQueue(); eventQueue.processAll(5000L);{code} As we are sure that the events have already been sent to the EventQueue (because we have the previous waitFor), it should be enough.
[jira] [Resolved] (HDDS-1694) TestNodeReportHandler is failing with NPE
[ https://issues.apache.org/jira/browse/HDDS-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1694. Resolution: Fixed > TestNodeReportHandler is failing with NPE > - > > Key: HDDS-1694 > URL: https://issues.apache.org/jira/browse/HDDS-1694 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > {code:java} > FAILURE in > ozone-unit-076618677d39x4h9/unit/hadoop-hdds/server-scm/org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.txt > --- > Test set: org.apache.hadoop.hdds.scm.node.TestNodeReportHandler > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.43 s <<< > FAILURE! - in org.apache.hadoop.hdds.scm.node.TestNodeReportHandler > testNodeReport(org.apache.hadoop.hdds.scm.node.TestNodeReportHandler) Time > elapsed: 0.288 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.hdds.scm.node.SCMNodeManager.(SCMNodeManager.java:122) > at > org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.resetEventCollector(TestNodeReportHandler.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > 2019-06-16 23:52:29,345 INFO node.SCMNodeManager > (SCMNodeManager.java:(119)) - Entering startup safe mode. > {code}
[jira] [Created] (HDDS-1698) Switch to use apache/ozone-runner in the compose/Dockerfile
Elek, Marton created HDDS-1698: -- Summary: Switch to use apache/ozone-runner in the compose/Dockerfile Key: HDDS-1698 URL: https://issues.apache.org/jira/browse/HDDS-1698 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: docker Reporter: Elek, Marton Assignee: Elek, Marton Since HDDS-1634 we have an Ozone-specific runner image to run Ozone with docker-compose based pseudo clusters. Now that the new apache/ozone-runner image has been uploaded to Docker Hub, we can switch our scripts to use the new image.
[jira] [Created] (HDDS-1694) TestNodeReportHandler is failing with NPE
Elek, Marton created HDDS-1694: -- Summary: TestNodeReportHandler is failing with NPE Key: HDDS-1694 URL: https://issues.apache.org/jira/browse/HDDS-1694 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Elek, Marton Assignee: Elek, Marton {code:java} FAILURE in ozone-unit-076618677d39x4h9/unit/hadoop-hdds/server-scm/org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.txt --- Test set: org.apache.hadoop.hdds.scm.node.TestNodeReportHandler --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.43 s <<< FAILURE! - in org.apache.hadoop.hdds.scm.node.TestNodeReportHandler testNodeReport(org.apache.hadoop.hdds.scm.node.TestNodeReportHandler) Time elapsed: 0.288 s <<< ERROR! java.lang.NullPointerException at org.apache.hadoop.hdds.scm.node.SCMNodeManager.(SCMNodeManager.java:122) at org.apache.hadoop.hdds.scm.node.TestNodeReportHandler.resetEventCollector(TestNodeReportHandler.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) 2019-06-16 23:52:29,345 INFO node.SCMNodeManager (SCMNodeManager.java:(119)) - Entering startup safe mode. {code}
[jira] [Created] (HDDS-1682) TestEventWatcher.testMetrics is flaky
Elek, Marton created HDDS-1682: -- Summary: TestEventWatcher.testMetrics is flaky Key: HDDS-1682 URL: https://issues.apache.org/jira/browse/HDDS-1682 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Elek, Marton Assignee: Elek, Marton TestEventWatcher is intermittent. (Failed twice out of 44 executions). Error is: {code} Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.764 s <<< FAILURE! - in org.apache.hadoop.hdds.server.events.TestEventWatcher testMetrics(org.apache.hadoop.hdds.server.events.TestEventWatcher) Time elapsed: 2.384 s <<< FAILURE! java.lang.AssertionError: expected:<2> but was:<3> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdds.server.events.TestEventWatcher.testMetrics(TestEventWatcher.java:197) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} In the test we do the following: 1. fire start-event1 2. fire start-event2 3. fire start-event3 4. fire end-event1 5. wait Usually event2 and event3 time out and event1 is completed, but in case of an accidental delay between 3 and 4 (in fact between 1 and 4) event1 can also time out. I improved the unit test and fixed the metrics calculation (the completed counter should be incremented only if the event has not yet timed out).
[jira] [Created] (HDDS-1680) Create missing parent directories during the creation of HddsVolume dirs
Elek, Marton created HDDS-1680: -- Summary: Create missing parent directories during the creation of HddsVolume dirs Key: HDDS-1680 URL: https://issues.apache.org/jira/browse/HDDS-1680 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton I started to execute all the unit tests continuously (in kubernetes with argo workflow). So far I got the following failures (number of failures / unit test name): ``` 1 org.apache.hadoop.fs.ozone.contract.ITestOzoneContractMkdir 1 org.apache.hadoop.fs.ozone.contract.ITestOzoneContractRename 3 org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware 31 org.apache.hadoop.ozone.container.common.TestDatanodeStateMachine 31 org.apache.hadoop.ozone.container.common.volume.TestVolumeSet 1 org.apache.hadoop.ozone.freon.TestDataValidateWithSafeByteOperations ``` TestVolumeSet also failed locally: {code} 2019-06-13 14:23:18,637 ERROR volume.VolumeSet (VolumeSet.java:initializeVolumeSet(184)) - Failed to parse the storage location: /home/elek/projects/hadoop/hadoop-hdds/container-service/target/test-dir/dfs java.io.IOException: Cannot create directory /home/elek/projects/hadoop/hadoop-hdds/container-service/target/test-dir/dfs/hdds at org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:208) at org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:179) at org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:72) at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:156) at org.apache.hadoop.ozone.container.common.volume.VolumeSet.createVolume(VolumeSet.java:311) at org.apache.hadoop.ozone.container.common.volume.VolumeSet.initializeVolumeSet(VolumeSet.java:165) at org.apache.hadoop.ozone.container.common.volume.VolumeSet.(VolumeSet.java:130) at org.apache.hadoop.ozone.container.common.volume.VolumeSet.(VolumeSet.java:109) at
org.apache.hadoop.ozone.container.common.volume.TestVolumeSet.testFailVolumes(TestVolumeSet.java:232) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} The problem here is that the parent directory of the volume dir is missing. I propose to use hddsRootDir.mkdirs() instead of hddsRootDir.mkdir(), as mkdirs() also creates the missing parent directories.
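The difference between the two calls can be shown with a minimal, self-contained sketch (this is not the actual HddsVolume code; the directory names are illustrative):

```java
import java.io.File;
import java.nio.file.Files;

// Sketch of why the proposed change works: File.mkdir() refuses to create a
// directory whose parent does not exist yet, while File.mkdirs() creates the
// whole missing chain of parent directories.
public class MkdirsDemo {
    static boolean mkdirResult;
    static boolean mkdirsResult;
    static File nested;

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("hdds-demo").toFile();
        nested = new File(base, "test-dir/dfs/hdds");

        mkdirResult = nested.mkdir();    // false: "test-dir/dfs" does not exist yet
        mkdirsResult = nested.mkdirs();  // true: missing parents are created as well

        System.out.println("mkdir: " + mkdirResult + ", mkdirs: " + mkdirsResult);
    }
}
```

Running it prints `mkdir: false, mkdirs: true`, which is exactly the failure mode of the test: the volume dir cannot be created because `target/test-dir` is missing.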
[jira] [Created] (HDDS-1678) Default image name for kubernetes examples should be ozone and not hadoop
Elek, Marton created HDDS-1678: -- Summary: Default image name for kubernetes examples should be ozone and not hadoop Key: HDDS-1678 URL: https://issues.apache.org/jira/browse/HDDS-1678 Project: Hadoop Distributed Data Store Issue Type: Bug Components: docker Reporter: Elek, Marton Assignee: Elek, Marton During the build, the kubernetes example files are adjusted to use a specific docker image name. By default it should be apache/ozone:${VERSION}, so that the examples shipped with the release artifact can be used without any build. With the examples of the release artifact the user can use the latest released apache/ozone:${VERSION} from Docker Hub. For a development build the image can be set with -Ddocker.image (or -Dozone.docker.image with HDDS-1667). Unfortunately -- due to a small typo -- the apache/hadoop image is used by default instead of apache/ozone.
[jira] [Created] (HDDS-1677) Auditparser robot test should use a world-writable working directory
Elek, Marton created HDDS-1677: -- Summary: Auditparser robot test should use a world-writable working directory Key: HDDS-1677 URL: https://issues.apache.org/jira/browse/HDDS-1677 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton When I tried to reproduce a problem which was reported by [~eyang], I found that the auditparser robot test uses the /opt/hadoop directory as a working directory to generate the audit.db export. /opt/hadoop may or may not be writable; it's better to use /tmp instead.
[jira] [Created] (HDDS-1674) Make ScmBlockLocationProtocol message type based
Elek, Marton created HDDS-1674: -- Summary: Make ScmBlockLocationProtocol message type based Key: HDDS-1674 URL: https://issues.apache.org/jira/browse/HDDS-1674 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Shweta Most of the Ozone protocols are "message type based" and not "method based". For example in OzoneManagerProtocol.proto there is only one method: {code} service OzoneManagerService { // A client-to-OM RPC to send client requests to OM Ratis server rpc submitRequest(OMRequest) returns(OMResponse); } {code} And the exact method is determined by the type of the message: {code} message OMResponse { required Type cmdType = 1; // Type of the command // A string that identifies this command, we generate Trace ID in Ozone // frontend and this allows us to trace that command all over ozone. optional string traceID = 2; optional bool success = 3 [default=true]; optional string message = 4; required Status status = 5; optional string leaderOMNodeId = 6; optional CreateVolumeResponse createVolumeResponse = 11; optional SetVolumePropertyResponse setVolumePropertyResponse = 12; optional CheckVolumeAccessResponse checkVolumeAccessResponse = 13; } enum Type { CreateVolume = 11; SetVolumeProperty = 12; CheckVolumeAccess = 13; InfoVolume = 14; DeleteVolume = 15; ListVolume = 16; {code} This is not the most natural way to use protobuf services but it has the additional benefit that we can propagate traceId / exception in a common way. Earlier there was an agreement to modify all the protocols to use this "message type based" approach to make it possible to provide proper error handling. 
In this issue the ScmBlockLocationProtocol.proto should be modified to use only one method: {code} service ScmBlockLocationProtocolService { rpc send (SCMBlockLocationRequest) returns (SCMBlockLocationResponse); } {code} It also requires creating the common request and response objects (with the common fields like type, traceId, success, message, status as they are used in OzoneManagerProtocol.proto). To make it work, the ScmBlockLocationProtocolClientSideTranslatorPB and the ScmBlockLocationProtocolServerSideTranslatorPB should be improved to wrap/unwrap the original messages to/from the generic message. I propose to do only the protocol change here; (if possible) we can keep the message/status fields empty and fix the error propagation in HDDS-1258
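The wrapper messages could look roughly like the following sketch -- the message and field names here are assumed by analogy with OMRequest/OMResponse, not taken from an actual patch:

```protobuf
// Sketch only: names assumed by analogy with OMRequest/OMResponse.
message SCMBlockLocationRequest {
  required Type cmdType = 1;   // selects the logical "method"
  optional string traceID = 2;
  // ... one optional payload field per call type, e.g.:
  // optional AllocateScmBlockRequestProto allocateScmBlockRequest = 11;
}

message SCMBlockLocationResponse {
  required Type cmdType = 1;
  optional string traceID = 2;
  optional bool success = 3 [default = true];
  optional string message = 4;
  required Status status = 5;
  // ... matching optional response payload fields
}
```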
[jira] [Reopened] (HDDS-1659) Define the process to add proposal/design docs to the Ozone subproject
[ https://issues.apache.org/jira/browse/HDDS-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton reopened HDDS-1659: > Define the process to add proposal/design docs to the Ozone subproject > -- > > Key: HDDS-1659 > URL: https://issues.apache.org/jira/browse/HDDS-1659 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > We think that it would be more effective to collect all the design docs in > one place and make it easier to review them by the community. > We propose to follow an approach where the proposals are committed to the > hadoop-hdds/docs project and the review can be the same as a review of a PR
[jira] [Created] (HDDS-1673) mapreduce smoketests are failing because of an acl error
Elek, Marton created HDDS-1673: -- Summary: mapreduce smoketests are failing because of an acl error Key: HDDS-1673 URL: https://issues.apache.org/jira/browse/HDDS-1673 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton After executing this command: {code} yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 3 3 2 {code} The result is: {code} Number of Maps = 3 Samples per Map = 3 2019-06-12 03:16:20 ERROR OzoneClientFactory:294 - Couldn't create protocol class org.apache.hadoop.ozone.client.rpc.RpcClient exception: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:291) at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:169) at org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.(BasicOzoneClientAdapterImpl.java:134) at org.apache.hadoop.fs.ozone.OzoneClientAdapterImpl.(OzoneClientAdapterImpl.java:50) at org.apache.hadoop.fs.ozone.OzoneFileSystem.createAdapter(OzoneFileSystem.java:103) at org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.initialize(BasicOzoneFileSystem.java:143) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:227) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:463) at
org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:522) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:275) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:360) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:368) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:323) at org.apache.hadoop.util.RunJar.main(RunJar.java:236) Caused by: org.apache.hadoop.hdds.conf.ConfigurationException: Can't inject configuration to class org.apache.hadoop.ozone.security.acl.OzoneAclConfig.setUserDefaultRights at org.apache.hadoop.hdds.conf.OzoneConfiguration.getObject(OzoneConfiguration.java:160) at org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:148) ... 
36 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hdds.conf.OzoneConfiguration.getObject(OzoneConfiguration.java:137) ... 37 more Caused by: java.lang.NullPointerException: Name is null at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.ozone.security.acl.IAccessAuthorizer$ACLType.valueOf(IAccessAuthorizer.java:48) at org.apache.hadoop.ozone.security.acl.OzoneAclConfig.setUserDefaultRights(OzoneAclConfig.java:43) ... 42 more java.io.IOException:
[jira] [Created] (HDDS-1669) SCM startup is failing if network-topology-default.xml is part of a jar
Elek, Marton created HDDS-1669: -- Summary: SCM startup is failing if network-topology-default.xml is part of a jar Key: HDDS-1669 URL: https://issues.apache.org/jira/browse/HDDS-1669 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton network-topology-default.xml can be loaded from a file or from the classpath. But the NodeSchemaLoader assumes that the files on the classpath can be opened as a file. That's true if the file is in etc/hadoop (which is part of the classpath) but not true if the file is packaged into a jar file: {code} scm_1 | 2019-06-11 13:18:03 INFO NodeSchemaLoader:118 - Loading file from jar:file:/opt/hadoop/share/ozone/lib/hadoop-hdds-common-0.5.0-SNAPSHOT.jar!/network-topology-default.xml scm_1 | 2019-06-11 13:18:03 ERROR NodeSchemaManager:74 - Failed to load schema file:network-topology-default.xml, error: scm_1 | java.lang.IllegalArgumentException: URI is not hierarchical scm_1 | at java.io.File.(File.java:418) scm_1 | at org.apache.hadoop.hdds.scm.net.NodeSchemaLoader.loadSchemaFromFile(NodeSchemaLoader.java:119) scm_1 | at org.apache.hadoop.hdds.scm.net.NodeSchemaManager.init(NodeSchemaManager.java:67) scm_1 | at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.(NetworkTopologyImpl.java:63) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:382) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:275) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:208) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:586) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:139) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:115) scm_1 | at
org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:67) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42) scm_1 | at picocli.CommandLine.execute(CommandLine.java:1173) scm_1 | at picocli.CommandLine.access$800(CommandLine.java:141) scm_1 | at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) scm_1 | at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) scm_1 | at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) scm_1 | at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) scm_1 | at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) scm_1 | at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) scm_1 | at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) scm_1 | at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:56) scm_1 | Failed to load schema file:network-topology-default.xml, error: {code} The quick fix is to keep the current behaviour but read the file from classloader.getResourceAsStream() instead of classloader.getResource().toURI() -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
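The quick fix mentioned above can be sketched with a standalone example (this is illustrative, not the actual NodeSchemaLoader code): a stream returned by classloader.getResourceAsStream() works both for plain files on the classpath and for entries packaged inside a jar, while getResource().toURI() only yields a usable File for the former.

```java
import java.io.IOException;
import java.io.InputStream;

public class SchemaResourceExample {

    // Load a named resource as a stream via the context classloader.
    // Unlike getResource().toURI() + new File(...), this works both for
    // plain files on the classpath and for entries inside a jar.
    static InputStream openSchema(String name) throws IOException {
        InputStream in = Thread.currentThread()
                .getContextClassLoader()
                .getResourceAsStream(name);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        // java/lang/Object.class is always reachable through the classloader
        // (.class resources are never encapsulated by the module system), so
        // it demonstrates reading a jar/module-packaged resource.
        try (InputStream in = openSchema("java/lang/Object.class")) {
            System.out.println("readable=" + (in.read() != -1));
        }
    }
}
```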
[jira] [Created] (HDDS-1668) Add liveness probe to the example k8s resources files
Elek, Marton created HDDS-1668: -- Summary: Add liveness probe to the example k8s resources files Key: HDDS-1668 URL: https://issues.apache.org/jira/browse/HDDS-1668 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Elek, Marton Assignee: Elek, Marton In kubernetes resources we can define liveness probes which help to detect failures: if the defined port is not available, the pod will be restarted. We need to add liveness probes to our k8s resource files. Note: we shouldn't add readiness probes. A readiness probe is about service availability: the service/dns entry becomes available only after the probe succeeds. This is not good for us as: * We need DNS resolution during the startup (See OzoneManager.loadOMHAConfigs) * We already implemented retry in case of missing DNS entries -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
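As a sketch, such a liveness probe could look like the fragment below in a container spec. The port number and thresholds are illustrative assumptions, not the values from the actual patch:

```yaml
# Illustrative only: probe a TCP port so a hung process gets restarted
# by the kubelet. No readinessProbe is added, per the note above.
livenessProbe:
  tcpSocket:
    port: 9858            # assumed datanode port; use the real service port
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```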
[jira] [Resolved] (HDDS-1622) Use picocli for StorageContainerManager
[ https://issues.apache.org/jira/browse/HDDS-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1622. Resolution: Fixed > Use picocli for StorageContainerManager > --- > > Key: HDDS-1622 > URL: https://issues.apache.org/jira/browse/HDDS-1622 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > Recently we switched to use PicoCli with (almost) all of our daemons (eg. s3 > Gateway, Freon, etc.) > PicoCli has better output, it can generate nice help, and easier to use as > it's enough to put a few annotations and we don't need to add all the > boilerplate code to print out help, etc. > StorageContainerManager and OzoneManager is not yet supported. The previous > issue was closed HDDS-453 but since then we improved the GenericCli parser > (eg. in HDDS-1192), so I think we are ready to move. > The main idea is to create a starter java similar to > org.apache.hadoop.ozone.s3.Gateway and we can start StorageContainerManager > from there. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1659) Define the process to add proposal/design docs to the Ozone subproject
Elek, Marton created HDDS-1659: -- Summary: Define the process to add proposal/design docs to the Ozone subproject Key: HDDS-1659 URL: https://issues.apache.org/jira/browse/HDDS-1659 Project: Hadoop Distributed Data Store Issue Type: Task Reporter: Elek, Marton Assignee: Elek, Marton We think that it would be more effective to collect all the design docs in one place and make it easier to review them by the community. We propose to follow an approach where the proposals are committed to the hadoop-hdds/docs project and the review can be the same as a review of a PR -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1646) Support real persistence in the k8s example files
Elek, Marton created HDDS-1646: -- Summary: Support real persistence in the k8s example files Key: HDDS-1646 URL: https://issues.apache.org/jira/browse/HDDS-1646 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton The Ozone release contains example k8s deployment files to make it easier to deploy Ozone to kubernetes. As of now we use emptyDir everywhere; we should support the configuration of host volumes (hostPath or Local Persistent Volumes). The big question here is the default: * Make the examples easy to start and ephemeral * Make the examples safer by default (but they couldn't be started without additional administration). (Note: this conversation started in the review of HDDS-1508) Xiaoyu: Can we support mounting a hostVolume for datanode daemons? Marton: Yes, we can. AFAIK there are two options: * using [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) * or [Local PersistentVolumes](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/) The first one requires knowledge of the directory names on the host. The second one is the recommended way, but it requires the creation of PersistentVolumes or the installation of a PersistentVolume provisioner. I am not sure what the best approach is; my current proposal is: * Use emptyDir everywhere to make it easier to start a simple ozone cluster * Provide a simple option to turn on either kind of persistence (the kubernetes files are generated and the generation can be parametrized) * Document how to customize the kubernetes resource files Summary: it's a question of the defaults: 1. Use a complex but persistent solution, which may not work out of the box 2. Use a simple but ephemeral solution (as default) I started with (2) but I am open to change. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
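The hostPath option discussed above could look roughly like the fragment below in a datanode pod spec. The volume name and host directory are illustrative assumptions:

```yaml
# Illustrative: replace the ephemeral emptyDir with a host directory.
# Requires knowing a usable directory name on every host (the drawback
# mentioned above).
volumes:
  - name: data
    hostPath:
      path: /data/ozone/datanode    # assumed host directory
      type: DirectoryOrCreate
# ...and in the container spec:
# volumeMounts:
#   - name: data
#     mountPath: /data
```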
[jira] [Resolved] (HDDS-1640) Reduce the size of recon jar file
[ https://issues.apache.org/jira/browse/HDDS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton resolved HDDS-1640. Resolution: Fixed > Reduce the size of recon jar file > - > > Key: HDDS-1640 > URL: https://issues.apache.org/jira/browse/HDDS-1640 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Recon >Reporter: Elek, Marton >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > hadoop-ozone-recon-0.5.0-SNAPSHOT.jar is 73 MB, mainly because the > node_modules are included (full typescript source, eslint, babel, etc.): > {code} > unzip -l hadoop-ozone-recon-0.5.0-SNAPSHOT.jar | grep node_modules | wc > {code} > Fix me if I am wrong, but I think node_modules is not required in the > distribution as the dependencies are already included in the compiled > javascript files. > I propose to remove the node_modules from the jar file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1642) Avoid shell references relative to the current script path
Elek, Marton created HDDS-1642: -- Summary: Avoid shell references relative to the current script path Key: HDDS-1642 URL: https://issues.apache.org/jira/browse/HDDS-1642 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Eric Yang This is based on the review comment from [~eyang]: bq. You might need pwd -P to resolve symlinks. I don't recommend using the script location to decide where binaries are supposed to be, because someone else can make a newbie mistake and refactor your script in a way that invalidates the original coding intent. See this blog to explain the right way to get the directory of a bash script. This is the reason that I used OZONE_HOME as the base reference frequently. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
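For reference, the idiom the review comment alludes to (resolving a script's own directory with symlinks followed) is commonly written like this. This is a generic sketch, not the script under review:

```shell
#!/usr/bin/env bash
# Resolve the directory containing this script, following symlinks.
# pwd -P prints the physical path (symlinks resolved), which is what the
# review comment asks for; illustrative only.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" >/dev/null 2>&1 && pwd -P)"
echo "SCRIPT_DIR=$SCRIPT_DIR"
```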
[jira] [Created] (HDDS-1641) Csi server fails because of a transitive Netty dependency
Elek, Marton created HDDS-1641: -- Summary: Csi server fails because of a transitive Netty dependency Key: HDDS-1641 URL: https://issues.apache.org/jira/browse/HDDS-1641 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton The CSI server can't be started because of a ClassNotFoundException. It turned out that with the new configuration api we got old netty jar files as transitive dependencies (hdds-configuration depends on hadoop-common, and hadoop-common depends on the world). We should exclude all the old netty versions from the classpath of the CSI server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Reopened] (HDDS-1628) Fix the execution and return code of smoketest executor shell script
[ https://issues.apache.org/jira/browse/HDDS-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton reopened HDDS-1628: > Fix the execution and return code of smoketest executor shell script > > > Key: HDDS-1628 > URL: https://issues.apache.org/jira/browse/HDDS-1628 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Problem: Some of the smoketest executions were reported to green even if they > contained failed tests. > Root cause: the legacy test executor > (hadoop-ozone/dist/src/main/smoketest/test.sh) which just calls the new > executor script (hadoop-ozone/dist/src/main/compose/test-all.sh) didn't > handle the return code well (the failure of the smoketests should be > signalled by the bash return code) > This patch: > * Fixes the error code handling in smoketest/test.sh > * Fixes the test execution in compose/test-all.sh (should work from any > other directories) > * Updates hadoop-ozone/dev-support/checks/acceptance.sh to use the newer > test-all.sh executor instead of the old one. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
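The return-code fix described in the patch notes boils down to a standard bash pattern; a minimal sketch follows, where `false` stands in for the real executor script and all names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: a wrapper must capture and forward the delegate's exit code,
# otherwise CI sees success even when tests failed.
run_all() {
  false                 # stands in for compose/test-all.sh; it fails here
}

if run_all; then
  rc=0
else
  rc=$?                 # capture the delegate's non-zero status
fi
echo "rc=$rc"
# A real wrapper would end with: exit $rc
```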
[jira] [Created] (HDDS-1639) Restructure documentation pages for better understanding
Elek, Marton created HDDS-1639: -- Summary: Restructure documentation pages for better understanding Key: HDDS-1639 URL: https://issues.apache.org/jira/browse/HDDS-1639 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton The documentation pages should be updated according to the recent changes. In the uploaded PR I modified the following: # Pages are restructured to use a structure similar to the one introduced on the wiki by [~anu]. (Getting started guides are separated for different environments) # The width of the menu is increased (to make it more readable) # The logo is moved from the menu to the main page (to get more space for the menu items) # A 'Requirements' section is added to each 'Getting started' page # Test tools / docker image / kubernetes pages are imported from the wiki. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1636) Tracing id is not propagated via async datanode grpc call
Elek, Marton created HDDS-1636: -- Summary: Tracing id is not propagated via async datanode grpc call Key: HDDS-1636 URL: https://issues.apache.org/jira/browse/HDDS-1636 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Recently a new exception became visible in the datanode logs when using standard freon (STANDALONE): {code} datanode_2 | 2019-06-03 12:18:21 WARN PropagationRegistry$ExceptionCatchingExtractorDecorator:60 - Error when extracting SpanContext from carrier. Handling gracefully. datanode_2 | io.jaegertracing.internal.exceptions.MalformedTracerStateStringException: String does not match tracer state format: 7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1 datanode_2 | at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:49) datanode_2 | at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:34) datanode_2 | at io.jaegertracing.internal.PropagationRegistry$ExceptionCatchingExtractorDecorator.extract(PropagationRegistry.java:57) datanode_2 | at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:208) datanode_2 | at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:61) datanode_2 | at io.opentracing.util.GlobalTracer.extract(GlobalTracer.java:143) datanode_2 | at org.apache.hadoop.hdds.tracing.TracingUtil.importAndCreateScope(TracingUtil.java:102) datanode_2 | at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148) datanode_2 | at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73) datanode_2 | at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61) datanode_2 | at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248) datanode_2 | at 
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33) datanode_2 | at org.apache.ratis.thirdparty.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76) datanode_2 | at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33) datanode_2 | at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:46) datanode_2 | at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263) datanode_2 | at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686) datanode_2 | at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) datanode_2 | at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) datanode_2 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) datanode_2 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) {code} It turned out that the traceId propagation between XceiverClient and the server doesn't work well in the case of Standalone and async commands: 1. there are many places (on the client side) where the traceId is filled with UUID.randomUUID().toString(); 2. this random id is propagated between the Output/InputStream and different parts of the client; 3. this is unnecessary, because the XceiverClientGrpc send methods override the traceId field with the real opentracing id anyway (sendCommand/sendCommandAsync); 4. except that in XceiverClientGrpc.sendCommandAsync this override is accidentally missing. Things to fix: 1. fix XceiverClientGrpc.sendCommandAsync (replace any existing traceId with the correct one); 2. remove the usage of the UUID-based traceId (it's not used); 3. improve the error logging in case of an invalid traceId on the server side. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1635) Maintain docker entrypoint and envtoconf inside ozone project
Elek, Marton created HDDS-1635: -- Summary: Maintain docker entrypoint and envtoconf inside ozone project Key: HDDS-1635 URL: https://issues.apache.org/jira/browse/HDDS-1635 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton During an offline discussion with [~eyang] and [~arp], Eric suggested maintaining the source of the docker-specific startup scripts inside the main ozone branch (trunk) instead of the branch of the docker image. With this approach the ozone-runner image can be a very lightweight image and the entrypoint logic can be versioned together with ozone itself. Another use case is a container creation script. Recently we [documented|https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Docker+images] that the hadoop-runner/ozone-runner/ozone images are not for production (for example because they contain development tools). We can create a helper tool (similar to what Spark provides) to create Ozone container images from any production-ready base image. But this tool requires the existence of the scripts inside the distribution. (ps: I think sooner or later the functionality of envtoconf.py can be added to the OzoneConfiguration java class and we can parse the configuration values directly from environment variables.) In this patch I copied the required scripts to the ozone source tree, and the new ozone-runner image (HDDS-1634) is designed to use them from this specific location. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1634) Introduce a new ozone specific runner image
Elek, Marton created HDDS-1634: -- Summary: Introduce a new ozone specific runner image Key: HDDS-1634 URL: https://issues.apache.org/jira/browse/HDDS-1634 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton Ozone compose files use apache/hadoop-runner to provide a fixed environment to run any Ozone distribution. It would be better to use separate hadoop-runner and ozone-runner images: 1. To make it easier to include Ozone-specific behaviour (for example goofys install, scm/om initialization) 2. To make it clear which features are required by all the subprojects of Hadoop and which ones are Ozone specific (based on the comment from [~eyang] in HADOOP-16092) 3. For hadoop-runner we maintain multiple tags (jdk11/jdk8/latest), and it is hard to maintain all of them. jdk8 is required only for hadoop, so by separating hadoop-runner/ozone-runner we can use a single branch for ozone-runner development (and we can create incremental fixed tags very easily) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1633) Update rat from 0.12 to 0.13 in hadoop-runner build script
Elek, Marton created HDDS-1633: -- Summary: Update rat from 0.12 to 0.13 in hadoop-runner build script Key: HDDS-1633 URL: https://issues.apache.org/jira/browse/HDDS-1633 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton A new rat release is available and the old one can no longer be downloaded, so the URL in the build script should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1632) Make the hadoop home world readable and avoid sudo in hadoop-runner
Elek, Marton created HDDS-1632: -- Summary: Make the hadoop home world readable and avoid sudo in hadoop-runner Key: HDDS-1632 URL: https://issues.apache.org/jira/browse/HDDS-1632 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton [~eyang] reported in HDDS-1609 that the hadoop-runner image can be started *without* mounting a real hadoop (usually it's mounted) AND using a different uid: {code} docker run -it -u $(id -u):$(id -g) apache/hadoop-runner bash docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "chdir to cwd (\"/opt/hadoop\") set in config.json failed: permission denied": unknown. {code} There are two blocking problems here: * the /opt/hadoop directory (which is the CWD inside the container) is 700 instead of 755 * the usage of sudo in the startup scripts (sudo is not possible if the real user is not added to /etc/passwd) Both of them are addressed by this patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1631) Fix auditparser smoketests
Elek, Marton created HDDS-1631: -- Summary: Fix auditparser smoketests Key: HDDS-1631 URL: https://issues.apache.org/jira/browse/HDDS-1631 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton In HDDS-1518 we modified the location of the var and config files inside the container. There are three problems with the current auditparser smoketest: 1. The default audit log4j files are not part of the new config directory (fixed with HDDS-1630) 2. The smoketest is executed in the scm container instead of om 3. The log directory is hard coded Problems 2 and 3 are fixed in this patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1630) Copy default configuration files to the writeable directory
Elek, Marton created HDDS-1630: -- Summary: Copy default configuration files to the writeable directory Key: HDDS-1630 URL: https://issues.apache.org/jira/browse/HDDS-1630 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton HDDS-1518 separates the read-only directories (/opt/ozone, /opt/hadoop) from the read-write directories (/etc/hadoop, /var/log/hadoop). The configuration directory and log directory should be writeable, and to make it easier to run the docker-compose based pseudo clusters with a *different* host uid we started to use a different config dir. But we need all the defaults in the configuration dir. In this patch I add a small fragment to the hadoop-runner image to copy the default files (if available). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
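The copy fragment described above could be sketched like this. Paths and the function name are illustrative; the real entrypoint logic may differ:

```shell
#!/usr/bin/env bash
# Illustrative sketch: copy shipped default config files into the
# writeable config dir without overwriting files the user already
# provided (so manual configuration keeps priority).
copy_defaults() {
  local src="$1" dst="$2"
  [ -d "$src" ] || return 0             # nothing shipped, nothing to do
  local f
  for f in "$src"/*; do
    [ -e "$dst/$(basename "$f")" ] || cp "$f" "$dst/"
  done
}

# In the image this would be called e.g. as:
# copy_defaults /opt/hadoop/etc/hadoop /etc/hadoop
```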
[jira] [Reopened] (HDDS-1597) Remove hdds-server-scm dependency from ozone-common
[ https://issues.apache.org/jira/browse/HDDS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton reopened HDDS-1597: > Remove hdds-server-scm dependency from ozone-common > --- > > Key: HDDS-1597 > URL: https://issues.apache.org/jira/browse/HDDS-1597 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Attachments: ozone-dependency.png > > Time Spent: 2h 50m > Remaining Estimate: 0h > > I noticed that the hadoop-ozone/common project depends on > hadoop-hdds-server-scm project. > The common projects are designed to be a shared artifacts between client and > server side. Adding additional dependency to the common pom means that the > dependency will be available for all the clients as well. > (See the attached artifact about the current, desired structure). > We definitely don't need scm server dependency on the client side. > The code dependency is just one class (ScmUtils) and the shared code can be > easily moved to the common. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1629) Tar file creation can be optional for non-dist builds
Elek, Marton created HDDS-1629: -- Summary: Tar file creation can be optional for non-dist builds Key: HDDS-1629 URL: https://issues.apache.org/jira/browse/HDDS-1629 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton Ozone tar file creation is a very time-consuming step. I propose to make it optional and create the tar file only if the dist profile is enabled (-Pdist). The tar file is not required to test ozone, as the same content is available from hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT, which is enough to run docker-compose pseudo clusters and smoketests. If it's required, the tar file creation can be requested with the dist profile. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1628) Fix the execution and return code of smoketest executor shell script
Elek, Marton created HDDS-1628: -- Summary: Fix the execution and return code of smoketest executor shell script Key: HDDS-1628 URL: https://issues.apache.org/jira/browse/HDDS-1628 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Problem: Some of the smoketest executions were reported as green even if they contained failed tests. Root cause: the legacy test executor (hadoop-ozone/dist/src/main/smoketest/test.sh), which just calls the new executor script (hadoop-ozone/dist/src/main/compose/test-all.sh), didn't handle the return code well (the failure of the smoketests should be signalled by the bash return code) This patch: * Fixes the error code handling in smoketest/test.sh * Fixes the test execution in compose/test-all.sh (should work from any other directory) * Updates hadoop-ozone/dev-support/checks/acceptance.sh to use the newer test-all.sh executor instead of the old one. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1627) Make the version of the used hadoop-runner configurable
Elek, Marton created HDDS-1627: -- Summary: Make the version of the used hadoop-runner configurable Key: HDDS-1627 URL: https://issues.apache.org/jira/browse/HDDS-1627 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton During an offline discussion with [~arp] and [~eyang] we agreed that it would be safer to pin the tag of the used hadoop-runner images during the releases. It also requires fixed tags from hadoop-runner, but after that it's possible to use them. This patch makes it possible to define the required version/tag in pom.xml: 1. the default hadoop-runner.version is added to all .env files during the build 2. if a variable is added to the .env, it can be used from docker-compose files AND can be overridden by environment variables (this makes it possible to define a custom version during a local run) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
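The override mechanism described above relies on docker-compose's standard .env variable substitution; a minimal sketch follows, where the variable name and tag values are illustrative assumptions, not the names from the actual patch:

```
# .env (generated during the build, carrying the default from pom.xml)
HADOOP_RUNNER_VERSION=jdk11

# docker-compose.yaml references it:
#   image: apache/hadoop-runner:${HADOOP_RUNNER_VERSION}

# One-off override from the shell, no file edit needed:
#   HADOOP_RUNNER_VERSION=latest docker-compose up -d
```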
[jira] [Created] (HDDS-1622) Use picocli for StorageContainerManager
Elek, Marton created HDDS-1622: -- Summary: Use picocli for StorageContainerManager Key: HDDS-1622 URL: https://issues.apache.org/jira/browse/HDDS-1622 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Stephen O'Donnell Recently we switched to using PicoCli for (almost) all of our daemons (eg. s3 Gateway, Freon, etc.) PicoCli has better output, it can generate nice help, and it is easier to use, as it's enough to put a few annotations and we don't need to add all the boilerplate code to print out help, etc. StorageContainerManager and OzoneManager are not yet converted. The previous issue (HDDS-453) was closed, but since then we improved the GenericCli parser (eg. in HDDS-1192), so I think we are ready to move. The main idea is to create a starter class similar to org.apache.hadoop.ozone.s3.Gateway and start StorageContainerManager from there. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1607) Create smoketest for non-secure mapreduce example
Elek, Marton created HDDS-1607: -- Summary: Create smoketest for non-secure mapreduce example Key: HDDS-1607 URL: https://issues.apache.org/jira/browse/HDDS-1607 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Elek, Marton Assignee: Elek, Marton We had multiple problems earlier with the classpath separation and the internal ozonefs classloader. Before fixing all the issues I propose to create a smoketest to detect if the classpath separation is broken again. As a first step I created a smoketest/ozone-mr environment (based on the work of [~xyao], which is secure) and a smoketest. Possible follow-up work: * Adapt the test.sh for the ozonesecure-mr * Include test runs with older hadoop versions -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1598) Fix Ozone checkstyle issues on trunk
Elek, Marton created HDDS-1598: -- Summary: Fix Ozone checkstyle issues on trunk Key: HDDS-1598 URL: https://issues.apache.org/jira/browse/HDDS-1598 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton Some small checkstyle issues were accidentally committed with HDDS-700. Trivial fixes are coming here... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1597) Remove hdds-server-scm dependency from ozone-common
Elek, Marton created HDDS-1597: -- Summary: Remove hdds-server-scm dependency from ozone-common Key: HDDS-1597 URL: https://issues.apache.org/jira/browse/HDDS-1597 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Elek, Marton Assignee: Elek, Marton I noticed that the hadoop-ozone/common project depends on the hadoop-hdds-server-scm project. The common projects are designed to be shared artifacts between the client and server side. Adding an additional dependency to the common pom means that the dependency will be available for all the clients as well. We definitely don't need the scm server dependency on the client side. The code dependency is just one class (ScmUtils) and the shared code can be easily moved to common. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1596) Create service endpoint to download configuration from SCM
Elek, Marton created HDDS-1596:
--
Summary: Create service endpoint to download configuration from SCM
Key: HDDS-1596
URL: https://issues.apache.org/jira/browse/HDDS-1596
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Elek, Marton
Assignee: Elek, Marton

As written in the design doc (see the parent issue), it was proposed that the other services download their configuration from the SCM. I propose to create a separate endpoint to provide the Ozone configuration. /conf can't be used, as it contains *all* the configuration and we need only the modified configuration.

The easiest way to implement this feature is:
 * Create a simple REST endpoint which publishes the configuration
 * Download the configuration to $HADOOP_CONF_DIR/ozone-global.xml during service startup
 * Add ozone-global.xml as an additional config source (before ozone-site.xml but after ozone-default.xml)
 * Make the download optional

With this approach we keep support for the existing manual configuration (ozone-site.xml has higher priority), but we can also download the configuration to a separate file during startup, which will then be loaded. There is no magic: the configuration file is saved, and it's easy to debug what's going on, as the OzoneConfiguration is loaded from $HADOOP_CONF_DIR as before.

Possible follow-up steps:
 * Migrate all the other services (recon, s3g) to the new approach (possible newbie jiras)
 * Improve the CLI to define the SCM address (as of now we use ozone.scm.names)
 * Create a service/hostname registration mechanism and autofill some of the configuration based on the topology information
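The precedence proposed above can be sketched as a simple last-wins merge: defaults first, then the downloaded global file, then the local site file. A minimal illustration (plain Python with made-up property values; the merge logic is illustrative, not Ozone code):

```python
# Sketch of the proposed config source ordering:
#   ozone-default.xml < ozone-global.xml (downloaded) < ozone-site.xml
# Later sources override earlier ones, so manual site config still wins.
def merge_config_sources(*sources):
    """Merge dict-like config sources; later sources take precedence."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged

defaults = {"ozone.scm.names": "scm", "ozone.replication": "3"}
downloaded_global = {"ozone.scm.names": "scm-0.scm:9876"}  # fetched from SCM
local_site = {"ozone.replication": "1"}                    # manual override

effective = merge_config_sources(defaults, downloaded_global, local_site)
print(effective["ozone.scm.names"])    # comes from the downloaded file
print(effective["ozone.replication"])  # ozone-site.xml still wins
```

This ordering is what preserves backward compatibility: a cluster with a hand-maintained ozone-site.xml behaves exactly as before, while a cluster without one picks up the SCM-provided values.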
[jira] [Resolved] (HDDS-1565) Rename k8s-dev and k8s-dev-push profiles to docker-build and docker-push
[ https://issues.apache.org/jira/browse/HDDS-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton resolved HDDS-1565.
Resolution: Fixed

> Rename k8s-dev and k8s-dev-push profiles to docker-build and docker-push
> ------------------------------------------------------------------------
>
>                 Key: HDDS-1565
>                 URL: https://issues.apache.org/jira/browse/HDDS-1565
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Based on the feedback from [~eyang] I realized that the names of the k8s-dev
> and k8s-dev-push profiles are not expressive enough, as the created containers
> can be used not only with Kubernetes but with any other container orchestrator.
> I propose to rename them to docker-build/docker-push.