[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474424#comment-16474424 ]

Íñigo Goiri commented on YARN-8275:
---

[~aw], thanks for the feedback, much appreciated.
It looks like we can put everything you proposed together into an umbrella for fixing the way Hadoop interacts with Windows.
From this thread, I see:
* Move away from an external process (winutils.exe) for native code:
** Replace it with native Java APIs (e.g., symlinks)
** Replace it with something like JNI
* Fix the build system to fully leverage cmake instead of msbuild

I would create an umbrella for this bigger task and make this JIRA just a subtask focusing on the YARN side (e.g., task).

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Giovanni Matteo Fumarola
> Assignee: Giovanni Matteo Fumarola
> Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On
> average, the NM calls WinUtils 4.76 times per second and 65.51 times per container.
>
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>
> Interval: 7 hours, 53 minutes and 48 seconds
>
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
>
> We should start considering removing WinUtils from Hadoop and creating a JNI
> interface.
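
The 666.58 IO ops/second figure in the description can be reproduced from the numbers it already gives: total requests divided by the measurement interval, multiplied by the roughly 140 IO ops per call. The following is only a small arithmetic sketch; the class and method names are hypothetical and not part of Hadoop.

```java
import java.util.Locale;

public class WinUtilsIoEstimate {
    // Figures copied from the issue description.
    static final long TOTAL_REQUESTS = 135_354L;
    // 7 hours, 53 minutes and 48 seconds = 28,428 seconds.
    static final long INTERVAL_SECONDS = 7 * 3600 + 53 * 60 + 48;
    // "Around 140 IO ops" per WinUtils execution, per the description.
    static final int IO_OPS_PER_CALL = 140;

    /** Average number of WinUtils invocations per second over the interval. */
    static double requestsPerSecond() {
        return (double) TOTAL_REQUESTS / INTERVAL_SECONDS;
    }

    /** Estimated IO operations per second attributable to WinUtils. */
    static double ioOpsPerSecond() {
        return requestsPerSecond() * IO_OPS_PER_CALL;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
            "requests/sec = %.3f", requestsPerSecond())); // 4.761
        System.out.println(String.format(Locale.ROOT,
            "IO ops/sec   = %.2f", ioOpsPerSecond()));    // 666.58
    }
}
```

Since the per-call IO count is itself an approximation ("around 140"), the result is best read as an order-of-magnitude estimate rather than an exact measurement.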
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473789#comment-16473789 ]

Allen Wittenauer commented on YARN-8275:
---

bq. I am planning to code everything in Commons to be used from YARN and HDFS.

The umbrella JIRA should really start out in HADOOP so that people aren't taken by surprise. I suspect any YARN- and HDFS-specific code will be relatively tiny since winutils is used all over the place, including in the client code. That fact probably makes ...

bq. a long running native process communicating with YARN over pipe

almost certainly a non-starter, never mind the security concerns, while greatly increasing the complexity for likely very little gain.

The other thing to keep in mind is that winutils pre-dates Java 7. Things like symlinks can now be done with Java APIs. No C required. I'd highly recommend starting with replacing the winutils calls with Java API calls first and then digging into something more complex later. [The Unix versions of those same calls will likely get a speed bump too.]
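
To illustrate the point that symlinks no longer need a native helper, here is a minimal sketch using the standard `java.nio.file.Files.createSymbolicLink` call available since Java 7. The class name, helper methods, and temporary file names are hypothetical and only serve the example; this is not the code path Hadoop uses. Note that on Windows the call can fail unless the process holds the privilege to create symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class JavaSymlinkSketch {

    /** Creates a symbolic link at {@code link} pointing to {@code target}, in-process. */
    static Path createSymlink(Path link, Path target) throws IOException {
        // Standard Java 7+ API; no external process is forked.
        return Files.createSymbolicLink(link, target);
    }

    /**
     * Creates a temporary target file and a link to it, then reports whether
     * the link exists and resolves to the target. Hypothetical demo helper.
     */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("symlink-sketch");
            Path target = Files.createFile(dir.resolve("target.txt"));
            Path link = createSymlink(dir.resolve("link.txt"), target);
            return Files.isSymbolicLink(link)
                && Files.readSymbolicLink(link).equals(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("symlink created and resolved: " + demo());
    }
}
```

Given that `-symlink` accounts for 115096 of the 135354 measured WinUtils executions, this single replacement would cover most of the calls in the reported sample.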
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472863#comment-16472863 ]

Íñigo Goiri commented on YARN-8275:
---

[~miklos.szeg...@cloudera.com], we would have to decide how to write the native service and that opens a big design space.
Do you have any proposal for that?
In any case, I would make this pluggable and then we can rely on winutils, a separate service or JNI.
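
One way to read "pluggable" here is a small interface with one implementation per backend, selected by a configuration value. The sketch below only illustrates that shape: the interface, its single method, the class names, and the backend keys are all assumptions made for the example, not an existing Hadoop API, and the implementations are deliberately left as stubs.

```java
import java.util.Locale;

public class PluggableWindowsOps {

    /** Hypothetical abstraction over operations currently delegated to winutils.exe. */
    interface WindowsOps {
        String backend();
        void createSymlink(String link, String target);
    }

    /** Stub for the existing approach: fork winutils.exe for every call. */
    static final class WinUtilsOps implements WindowsOps {
        public String backend() { return "winutils"; }
        public void createSymlink(String link, String target) {
            throw new UnsupportedOperationException("sketch only: would fork winutils.exe");
        }
    }

    /** Stub for an in-process JNI binding. */
    static final class JniOps implements WindowsOps {
        public String backend() { return "jni"; }
        public void createSymlink(String link, String target) {
            throw new UnsupportedOperationException("sketch only: would call native code");
        }
    }

    /** Stub for a separate long-running native service. */
    static final class ServiceOps implements WindowsOps {
        public String backend() { return "service"; }
        public void createSymlink(String link, String target) {
            throw new UnsupportedOperationException("sketch only: would message a service");
        }
    }

    /** Selects an implementation from a configuration value (the keys are assumptions). */
    static WindowsOps create(String backend) {
        switch (backend.toLowerCase(Locale.ROOT)) {
            case "winutils": return new WinUtilsOps();
            case "jni":      return new JniOps();
            case "service":  return new ServiceOps();
            default:
                throw new IllegalArgumentException("Unknown backend: " + backend);
        }
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : "winutils";
        System.out.println("selected backend: " + create(configured).backend());
    }
}
```

With this shape, callers depend only on the interface, so switching between winutils.exe, a service, or JNI would be a configuration change rather than a code change.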
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472663#comment-16472663 ]

Miklos Szegedi commented on YARN-8275:
---

[~giovanni.fumarola], I am curious about your opinion on the design of YARN-4599. In that case we considered JNI vs. a long-running native process communicating with YARN over a pipe. The latter seems better in terms of security and maintainability in case some native functions start corrupting the JVM heap. There is only a single process start in that case, so it does not affect performance. What do you think?
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472520#comment-16472520 ]

Giovanni Matteo Fumarola commented on YARN-8275:
---

[~elgoiri] thanks for the comment.
I am planning to code everything in Commons to be used from YARN and HDFS.
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472476#comment-16472476 ]

Íñigo Goiri commented on YARN-8275:
---

Even though the main use should be in YARN for now, we should make the changes in Commons and eventually plug this into HDFS too.
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472460#comment-16472460 ]

Giovanni Matteo Fumarola commented on YARN-8275:
---

The attached file [^WinUtils-Functions.pdf] shows the current usage (inputs and outputs) of all the WinUtils functions.
We should design a JNI interface aligned with it.
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471346#comment-16471346 ]

Íñigo Goiri commented on YARN-8275:
---

We've been seeing heavy load on winutils too.
AFAIK, the main contributor to this was [~cnauroth], but I don't think it is being maintained much nowadays.
Others involved were [~kiranmr] and [~rusanu].
In any case, as long as we make it pluggable and allow switching between winutils.exe and JNI, this is doable.

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Giovanni Matteo Fumarola
> Assignee: Giovanni Matteo Fumarola
> Priority: Minor
> Attachments: WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On
> average, the NM calls WinUtils 4.76 times per second and 65.51 times per container.
>
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>
> Interval: 7 hours, 53 minutes and 48 seconds
>
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
>
> We should start considering removing WinUtils from Hadoop and creating a JNI
> interface.
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471331#comment-16471331 ]

Giovanni Matteo Fumarola commented on YARN-8275:
---

The attached file [^WinUtils.CSV] shows the IO ops for a single WinUtils call.

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Giovanni Matteo Fumarola
> Assignee: Giovanni Matteo Fumarola
> Priority: Minor
> Attachments: WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils. On average, the NM
> calls WinUtils 4.76 times per second and 65.51 times per container.
>
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>
> Interval: 7 hours, 53 minutes and 48 seconds
>
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
>
> We should start considering removing WinUtils from Hadoop and creating a JNI
> interface.