[jira] [Commented] (HADOOP-14559) FTPFileSystem instance in TestFTPFileSystem should be created before tests and closed after tests
[ https://issues.apache.org/jira/browse/HADOOP-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139464#comment-16139464 ] Hongyuan Li commented on HADOOP-14559: -- Hi, [~ste...@apache.org], [~yzhangal], sorry to interrupt you; should this issue be resolved or closed? > FTPFileSystem instance in TestFTPFileSystem should be created before tests > and closed after tests > -- > > Key: HADOOP-14559 > URL: https://issues.apache.org/jira/browse/HADOOP-14559 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, test >Affects Versions: 3.0.0-alpha2 >Reporter: Hongyuan Li >Assignee: Hongyuan Li >Priority: Minor > Attachments: HADOOP-14559-001.patch > > > The FTPFileSystem used in TestFTPFileSystem should be closed after each test > case as an improvement. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-13743) error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials has too many spaces
[ https://issues.apache.org/jira/browse/HADOOP-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119883#comment-16119883 ] Hongyuan Li edited comment on HADOOP-13743 at 8/9/17 1:49 PM: -- Digging into the log4j source code, the message format is implemented with a StringBuilder internally to format error messages. was (Author: hongyuan li): Digging into the log4j source code, the message format uses a StringBuilder internally to format error messages. > error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials > has too many spaces > > > Key: HADOOP-13743 > URL: https://issues.apache.org/jira/browse/HADOOP-13743 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 2.8.0, 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Trivial > Attachments: HADOOP-13743-branch-2-001.patch, > HADOOP-14373-branch-2-002.patch > > > The error message on a failed hadoop fs -ls command against an unauthed azure > container has an extra space in {{" them in"}} > {code} > ls: org.apache.hadoop.fs.azure.AzureException: Unable to access container > demo in account example.blob.core.windows.net using anonymous credentials, > and no credentials found for them in the configuration. > {code}
[jira] [Commented] (HADOOP-13743) error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials has too many spaces
[ https://issues.apache.org/jira/browse/HADOOP-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119883#comment-16119883 ] Hongyuan Li commented on HADOOP-13743: -- Digging into the log4j source code, the message format uses a StringBuilder internally to format error messages.
[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099702#comment-16099702 ] Hongyuan Li commented on HADOOP-14623: -- Hi, [~bharatviswa], you are right, it does not matter. The value of the {{key.serializer}} param in the original code is {{ByteArraySerializer}}, which is not the correct serializer class for a key of type Integer, so I changed the key type to {{byte[]}} even though the key is not used. If this confuses you, I will remove the modification and resubmit a new patch. Thanks anyway for your code review. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch, > HADOOP-14623-003.patch, HADOOP-14623-004.patch > > > {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has > at least been written to the broker. > Current code listed below: > {code} > props.put("request.required.acks", "0"); > {code} > *Update* > Found another bug in this class: {{key.serializer}} uses > {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the > key type of the Producer is Integer. Code listed below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code}
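The acks and serializer/key-type fixes discussed above can be sketched with plain-JDK code (a minimal sketch: the config keys mirror the Kafka client's {{ProducerConfig}} constants, duplicated here as string literals so it compiles without the kafka-clients dependency):

```java
import java.util.Properties;

// A minimal sketch of the producer configuration discussed above. The config
// keys mirror kafka-clients' ProducerConfig constants but are written as
// plain strings so this compiles without the Kafka dependency.
public class KafkaSinkConfigSketch {

    static Properties producerProps() {
        Properties props = new Properties();
        // acks=1: the partition leader must persist the record before the send
        // is acknowledged, instead of the fire-and-forget acks=0 setting.
        props.put("acks", "1");
        // ByteArraySerializer matches a byte[] key; pairing it with an Integer
        // key, as the original code did, is the mismatch described above.
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties p = producerProps();
        System.out.println(p.getProperty("acks"));            // 1
        System.out.println(p.getProperty("key.serializer"));
    }
}
```

In the real patch the keys would come from {{ProducerConfig}} (e.g. {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}}) rather than string literals.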
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097550#comment-16097550 ] Hongyuan Li edited comment on HADOOP-14623 at 7/23/17 9:32 AM: --- Hi, [~bharatviswa]. I resubmitted patch 003 according to the discussion above, with the modifications below: 1. set {{acks}} to 1; 2. use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of the literal {{key.serializer}}; 3. changed the key type from {{Integer}} to {{byte[]}}. *Update* patch 004 is the latest. was (Author: hongyuan li): Hi, [~bharatviswa]. I resubmitted patch 003 according to the discussion above, with the modifications below: 1. set {{acks}} to 1; 2. use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of the literal {{key.serializer}}; 3. changed the key type from {{Integer}} to {{byte[]}}.
[jira] [Commented] (HADOOP-14559) FTPFileSystem instance in TestFTPFileSystem should be created before tests and closed after tests
[ https://issues.apache.org/jira/browse/HADOOP-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097566#comment-16097566 ] Hongyuan Li commented on HADOOP-14559: -- Hi, [~yzhangal]. Sorry to ping you again; can this jira be resolved as you suggested?
[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Attachment: HADOOP-14623-004.patch Fixed the compile error and the checkstyle warning.
[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097550#comment-16097550 ] Hongyuan Li commented on HADOOP-14623: -- Hi, [~bharatviswa]. I resubmitted patch 003 according to the discussion above, with the modifications below: 1. set {{acks}} to 1; 2. use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of the literal {{key.serializer}}; 3. changed the key type from {{Integer}} to {{byte[]}}.
[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Attachment: HADOOP-14623-003.patch
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095937#comment-16095937 ] Hongyuan Li edited comment on HADOOP-14623 at 7/21/17 8:54 AM: --- Hi, [~bharatviswa], I found that the kafka client must be the same version as the kafka server, or the new producer API will not function well. The old {{kafka.javaapi.producer}} Producer works, but it will be removed in a future kafka version. The following is the stack trace when using kafka client 0.10.0 to write to a kafka server 0.9.0: {code} org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73) at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) at java.lang.Thread.run(Thread.java:745) {code} Any good idea? *Update* kafka client 0.9.x can write to a kafka server whose version is 0.9.x-0.10.x. *Update* it seems that the key type can be set to byte[] instead of Integer. was (Author: hongyuan li): Hi, [~bharatviswa], I found that the kafka client must be the same version as the kafka server, or the new producer API will not function well. The old {{kafka.javaapi.producer}} Producer works, but it will be removed in a future kafka version. The following is the stack trace when using kafka client 0.10.0 to write to a kafka server 0.9.0: {code} org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73) at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) at java.lang.Thread.run(Thread.java:745) {code} Any good idea? *Update* kafka client 0.9.x can write to a kafka server whose version is 0.9.x-0.10.x.
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095937#comment-16095937 ] Hongyuan Li edited comment on HADOOP-14623 at 7/21/17 8:50 AM: --- Hi, [~bharatviswa], I found that the kafka client must be the same version as the kafka server, or the new producer API will not function well. The old {{kafka.javaapi.producer}} Producer works, but it will be removed in a future kafka version. The following is the stack trace when using kafka client 0.10.0 to write to a kafka server 0.9.0: {code} org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73) at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) at java.lang.Thread.run(Thread.java:745) {code} Any good idea? *Update* kafka client 0.9.x can write to a kafka server whose version is 0.9.x-0.10.x. was (Author: hongyuan li): Hi, [~bharatviswa], I found that the kafka client must be the same version as the kafka server, or the new producer API will not function well. The old {{kafka.javaapi.producer}} Producer works, but it will be removed in a future kafka version. The following is the stack trace when using kafka client 0.10.0 to write to a kafka server 0.9.0: {code} org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73) at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) at java.lang.Thread.run(Thread.java:745) {code} Any good idea?
[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095937#comment-16095937 ] Hongyuan Li commented on HADOOP-14623: -- Hi, [~bharatviswa], I found that the kafka client must be the same version as the kafka server, or the new producer API will not function well. The old {{kafka.javaapi.producer}} Producer works, but it will be removed in a future kafka version. The following is the stack trace when using kafka client 0.10.0 to write to a kafka server 0.9.0: {code} org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73) at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) at java.lang.Thread.run(Thread.java:745) {code} Any good idea?
[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095632#comment-16095632 ] Hongyuan Li commented on HADOOP-14623: -- Hi [~bharatviswa]. Thanks for your kind comment. In my opinion, the integer key in the former code is only used to partition the data, which I don't think is necessary; however, {{key.serializer}} must be set correctly anyway. Thanks.
[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081509#comment-16081509 ] Hongyuan Li commented on HADOOP-14623: -- Hi [~aw], could you give me a code review?
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 11:55 AM: --- Furthermore, the flush method is to confirm that data has been written. *Update/Correction* Sorry, it is the {{putMetrics}} method. In {{KafkaSink}}#{{putMetrics}}, the code listed below makes me hold a different opinion: {code} …… Future future = producer.send(data); jsonLines.setLength(0); try { future.get(); // which means the send is synchronous } catch (InterruptedException e) { throw new MetricsException("Error sending data", e); } catch (ExecutionException e) { throw new MetricsException("Error sending data", e); } …… {code} was (Author: hongyuan li): Furthermore, the flush method is to confirm that data has been written. *Update/Correction* Sorry, it is the {{putMetrics}} method. In {{KafkaSink}}#{{putMetrics}}, the code listed below makes me hold a different opinion: {code} …… Future future = producer.send(data); jsonLines.setLength(0); try { future.get(); } catch (InterruptedException e) { throw new MetricsException("Error sending data", e); } catch (ExecutionException e) { throw new MetricsException("Error sending data", e); } …… {code}
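The blocking-on-a-Future pattern quoted above can be shown with a self-contained JDK sketch (the executor below is a hypothetical stand-in for the Kafka producer, so no Kafka dependency is needed; the point is only that {{future.get()}} makes an asynchronous send effectively synchronous):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in for the putMetrics pattern: the send returns a Future, and calling
// future.get() blocks until the work completes, which is what makes the send
// effectively synchronous. ExecutorService is a hypothetical stand-in for the
// Kafka producer so the sketch runs with the JDK alone.
public class SyncSendSketch {

    static String sendAndWait(ExecutorService producer, String record) {
        Future<String> future = producer.submit(() -> "acked:" + record);
        try {
            return future.get(); // blocks, like future.get() in KafkaSink#putMetrics
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("Error sending data", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService producer = Executors.newSingleThreadExecutor();
        System.out.println(sendAndWait(producer, "metrics-line")); // acked:metrics-line
        producer.shutdown();
    }
}
```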
[jira] [Commented] (HADOOP-10949) metrics2 sink plugin for Apache Kafka
[ https://issues.apache.org/jira/browse/HADOOP-10949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079110#comment-16079110 ] Hongyuan Li commented on HADOOP-10949: -- I filed HADOOP-14623 to update this module. > metrics2 sink plugin for Apache Kafka > - > > Key: HADOOP-10949 > URL: https://issues.apache.org/jira/browse/HADOOP-10949 > Project: Hadoop Common > Issue Type: New Feature > Components: metrics >Reporter: Babak Behzad >Assignee: Babak Behzad > Fix For: 3.0.0-alpha1 > > Attachments: HADOOP-10949-1.patch, HADOOP-10949-2.patch, > HADOOP-10949-4.patch, HADOOP-10949-5.patch, HADOOP-10949-6-1.patch, > HADOOP-10949-6.patch, HADOOP-10949.patch, HADOOP-10949.patch, > HADOOP-10949.patch, HADOOP-10949.patch, HADOOP-10949.patch, > HADOOP-10949.patch, HADOOP-10949.patch, HADOOP-10949.patch, > HADOOP-10949.patch, HADOOP-10949.patch, HADOOP-10949.patch > > > Write a metrics2 sink plugin for Hadoop to send metrics directly to Apache > Kafka in addition to the current Graphite > ([Hadoop-9704|https://issues.apache.org/jira/browse/HADOOP-9704]), Ganglia > and File sinks.
[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079108#comment-16079108 ] Hongyuan Li commented on HADOOP-14623: -- None of the test failures are related to this patch. No checkstyle or findbugs warnings.
[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079075#comment-16079075 ] Hongyuan Li commented on HADOOP-14632: -- None of the test failures are related to this patch. No checkstyle or findbugs warnings. Ping [~ste...@apache.org], [~brahmareddy] for code review. A performance comparison will be submitted soon. > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, > HADOOP-14632-003.patch, HADOOP-14632-004.patch, HADOOP-14632-005.patch > > > Adding a buffer size to SFTPFileSystem#create and SFTPFileSystem#open can > improve the transfer speed. > A test example shows the transfer performance has improved a lot.
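The buffering idea behind HADOOP-14632 can be sketched with plain JDK streams (a sketch only: the real patch applies buffering to the SFTP channel streams, while in-memory streams are used here so the example runs standalone):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Wrapping raw streams in Buffered{Input,Output}Stream batches many small
// reads/writes into fewer large ones -- the same idea as the buffer added to
// SFTPFileSystem#create/#open. In-memory streams stand in for SFTP streams.
public class BufferedCopySketch {

    static long copy(InputStream rawIn, OutputStream rawOut, int bufferSize) {
        try (InputStream in = new BufferedInputStream(rawIn, bufferSize);
             OutputStream out = new BufferedOutputStream(rawOut, bufferSize)) {
            long total = 0;
            int b;
            // Deliberately naive byte-at-a-time loop: without the buffers every
            // read()/write() would hit the underlying stream directly.
            while ((b = in.read()) != -1) {
                out.write(b);
                total++;
            }
            out.flush();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(new byte[8192]), sink, 4096);
        System.out.println(n); // 8192
    }
}
```

With a slow remote stream (as in SFTP), the buffer size becomes the dominant factor in how many round trips the naive loop costs.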
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 9:20 AM: -- I recommend four steps: 1. Use {{acks}} = {{1}}. 2. Add the {{https://repository.apache.org/content/repositories/releases}} repo; the {{apache snapshot rep}} does not have a kafka module newer than {{0.8.2}}. 3. Update the kafka client version to at least {{0.10.1.0}}, which has an IntegerSerializer class, if KafkaSink wants to create a producer whose key type is Integer. 4. Use the ProducerConfig.XXX constants instead of raw string values; for example, use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}}. Thanks for any advice. The latest patch implements the above. was (Author: hongyuan li): I highly recommend Four steps: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 3、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 4、 Use ProducerConfig.XXX instead of using string value directly. For example, use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}} Thanks for any advice. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Attachment: HADOOP-14623-002.patch > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 9:15 AM: -- I highly recommend Four steps: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 3、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 4、 Use ProducerConfig.XXX instead of using string value directly. For example, use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}} Thanks for any advice. was (Author: hongyuan li): I highly recommend Five steps: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 3、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 4、add {{callback}} when using new {{KafkaProducer}}#{{send}} 5、 Use ProducerConfig.XXX instead of using string value directly. For example, use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}} Thanks for any advice. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:51 AM: -- I highly recommend Five steps: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 3、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 4、add {{callback}} when using new {{KafkaProducer}}#{{send}} 5、 Use ProducerConfig.XXX instead of using string value directly. For example, use {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}} Thanks for any advice. was (Author: hongyuan li): I highly recommend Four points: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 3、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 4、add {{callback}} when using new {{KafkaProducer}}#{{send}} Thanks for any advice. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:46 AM: -- I highly recommend Four points: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 3、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 4、add {{callback}} when using new {{KafkaProducer}}#{{send}} Thanks for any advice. was (Author: hongyuan li): I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 2、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 3、add {{callback}} when using new {{KafkaProducer}}#{{send}} Thanks for any advice. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Attachment: HADOOP-14632-005.patch Fixed the JUnit test. > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, > HADOOP-14632-003.patch, HADOOP-14632-004.patch, HADOOP-14632-005.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:24 AM: -- I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、add {{https://repository.apache.org/content/repositories/releases}} repo, the {{apache snapshot rep}} doesnot have a higher version kafka module, the version of which is less than {{0.8.2}} 2、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. 3、add {{callback}} when using new {{KafkaProducer}}#{{send}} Thanks for any advice. was (Author: hongyuan li): I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer, but what blocks it that the {{apache snapshot rep}} doesnot have a kafka, the version of which is higher than {{0.8.2}} 3、add {{callback}} when using new {{KafkaProducer}}#{{send}} Thanks for any advice. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:20 AM: -- I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer, but what blocks it that the {{apache snapshot rep}} doesnot have a kafka, the version of which is higher than {{0.8.2}} 3、add {{callback}} when using new {{KafkaProducer}}#{{send}} Thanks for any advice. was (Author: hongyuan li): I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. The last patch will fix the two, If you don't think so, close the jira. Thanks for any advice. 3、add {{callback}} when using new {{KafkaProducer}}#{{send}} > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:11 AM: -- I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、update kafka client version to at least {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. The last patch will fix the two, If you don't think so, close the jira. Thanks for any advice. 3、add {{callback}} when using new {{KafkaProducer}}#{{send}} was (Author: hongyuan li): I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、update kafka client version to {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. The last patch will fix the two, If you don't think so, close the jira. Thanks for any advice. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:06 AM: -- I highly recommend Two points: 1、should use {{acks}} = {{1}}. 2、update kafka client version to {{0.10.1.0}}, which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. The last patch will fix the two, If you don't think so, close the jira. Thanks for any advice. was (Author: hongyuan li): I highly recommend Two points: 1、should use acks = 1. 2、update kafka client version to 0.10.1. which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. The last patch will fix the two, If you don't think so, close the jira.Thanks. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033 ] Hongyuan Li commented on HADOOP-14623: -- I highly recommend Two points: 1、should use acks = 1. 2、update kafka client version to 0.10.1. which has a IntegerSerializer class If kafka sink want to generate a kafka producer with the the type of key being Integer. The last patch will fix the two, If you don't think so, close the jira.Thanks. > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Summary: fixed some bugs in KafkaSink (was: KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used) > fixed some bugs in KafkaSink > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer(props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Description: {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has been written to the broker at least. current code list below: {code} props.put("request.required.acks", "0"); {code} *Update* find another bug about this class, {{key.serializer}} used {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the key properties of Producer is Integer, codes list below: {code} props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer"); … producer = new KafkaProducer(props); {code} was: {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has been written to the broker at least. current code list below: {code} props.put("request.required.acks", "0"); {code} *Update* find another bug about this class, key.serializer used {{org.apache.kafka.common.serialization.ByteArraySerializer}} > KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however > key not used > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, {{key.serializer}} used > {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the > key properties of Producer is Integer, codes list below: > {code} > props.put("key.serializer", > "org.apache.kafka.common.serialization.ByteArraySerializer"); > … > producer = new KafkaProducer (props); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Description: {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has been written to the broker at least. current code list below: {code} props.put("request.required.acks", "0"); {code} *Update* find another bug about this class, key.serializer used {{org.apache.kafka.common.serialization.ByteArraySerializer}} was: {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has been written to the broker at least. current code list below: {code} props.put("request.required.acks", "0"); {code} > KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however > key not used > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} > *Update* > find another bug about this class, key.serializer used > {{org.apache.kafka.common.serialization.ByteArraySerializer}} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Summary: KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used (was: KafkaSink#init should set acks to 1,not 0) > KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however > key not used > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14567) DistCP NullPointerException when -atomic is set but -tmp is not
[ https://issues.apache.org/jira/browse/HADOOP-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071185#comment-16071185 ] Hongyuan Li edited comment on HADOOP-14567 at 7/8/17 7:35 AM: -- I think we should add a default tmp workPath for distcp. [~yzhangal] I filed a new jira, HADOOP-14631, to state it explicitly. was (Author: hongyuan li): i think we should add a default tmp workPath for distcp. [~yzhangal] > DistCP NullPointerException when -atomic is set but -tmp is not > --- > > Key: HADOOP-14567 > URL: https://issues.apache.org/jira/browse/HADOOP-14567 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.7.3 > Environment: HDP 2.5.0 kerberized cluster -> HDP 2.6.0 kerberized > cluster >Reporter: Hari Sekhon >Assignee: Hongyuan Li >Priority: Minor > > When running distcp if using -atomic but not specifying -tmp then the > following NullPointerException is encountered - removing -atomic avoids this > bug: > {code} > 17/06/21 16:50:59 ERROR tools.DistCp: Exception encountered > java.lang.NullPointerException > at org.apache.hadoop.fs.Path.(Path.java:104) > at org.apache.hadoop.fs.Path.(Path.java:93) > at > org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:363) > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:247) > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:176) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:128) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:462) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Fix Version/s: 2.7.3 > Distcp should add a default atomicWorkPath properties when using atomic > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Fix For: 2.7.3 > > > Distcp should add a default AtomicWorkPath property when using atomic. > {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work > path: > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of the current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
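The null-parent trap described in the issue above can be illustrated with java.nio paths: Hadoop's org.apache.hadoop.fs.Path behaves analogously in that the parent of the root is null, and constructing a child Path from a null parent is what throws the NullPointerException. The fallback shown is only a hypothetical default for illustration, not necessarily the one the actual patch chooses:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AtomicWorkPathSketch {
    // Null-safe choice of a work dir: if the target is the filesystem root,
    // getParent() returns null, so fall back to the target itself instead of
    // passing null into a Path constructor (illustrative default only).
    static Path workDir(Path target) {
        Path parent = target.getParent();
        return (parent != null) ? parent : target;
    }

    public static void main(String[] args) {
        // The trigger: the parent of the root path is null.
        System.out.println(Paths.get("/").getParent()); // null
        // The null-safe version returns a usable directory in both cases.
        System.out.println(workDir(Paths.get("/")));
        System.out.println(workDir(Paths.get("/data/out")));
    }
}
```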
[jira] [Assigned] (HADOOP-14567) DistCP NullPointerException when -atomic is set but -tmp is not
[ https://issues.apache.org/jira/browse/HADOOP-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li reassigned HADOOP-14567: Assignee: Hongyuan Li > DistCP NullPointerException when -atomic is set but -tmp is not > --- > > Key: HADOOP-14567 > URL: https://issues.apache.org/jira/browse/HADOOP-14567 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.7.3 > Environment: HDP 2.5.0 kerberized cluster -> HDP 2.6.0 kerberized > cluster >Reporter: Hari Sekhon >Assignee: Hongyuan Li >Priority: Minor > > When running distcp if using -atomic but not specifying -tmp then the > following NullPointerException is encountered - removing -atomic avoids this > bug: > {code} > 17/06/21 16:50:59 ERROR tools.DistCp: Exception encountered > java.lang.NullPointerException > at org.apache.hadoop.fs.Path.<init>(Path.java:104) > at org.apache.hadoop.fs.Path.<init>(Path.java:93) > at > org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:363) > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:247) > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:176) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:128) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:462) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079022#comment-16079022 ] Hongyuan Li commented on HADOOP-14632: -- Attached a new patch. > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, > HADOOP-14632-003.patch, HADOOP-14632-004.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Attachment: HADOOP-14632-004.patch > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, > HADOOP-14632-003.patch, HADOOP-14632-004.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
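The idea behind the HADOOP-14632 patches can be sketched with plain java.io streams: wrap the raw SFTP channel stream in a buffered stream so each read/write moves a full buffer instead of a few bytes per network round trip. `BufferedWrapDemo` is a hypothetical stand-in, not the actual SFTPFileSystem code:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedWrapDemo {
    // Wrap a raw (e.g. SFTP channel) input stream in a buffer of the
    // requested size; reads then pull whole chunks from the channel.
    static InputStream wrap(InputStream raw, int bufferSize) {
        return new BufferedInputStream(raw, bufferSize);
    }

    // Same for the write path: small writes accumulate in the buffer
    // and are flushed as one larger write to the channel.
    static OutputStream wrap(OutputStream raw, int bufferSize) {
        return new BufferedOutputStream(raw, bufferSize);
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the remote SFTP channel.
        byte[] data = "hello sftp".getBytes();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = wrap(new ByteArrayInputStream(data), 4096);
             OutputStream out = wrap(sink, 4096)) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        }
        System.out.println(sink.toString());
    }
}
```

Even this byte-at-a-time copy loop touches the underlying stream only once per 4096 bytes, which is where the speedup over unbuffered per-byte round trips comes from.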
[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079020#comment-16079020 ] Hongyuan Li commented on HADOOP-14632: -- One of the JUnit tests is related to the patch; will work on it. > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, > HADOOP-14632-003.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Summary: Distcp should add a default atomicWorkPath properties when using atomic (was: Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception) > Distcp should add a default atomicWorkPath properties when using atomic > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath properties when using atomic > {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work > path, > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, wich means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a nullpoint exception. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Attachment: HADOOP-14632-003.patch > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, > HADOOP-14632-003.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078964#comment-16078964 ] Hongyuan Li edited comment on HADOOP-14632 at 7/8/17 5:46 AM: -- Will add functional unit tests in the next patch. was (Author: hongyuan li): Will add funtionational test units in next patch. > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078964#comment-16078964 ] Hongyuan Li edited comment on HADOOP-14632 at 7/8/17 5:45 AM: -- Will add functional unit tests in the next patch. was (Author: hongyuan li): Will add fntionational test units in next patch. > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Summary: add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed. (was: add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.) > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078964#comment-16078964 ] Hongyuan Li commented on HADOOP-14632: -- Will add functional unit tests in the next patch. > add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can > improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078963#comment-16078963 ] Hongyuan Li commented on HADOOP-14631: -- OK, thanks for your advice. Will work on it. > Distcp should add a default atomicWorkPath properties when using atomic or > throw obvious Exception > --- > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath properties when using atomic > {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work > path, > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078563#comment-16078563 ] Hongyuan Li commented on HADOOP-14631: -- ping [~liuml07] for a suggestion. > Distcp should add a default atomicWorkPath properties when using atomic or > throw obvious Exception > --- > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath properties when using atomic > {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work > path, > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, wich means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a nullpoint exception. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Attachment: HADOOP-14632-002.patch fix findbug warnings > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078250#comment-16078250 ] Hongyuan Li edited comment on HADOOP-14631 at 7/7/17 6:02 PM: -- I am not sure which is the better choice: adding a default atomic workPath or throwing an obvious exception. was (Author: hongyuan li): i don't know which is a goos idea ? to adding a default atomic workPath or throwing obvious Exception. > Distcp should add a default atomicWorkPath properties when using atomic or > throw obvious Exception > --- > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath properties when using atomic > {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work > path, > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Status: Patch Available (was: Open) > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078215#comment-16078215 ] Hongyuan Li edited comment on HADOOP-14632 at 7/7/17 3:37 PM: -- attach a patch. was (Author: hongyuan li): attach a file > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Affects Version/s: 3.0.0-alpha1 > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078250#comment-16078250 ] Hongyuan Li commented on HADOOP-14631: -- I don't know which is the better idea: adding a default atomic workPath or throwing an obvious exception. > Distcp should add a default atomicWorkPath properties when using atomic or > throw obvious Exception > --- > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath properties when using atomic > {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work > path, > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078227#comment-16078227 ] Hongyuan Li commented on HADOOP-14632: -- the performance test result will be added soon. > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14632: - Attachment: HADOOP-14632-001.patch attach a file > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14632-001.patch > > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
[ https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li reassigned HADOOP-14632: Assignee: Hongyuan Li > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > -- > > Key: HADOOP-14632 > URL: https://issues.apache.org/jira/browse/HADOOP-14632 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which > can improve the transfer speed. > Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.
Hongyuan Li created HADOOP-14632: Summary: add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed. Key: HADOOP-14632 URL: https://issues.apache.org/jira/browse/HADOOP-14632 Project: Hadoop Common Issue Type: Improvement Reporter: Hongyuan Li add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed. Test example shows transfer performance has improved a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Summary: Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception (was: Distcp should add a default atomicWorkPath properties when using atomic or throw detailed Exception) > Distcp should add a default atomicWorkPath properties when using atomic or > throw obvious Exception > --- > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath properties when using atomic > {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work > path, > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, wich means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a nullpoint exception. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw detailed Exception
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Summary: Distcp should add a default atomicWorkPath properties when using atomic or throw detailed Exception (was: Distcp should add a default AtomicWorkPath properties when using atomic) > Distcp should add a default atomicWorkPath properties when using atomic or > throw detailed Exception > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath properties when using atomic > {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work > path, > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent > of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent > will be {{null}}, wich means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a nullpoint exception. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Description: Distcp should add a default AtomicWorkPath property when using atomic {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work path, {code} if (context.shouldAtomicCommit()) { Path workDir = context.getAtomicWorkPath(); if (workDir == null) { workDir = targetPath.getParent(); } workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt()); {code} When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent will be {{null}}, which means {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt());}} will throw a NullPointerException. was: Distcp should add a default AtomicWorkPath properties when using atomic {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work path {code} if (context.shouldAtomicCommit()) { Path workDir = context.getAtomicWorkPath(); if (workDir == null) { workDir = targetPath.getParent(); } workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt()); [code} When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent will be {{null}}, wich means {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt());}} will throw a nullpoint exception. 
> Distcp should add a default AtomicWorkPath properties when using atomic > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath property when using atomic > {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work > path: > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} is null, distcp falls back to the parent > of the target path. In this case, if the target path is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Description: Distcp should add a default AtomicWorkPath properties when using atomic {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work path, {code} if (context.shouldAtomicCommit()) { Path workDir = context.getAtomicWorkPath(); if (workDir == null) { workDir = targetPath.getParent(); } workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt()); {code} When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent will be {{null}}, wich means {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt());}} will throw a nullpoint exception. was: Distcp should add a default AtomicWorkPath properties when using atomic {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work path, {code} if (context.shouldAtomicCommit()) { Path workDir = context.getAtomicWorkPath(); if (workDir == null) { workDir = targetPath.getParent(); } workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt()); [code} When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent will be {{null}}, wich means {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt());}} will throw a nullpoint exception. 
> Distcp should add a default AtomicWorkPath properties when using atomic > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath property when using atomic > {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work > path: > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} is null, distcp falls back to the parent > of the target path. In this case, if the target path is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Affects Version/s: 2.7.3 3.0.0-alpha3 > Distcp should add a default AtomicWorkPath properties when using atomic > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.3, 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath property when using atomic > {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work > path: > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} is null, distcp falls back to the parent > of the target path. In this case, if the target path is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14631: - Description: Distcp should add a default AtomicWorkPath properties when using atomic {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work path {code} if (context.shouldAtomicCommit()) { Path workDir = context.getAtomicWorkPath(); if (workDir == null) { workDir = targetPath.getParent(); } workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt()); [code} When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent will be {{null}}, wich means {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt());}} will throw a nullpoint exception. was: Distcp should add a default AtomicWorkPath properties when using atomic {{Distcp}}#{{configureOutputFormat}} using code below to generate atomic work path {code} if (context.shouldAtomicCommit()) { Path workDir = context.getAtomicWorkPath(); if (workDir == null) { workDir = targetPath.getParent(); } workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt()); [code} When atomic is set and tAtomicWorkPath == null, distcp will get the parent of current WorkDir. In this case, if {{workdir}} is {{"/"}}, the parent will be {{null}}, wich means {{ workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt());}} will throw a nullpoint exception. 
> Distcp should add a default AtomicWorkPath properties when using atomic > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Reporter: Hongyuan Li > > Distcp should add a default AtomicWorkPath property when using atomic > {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work > path: > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} is null, distcp falls back to the parent > of the target path. In this case, if the target path is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic
[ https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li reassigned HADOOP-14631: Assignee: Hongyuan Li > Distcp should add a default AtomicWorkPath properties when using atomic > > > Key: HADOOP-14631 > URL: https://issues.apache.org/jira/browse/HADOOP-14631 > Project: Hadoop Common > Issue Type: Bug >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > Distcp should add a default AtomicWorkPath property when using atomic > {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work > path: > {code} > if (context.shouldAtomicCommit()) { > Path workDir = context.getAtomicWorkPath(); > if (workDir == null) { > workDir = targetPath.getParent(); > } > workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() > + rand.nextInt()); > {code} > When atomic is set and {{AtomicWorkPath}} is null, distcp falls back to the parent > of the target path. In this case, if the target path is {{"/"}}, the parent > will be {{null}}, which means > {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + > rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic
Hongyuan Li created HADOOP-14631: Summary: Distcp should add a default AtomicWorkPath properties when using atomic Key: HADOOP-14631 URL: https://issues.apache.org/jira/browse/HADOOP-14631 Project: Hadoop Common Issue Type: Bug Reporter: Hongyuan Li Distcp should add a default AtomicWorkPath property when using atomic {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic work path: {code} if (context.shouldAtomicCommit()) { Path workDir = context.getAtomicWorkPath(); if (workDir == null) { workDir = targetPath.getParent(); } workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt()); {code} When atomic is set and {{AtomicWorkPath}} is null, distcp falls back to the parent of the target path. In this case, if the target path is {{"/"}}, the parent will be {{null}}, which means {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + rand.nextInt());}} will throw a NullPointerException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
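The failure mode reported above is easy to reproduce in isolation. The sketch below is a minimal, self-contained illustration in plain Java (all class and helper names are hypothetical, not the actual DistCp code, and the random suffix is left out for determinism): deriving the parent of {{"/"}} yields null, so a guarded fallback with a descriptive error message is one possible shape for the "throw detailed Exception" option in the updated summary.

```java
// Hypothetical sketch of the guarded work-dir derivation; the real
// DistCp#configureOutputFormat works on Hadoop Path objects and appends
// rand.nextInt() to the name.
public class AtomicWorkPathSketch {
    static final String WIP_PREFIX = "._WIP_";

    // Parent of a slash-separated absolute path, or null for "/" --
    // mirroring Path#getParent(), which returns null at the root.
    static String parentOf(String path) {
        if (path.equals("/")) {
            return null; // root has no parent: the failing case
        }
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    // Fail fast with a clear message instead of letting a later
    // new Path(null, ...) blow up with a NullPointerException.
    static String resolveWorkDir(String atomicWorkPath, String targetPath) {
        String workDir = atomicWorkPath;
        if (workDir == null) {
            workDir = parentOf(targetPath);
            if (workDir == null) {
                throw new IllegalArgumentException(
                    "target " + targetPath + " has no parent; "
                    + "set an atomic work path explicitly");
            }
        }
        String name = targetPath.substring(targetPath.lastIndexOf('/') + 1);
        String sep = workDir.endsWith("/") ? "" : "/";
        return workDir + sep + WIP_PREFIX + name;
    }
}
```

With an explicit work path the derivation is unchanged; only the null-parent case turns from an NPE into a descriptive exception.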
[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file
[ https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727 ] Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:22 AM: -- [~boky01], I don't think so; the implementation of {{rename}} in LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The reporter's JUnit test failure is caused by the nonexistence of the file {{newCrcPath}}; setting {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] *Update* Forgot this; I mistook the Hadoop version. was (Author: hongyuan li): [~boky01], i don't think so , the implement of {{rename}} of LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The junit test failure of reporter is caused by the nonexistance of file {{newCrcPath}}, set {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] *Update* forgot this, the latest patch seems solved this. > local FileContext does not rename .crc file > --- > > Key: HADOOP-12802 > URL: https://issues.apache.org/jira/browse/HADOOP-12802 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.2, 3.0.0-alpha1 >Reporter: Youngjoon Kim >Assignee: Andras Bokor > Attachments: HADOOP-12802.01.patch > > > After running the following code, the "old" file is renamed to "new", but ".old.crc" > is not renamed to ".new.crc" > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileContext fc = FileContext.getLocalFSFileContext(conf); > FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE)); > out.close(); > fc.rename(oldPath, newPath); > {code} > On the other hand, the local FileSystem successfully renames the .crc file. 
> {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileSystem fs = FileSystem.getLocal(conf); > FSDataOutputStream out = fs.create(oldPath); > out.close(); > fs.rename(oldPath, newPath); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file
[ https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727 ] Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:21 AM: -- [~boky01], i don't think so , the implement of {{rename}} of LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The junit test failure of reporter is caused by the nonexistance of file {{newCrcPath}}, set {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] *Update* forgot this, the latest patch seems solved this. was (Author: hongyuan li): [~boky01], i don't think so , the implement of {{rename}} of LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The junit test failure of reporter is caused by the nonexistance of file {{newCrcPath}}, set {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] > local FileContext does not rename .crc file > --- > > Key: HADOOP-12802 > URL: https://issues.apache.org/jira/browse/HADOOP-12802 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.2, 3.0.0-alpha1 >Reporter: Youngjoon Kim >Assignee: Andras Bokor > Attachments: HADOOP-12802.01.patch > > > After run the following code, "old" file is renamed to "new", but ".old.crc" > is not renamed to ".new.crc" > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileContext fc = FileContext.getLocalFSFileContext(conf); > FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE)); > out.close(); > fc.rename(oldPath, newPath); > {code} > On the other hand, local FileSystem successfully renames .crc file. 
> {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileSystem fs = FileSystem.getLocal(conf); > FSDataOutputStream out = fs.create(oldPath); > out.close(); > fs.rename(oldPath, newPath); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file
[ https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727 ] Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:17 AM: -- [~boky01], i don't think so , the implement of {{rename}} of LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The junit test failure of reporter is caused by the nonexistance of file {{newCrcPath}}, set {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] was (Author: hongyuan li): [~boky01]], i don't think so , the implement of {{rename}} of LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The junit test failure of reporter is caused by the nonexistance of file {{newCrcPath}}, set {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] > local FileContext does not rename .crc file > --- > > Key: HADOOP-12802 > URL: https://issues.apache.org/jira/browse/HADOOP-12802 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.2, 3.0.0-alpha1 >Reporter: Youngjoon Kim >Assignee: Andras Bokor > Attachments: HADOOP-12802.01.patch > > > After run the following code, "old" file is renamed to "new", but ".old.crc" > is not renamed to ".new.crc" > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileContext fc = FileContext.getLocalFSFileContext(conf); > FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE)); > out.close(); > fc.rename(oldPath, newPath); > {code} > On the other hand, local FileSystem successfully renames .crc file. 
> {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileSystem fs = FileSystem.getLocal(conf); > FSDataOutputStream out = fs.create(oldPath); > out.close(); > fs.rename(oldPath, newPath); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file
[ https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727 ] Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:17 AM: -- [~boky01]], i don't think so , the implement of {{rename}} of LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The junit test failure of reporter is caused by the nonexistance of file {{newCrcPath}}, set {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] was (Author: hongyuan li): [~Andras Bokor], i don't think so , the implement of {{rename}} of LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The junit test failure of reporter is caused by the nonexistance of file {{newCrcPath}}, set {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] > local FileContext does not rename .crc file > --- > > Key: HADOOP-12802 > URL: https://issues.apache.org/jira/browse/HADOOP-12802 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.2, 3.0.0-alpha1 >Reporter: Youngjoon Kim >Assignee: Andras Bokor > Attachments: HADOOP-12802.01.patch > > > After run the following code, "old" file is renamed to "new", but ".old.crc" > is not renamed to ".new.crc" > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileContext fc = FileContext.getLocalFSFileContext(conf); > FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE)); > out.close(); > fc.rename(oldPath, newPath); > {code} > On the other hand, local FileSystem successfully renames .crc file. 
> {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileSystem fs = FileSystem.getLocal(conf); > FSDataOutputStream out = fs.create(oldPath); > out.close(); > fs.rename(oldPath, newPath); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-12802) local FileContext does not rename .crc file
[ https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727 ] Hongyuan Li commented on HADOOP-12802: -- [~Andras Bokor], I don't think so; the implementation of {{rename}} in LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The reporter's JUnit test failure is caused by the nonexistence of the file {{newCrcPath}}; setting {{fs.AbstractFileSystem.file.impl}} to {{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] > local FileContext does not rename .crc file > --- > > Key: HADOOP-12802 > URL: https://issues.apache.org/jira/browse/HADOOP-12802 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.2, 3.0.0-alpha1 >Reporter: Youngjoon Kim >Assignee: Andras Bokor > Attachments: HADOOP-12802.01.patch > > > After running the following code, the "old" file is renamed to "new", but ".old.crc" > is not renamed to ".new.crc" > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileContext fc = FileContext.getLocalFSFileContext(conf); > FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE)); > out.close(); > fc.rename(oldPath, newPath); > {code} > On the other hand, the local FileSystem successfully renames the .crc file. > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileSystem fs = FileSystem.getLocal(conf); > FSDataOutputStream out = fs.create(oldPath); > out.close(); > fs.rename(oldPath, newPath); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
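For reference, the naming convention at stake can be sketched as follows (plain Java, hypothetical class and helper names; the hidden {{.<name>.crc}} pattern follows Hadoop's checksum-file convention): a checksummed local rename has to move both the file and its checksum sibling, which is what LocalFileSystem does and FileContext reportedly does not.

```java
// Hypothetical sketch of the checksum-sibling bookkeeping a local
// rename must perform; not the actual ChecksumFileSystem code.
public class CrcRenameSketch {
    // Hidden checksum sibling for a file in the same directory:
    // "/tmp" + "old" -> "/tmp/.old.crc"
    static String crcSibling(String dir, String name) {
        return dir + "/." + name + ".crc";
    }

    // The pair of moves a checksummed rename must perform, described
    // as "src -> dst" strings for illustration.
    static String[] renamePair(String dir, String oldName, String newName) {
        return new String[] {
            dir + "/" + oldName + " -> " + dir + "/" + newName,
            crcSibling(dir, oldName) + " -> " + crcSibling(dir, newName)
        };
    }
}
```

If only the first move of the pair happens, the stale {{.old.crc}} lingers and {{new}} is left without a checksum, which matches the reported behaviour.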
[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520 ] Hongyuan Li edited comment on HADOOP-14444 at 7/7/17 3:24 AM: -- Sockets are complex; I don't like to open a new socket just to seek. Jsch and commons-net have plenty of examples, so if you want to make full use of them, you should dig into their implementations. Also, commons-net's setTimeout-like methods may get stuck in some situations when the network environment is very poor. was (Author: hongyuan li): socket is complex, i don't like to open a new socket just to seek. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > The current implementations of the FTP and SFTP filesystems have severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support of connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files it shows an order of magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For ftp you always need to list the whole directory > whenever you ask for information about a particular file. > Again for a huge number of files it shows an order of magnitude performance > improvement over non-cached connections. 
> * Support of keep alive (NOOP) messages to avoid connection drops > * Support for Unix style or regexp wildcard glob - useful for listing > particular files across the whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520 ] Hongyuan Li commented on HADOOP-14444: -- Sockets are complex; I don't like to open a new socket just to seek. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > The current implementations of the FTP and SFTP filesystems have severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support of connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files it shows an order of magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For ftp you always need to list the whole directory > whenever you ask for information about a particular file. > Again for a huge number of files it shows an order of magnitude performance > improvement over non-cached connections. > * Support of keep alive (NOOP) messages to avoid connection drops > * Support for Unix style or regexp wildcard glob - useful for listing > particular files across the whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628 ] Hongyuan Li edited comment on HADOOP-14623 at 7/7/17 3:19 AM: -- Furthermore, the flush method is there to confirm that data has been written. *Update/Correction* Sorry, it is the {{putMetrics}} method. In {{KafkaSink}}#{{putMetrics}}, the code listed below makes me hold a different opinion: {code} …… Future future = producer.send(data); jsonLines.setLength(0); try { future.get(); } catch (InterruptedException e) { throw new MetricsException("Error sending data", e); } catch (ExecutionException e) { throw new MetricsException("Error sending data", e); } …… {code} was (Author: hongyuan li): futuremore, flush method is to confirm that data has been written. > KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has > at least been written to the broker. > The current code is listed below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628 ] Hongyuan Li commented on HADOOP-14623: -- Furthermore, the flush method is there to confirm that data has been written. > KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has > at least been written to the broker. > The current code is listed below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076154#comment-16076154 ] Hongyuan Li commented on HADOOP-14623: -- I don't think so; setting it to 1 does not mean that it will block. However, I think that Ganglia knows the frequency of data loss, but Kafka does not. What you have said underestimates Kafka; Kafka has more power. Compared to the complete sync of setting acks to -1, setting acks to 1 is a better choice. > KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has > at least been written to the broker. > The current code is listed below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
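The configuration change under discussion is tiny. The sketch below shows only the relevant producer property as {{KafkaSink}}#{{init}} might build it after the proposed change (class and method names are hypothetical; broker list and serializer settings are omitted): acks=1 waits for the partition leader's acknowledgement, a middle ground between acks=0 (fire and forget, silent loss) and acks=-1 (wait for all in-sync replicas, slowest).

```java
import java.util.Properties;

// Hypothetical sketch of the proposed one-line change; not the actual
// KafkaSink#init, which configures many more producer properties.
public class KafkaSinkAcksSketch {
    static Properties producerProps() {
        Properties props = new Properties();
        // acks=1: the leader must persist the record before the send is
        // considered successful. This surfaces broker-side loss that
        // acks=0 would silently swallow, without paying the latency of
        // acks=-1 (full in-sync-replica acknowledgement).
        props.put("request.required.acks", "1"); // was "0"
        return props;
    }
}
```

Note that with acks=1 a record can still be lost if the leader fails before replication, which is why the thread weighs it against acks=-1.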
[jira] [Commented] (HADOOP-13743) error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials has too many spaces
[ https://issues.apache.org/jira/browse/HADOOP-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074908#comment-16074908 ] Hongyuan Li commented on HADOOP-13743: -- Why not use String.format or StringBuilder? > error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials > has too many spaces > > > Key: HADOOP-13743 > URL: https://issues.apache.org/jira/browse/HADOOP-13743 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 2.8.0, 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Trivial > Attachments: HADOOP-13743-branch-2-001.patch, > HADOOP-14373-branch-2-002.patch > > > The error message on a failed hadoop fs -ls command against an unauthed azure > container has an extra space in {{" them in"}} > {code} > ls: org.apache.hadoop.fs.azure.AzureException: Unable to access container > demo in account example.blob.core.windows.net using anonymous credentials, > and no credentials found for them in the configuration. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074831#comment-16074831 ] Hongyuan Li edited comment on HADOOP-14444 at 7/5/17 2:46 PM: -- 1. If you renamed {{AbstractFTPFileSystem}} to anything else, I would be happier. This class name may make users think SFTP is a kind of FTP; however, it isn't. So, please rename it to something without "ftp". 2. {{FTPClient}}#{{retrieveFileStream}} will open a new data connection, which is the reason why I dislike the seek ops. *Update*: you said {{SFTPChannel}}#{{close}} exists just to reuse the session, but then why do you disconnect channelSftp? was (Author: hongyuan li): 1、if you rename {{AbstractFTPFileSystem}} to anything else, i would be more happier.This class name may make users thinking sftp is one of ftp, however, it isn't. So, please rename it to anything without ftp? 2、{{FTPClient}}#{{retrieStream}} will open a new Data connection, which is the reason why i dislike the seek ops. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > Current implementation of FTP and SFTP filesystems have severe limitations > and performance issues when dealing with high number of files. Mine patch > solve those issues and integrate both filesystems such a way that most of the > core functionality is common for both and therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support of connection pooling - new connection is not created for every > single command but reused from the pool. 
> For huge number of files it shows order of magnitude performance improvement > over not pooled connections. > * Caching of directory trees. For ftp you always need to list whole directory > whenever you ask information about particular file. > Again for huge number of files it shows order of magnitude performance > improvement over not cached connections. > * Support of keep alive (NOOP) messages to avoid connection drops > * Support for Unix style or regexp wildcard glob - useful for listing a > particular files across whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074831#comment-16074831 ] Hongyuan Li edited comment on HADOOP-14444 at 7/5/17 2:36 PM: -- 1. If you renamed {{AbstractFTPFileSystem}} to anything else, I would be happier. This class name may make users think SFTP is a kind of FTP; however, it isn't. So, please rename it to something without "ftp". 2. {{FTPClient}}#{{retrieveFileStream}} will open a new data connection, which is the reason why I dislike the seek ops. was (Author: hongyuan li): 1、if you rename "AbstractFTPFileSystem" to anything else, i would be more happier.This class name may make users thinking sftp is one of ftp, however, it isn't. 2、FTPClient#retrieStream will open a new Data connection, which is the reason why i dislike the seek ops. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > Current implementation of FTP and SFTP filesystems have severe limitations > and performance issues when dealing with high number of files. Mine patch > solve those issues and integrate both filesystems such a way that most of the > core functionality is common for both and therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support of connection pooling - new connection is not created for every > single command but reused from the pool. > For huge number of files it shows order of magnitude performance improvement > over not pooled connections. > * Caching of directory trees. For ftp you always need to list whole directory > whenever you ask information about particular file. 
> Again for huge number of files it shows order of magnitude performance > improvement over not cached connections. > * Support of keep alive (NOOP) messages to avoid connection drops > * Support for Unix style or regexp wildcard glob - useful for listing a > particular files across whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856 ] Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:35 PM: -- [~jojochuang] It is hard to write a standalone JUnit test for it. The info about acks comes from the Kafka documentation: {code} *request.required.acks* // using old Producer api or the version of kafka is less than 0.9.x or *acks* // using new Producer api and kafka version more than 0.9.x This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html] [Documentation Kafka 0.9.0|http://kafka.apache.org/090/documentation.html] From the links above: if you use Kafka below 0.9.x, you should set {{request.required.acks = 1}} at least; when using the new Producer API on 0.9.x and above, you should set {{acks = 1}} at least. was (Author: hongyuan li): [~jojochuang] hard to write an only junit test to test it. 
the infos about acks is from kafka document : {code} {{request.required.acks}} // using old Producer api or the version of kafka is less than 0.9.x or {{acks}} // using new Producer api and kafka version more than 0.9.x This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [DocumentationKafka 0.8.2|http://kafka.apache.org/082/documentation.html] [Documentation Kafka 0.9.0|http://kafka.apache.org/090/documentation.html] FROM the link below, if you use kafka below 0.9.x, should set {{request.required.acks = 1}} at least.When use new Producer above 0.9.x, should set {{acks = 1}} at least. > KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856 ] Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:34 PM: -- [~jojochuang] It is hard to write a standalone JUnit test for it. The info about acks comes from the Kafka documentation: {code} {{request.required.acks}} // using old Producer api or the version of kafka is less than 0.9.x or {{acks}} // using new Producer api and kafka version more than 0.9.x This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html] [Documentation Kafka 0.9.0|http://kafka.apache.org/090/documentation.html] From the links above: if you use Kafka below 0.9.x, you should set {{request.required.acks = 1}} at least; when using the new Producer API on 0.9.x and above, you should set {{acks = 1}} at least. was (Author: hongyuan li): [~jojochuang] hard to write an only junit test to test it. 
the infos about acks is from kafka document : {code} request.required.acks This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [DocumentationKafka 0.8.2|http://kafka.apache.org/082/documentation.html] FROM the link below, if you use kafka below 0.9.x, should set request.required.acks = 1 at least.When use new Producer above 0.9.x, should set acks = 1 at least. > KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. 
> current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856 ] Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:30 PM: -- [~jojochuang] It is hard to write a standalone JUnit test for it. The info about acks comes from the Kafka documentation: {code} request.required.acks This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html] From the link above: if you use Kafka below 0.9.x, you should set request.required.acks = 1 at least; when using the new Producer API on 0.9.x and above, you should set acks = 1 at least. was (Author: hongyuan li): [~jojochuang] i will try to test it, but iam not sure it can be relised. the infos about acks is from kafka document : {code} request.required.acks This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? 
Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [DocumentationKafka 0.8.2|http://kafka.apache.org/082/documentation.html] FROM the link below, if you use kafka below 0.9.x, should set request.required.acks = 1 at least.When use new Producer above 0.9.x, should set acks = 1 at least. > KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856 ] Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:30 PM: -- [~jojochuang] I will try to test it, but I am not sure it can be realized. The info about acks comes from the Kafka documentation: {code} request.required.acks This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html] From the link above: if you use Kafka below 0.9.x, you should set request.required.acks = 1 at least; when using the new Producer API on 0.9.x and above, you should set acks = 1 at least. was (Author: hongyuan li): [~jojochuang] i will try to test it, but iam not sure it can be relised. the infos about acks is from kafka document : {code} request.required.acks This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? 
Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [DocumentationKafka 0.8.1|http://kafka.apache.org/082/documentation.html] FROM the link below, if you use kafka below 0.9.x, should set request.required.acks = 1 at least.When use new Producer above 0.9.x, should set acks = 1 at least. > KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856 ] Hongyuan Li commented on HADOOP-14623: -- [~jojochuang] I will try to test it, but I am not sure it can be realized. The info about acks comes from the Kafka documentation: {code} request.required.acks This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader? Typical values are 0, which means that the producer never waits for an acknowledgement from the broker (the same behavior as 0.7). This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails). 1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability as the client waits until the server acknowledges the request as successful (only messages that were written to the now-dead leader but not yet replicated will be lost). -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data. This option provides the best durability, we guarantee that no messages will be lost as long as at least one in sync replica remains. {code} [Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html] From the link above: if you use Kafka below 0.9.x, you should set request.required.acks = 1 at least; when using the new Producer API on 0.9.x and above, you should set acks = 1 at least. 
> KafkaSink#init should set acks to 1,not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
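The acks discussion above can be sketched as a plain producer-properties change; this is a minimal illustration, not the actual KafkaSink patch. Only java.util.Properties is used, so no broker is required; the `metadata.broker.list` key is shown as it would appear with the old (pre-0.9) producer API, and the broker address is hypothetical.

```java
import java.util.Properties;

public class KafkaSinkAcksConfig {

  // Sketch of the producer properties the comments discuss.
  // acks semantics, per the Kafka documentation quoted above:
  //   "0"  - producer never waits for a broker acknowledgement (current KafkaSink code)
  //   "1"  - leader replica must acknowledge the write (the proposed change)
  //   "-1" - all in-sync replicas must acknowledge (full sync, highest durability)
  static Properties producerProps(String brokerList) {
    Properties props = new Properties();
    props.put("metadata.broker.list", brokerList);  // hypothetical broker address
    props.put("request.required.acks", "1");        // proposed: was "0"
    return props;
  }

  public static void main(String[] args) {
    System.out.println(producerProps("localhost:9092"));
  }
}
```

With the new producer API (0.9.x and later), the equivalent setting would be the `acks` key with the same values.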
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074831#comment-16074831 ] Hongyuan Li commented on HADOOP-14444: -- 1. If you renamed "AbstractFTPFileSystem" to anything else, I would be happier. This class name may make users think SFTP is a kind of FTP; however, it isn't. 2. FTPClient#retrieveFileStream will open a new data connection, which is the reason why I dislike the seek ops. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > Current implementation of FTP and SFTP filesystems have severe limitations > and performance issues when dealing with high number of files. Mine patch > solve those issues and integrate both filesystems such a way that most of the > core functionality is common for both and therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support of connection pooling - new connection is not created for every > single command but reused from the pool. > For huge number of files it shows order of magnitude performance improvement > over not pooled connections. > * Caching of directory trees. For ftp you always need to list whole directory > whenever you ask information about particular file. > Again for huge number of files it shows order of magnitude performance > improvement over not cached connections. 
> * Support of keep alive (NOOP) messages to avoid connection drops > * Support for Unix style or regexp wildcard glob - useful for listing a > particular files across whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074526#comment-16074526 ] Hongyuan Li commented on HADOOP-14444: -- I have implemented it with some necessary features. On 2017-07-05 at 18:04, "Steve Loughran (JIRA)" wrote: [ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074523#comment-16074523 ] Steve Loughran commented on HADOOP-14444: - I am watching this, but not putting any effort into looking at the code right now. Happy that the two of you are working together to come up with something which addresses your needs. # You don't need to have every feature in immediately, have one up to the level where it works slightly better than the current one, enough for it to be alongside the older version for one release, then cut the other version once stable (s3a, wasb, ADL, all have a one-release-to-stabilise experience). # regarding caching, I'd go for a name like {{fs.ftp.cache.host}}, with the host value coming last. Otherwise you get into trouble with other options in future if a hostname matches it. Now, a quick scan through the latest patch h2. Build * all settings for things like java versions, artifact versions should be picked up from the base hadoop-project/pom.xml ... we need to manage everything in one place h2. Tests I like the tests; these are a key part of any new feature * Use {{GenericTestUtils}} to work with logs; there's ongoing changes there for better SLF4J integration & log capture. Please avoid using log4j API calls direct * Add a test timeout rule to {{TestAbstractFTPFileSystem}}, name it {{AbstractFTPFileSystemTest}}. * Every test suite starting Test* should be able to be executed by yetus/jenkins, without any ftp server * Everything with Test* can be started without any endpoint configured, right? 
* Use {{ContractTestUtils}} to work with filesystems and assert about them (more diags on failure), especially for the {{assertPathExists()}} kind of assertion, which you can move to for things like {{testFileExists()}} * and use SLF4J logging, not {{System.err}} * All assertTrue/assertFalse asserts should have a meaningful string, ideally even assertEquals. One trick: have the toString() value of the fs provide some details on the connection, so you can include it in the asserts. Another, pull out things like {{assertChannelConnected()}} and have the text in one place * {{TestConnectionPool.testGetChannelFromClosedFS}}. If the unexpected IOE is caught, make it the inner cause of the AssertionError raised. * Lot of duplication in the contract test createContract() calls...could that be shared somehow? * Have some isolated tests for the cache -- This message was sent by Atlassian JIRA (v6.4.14#64029) > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > Current implementation of FTP and SFTP filesystems have severe limitations > and performance issues when dealing with high number of files. 
Mine patch > solve those issues and integrate both filesystems such a way that most of the > core functionality is common for both and therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support of connection pooling - new connection is not created for every > single command but reused from the pool. > For huge number of files it shows order of magnitude performance improvement > over not pooled connections. > * Caching of directory trees. For ftp you always need to list whole directory > whenever you ask information about particular file. > Again for huge number of files it shows order of magnitude performance > improvement over not cached connections. > * Support of keep alive (NOOP) messages to avoid connection drops > * Support for Unix style or regexp wildcard glob - useful for listing a > particular files across whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074478#comment-16074478 ] Hongyuan Li commented on HADOOP-14444: -- 1) FTP is very different from SFTP. SFTP relies on the SSH protocol, which means it can get more accurate info than FTP; FTP allows more connections than SFTP. 2) I read your code just because I had planned to implement it myself when I have time. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > Current implementation of FTP and SFTP filesystems have severe limitations > and performance issues when dealing with high number of files. Mine patch > solve those issues and integrate both filesystems such a way that most of the > core functionality is common for both and therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support of connection pooling - new connection is not created for every > single command but reused from the pool. > For huge number of files it shows order of magnitude performance improvement > over not pooled connections. > * Caching of directory trees. For ftp you always need to list whole directory > whenever you ask information about particular file. > Again for huge number of files it shows order of magnitude performance > improvement over not cached connections. 
> * Support for keep-alive (NOOP) messages to avoid connection drops > * Support for Unix-style or regexp wildcard globs - useful for listing > particular files across a whole directory tree > * Support for re-establishing broken FTP data transfers - this can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
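[Editor's note] The connection-pooling idea the description lists (reuse an idle connection per command instead of reconnecting) can be sketched generically. This is a minimal standalone illustration, not code from the patch; `SimplePool` and its shape are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal generic connection pool: borrow() reuses an idle connection when
// one is available and only creates a new one when the pool is empty.
// release() returns a connection for later reuse (discarded if the pool is full).
class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    SimplePool(int maxSize, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxSize);
        this.factory = factory;
    }

    T borrow() {
        T conn = idle.poll();                  // reuse an idle connection if possible
        return conn != null ? conn : factory.get();
    }

    void release(T conn) {
        idle.offer(conn);                      // silently discard when the pool is full
    }
}
```

This is why pooling helps with huge file counts: the expensive connect/login handshake is paid once per pooled connection rather than once per command.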
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1, not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Summary: KafkaSink#init should set acks to 1, not 0 (was: KafkaSink#init should set ack to 1) > KafkaSink#init should set acks to 1, not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > at least been written to the broker. > The current code is listed below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Status: Patch Available (was: Open) > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074394#comment-16074394 ] Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 8:07 AM: -- ping [~ajisakaa] 、 [~jojochuang] for code review. was (Author: hongyuan li): ping [~ajisakaa] [~jojochuang] for code review. > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074394#comment-16074394 ] Hongyuan Li commented on HADOOP-14623: -- ping [~ajisakaa] [~jojochuang] for code review. > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > been written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Description: {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has at least been written to the broker. The current code is listed below: {code} props.put("request.required.acks", "0"); {code} was: {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has been written to the broker at least. current code list below: {code} props.put("request.required.acks", "0"); {code} > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to *1* to make sure the message has > at least been written to the broker. > The current code is listed below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Component/s: tools common > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has been > written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Affects Version/s: 3.0.0-alpha3 > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has been > written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14623: - Attachment: HADOOP-14623-001.patch > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has been > written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-14623) KafkaSink#init should set ack to 1
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li reassigned HADOOP-14623: Assignee: Hongyuan Li > KafkaSink#init should set ack to 1 > -- > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug >Reporter: Hongyuan Li >Assignee: Hongyuan Li > > {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has been > written to the broker at least. > current code list below: > {code} > > props.put("request.required.acks", "0"); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14623) KafkaSink#init should set ack to 1
Hongyuan Li created HADOOP-14623: Summary: KafkaSink#init should set ack to 1 Key: HADOOP-14623 URL: https://issues.apache.org/jira/browse/HADOOP-14623 Project: Hadoop Common Issue Type: Bug Reporter: Hongyuan Li {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has at least been written to the broker. The current code is listed below: {code} props.put("request.required.acks", "0"); {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
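[Editor's note] For context on the acks setting this issue targets: with `request.required.acks=0` the producer fires and forgets, so a metrics record can be silently lost; with `1` the partition leader must confirm the write. A minimal sketch of the proposed change; only the `request.required.acks` property comes from the issue, the surrounding class is illustrative:

```java
import java.util.Properties;

// Producer properties as the patch proposes: acks=1 means the partition
// leader must acknowledge the write before a send is considered successful.
class KafkaSinkProps {
    static Properties build() {
        Properties props = new Properties();
        props.put("request.required.acks", "1"); // patch: was "0" (fire-and-forget)
        return props;
    }
}
```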
[jira] [Commented] (HADOOP-14622) Test failure in TestFilterFileSystem and TestHarFileSystem
[ https://issues.apache.org/jira/browse/HADOOP-14622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074351#comment-16074351 ] Hongyuan Li commented on HADOOP-14622: -- {{HarFileSystem}}#{{appendFile}} has been implemented in the latest code on trunk > Test failure in TestFilterFileSystem and TestHarFileSystem > -- > > Key: HADOOP-14622 > URL: https://issues.apache.org/jira/browse/HADOOP-14622 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.0.0-alpha3 >Reporter: Jichao Zhang >Priority: Trivial > > Root Cause: > This may be a regression introduced by HADOOP-14395: a new > method, appendFile, was added to FileSystem, but the related unit tests in > TestHarFileSystem and TestFilterFileSystem were not updated. > Errors: > 1. org.apache.hadoop.fs.TestHarFileSystem-output.txt > checkInvalidPath: har://127.0.0.1/foo.har > 2017-07-03 13:37:08,191 ERROR fs.TestHarFileSystem > (TestHarFileSystem.java:testInheritedMethodsImplemented(365)) - HarFileSystem > MUST implement protected org.apache.hadoop.fs.FSDataOutputStreamBuilder > org.apache.hadoop.fs.FileSystem.appendFile(org.apache.hadoop.fs.Path) > 2. org.apache.hadoop.fs.TestFilterFileSystem-output.txt > 2017-07-03 13:36:18,217 ERROR fs.FileSystem > (TestFilterFileSystem.java:testFilterFileSystem(161)) - FilterFileSystem MUST > implement protected org.apache.hadoop.fs.FSDataOutputStreamBuilder > org.apache.hadoop.fs.FileSystem.appendFile(org.apache.hadoop.fs.Path) > ~ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074285#comment-16074285 ] Hongyuan Li commented on HADOOP-14444: -- 1. seek causes the client to disconnect and connect again; I don't think it is a good idea to implement it. 2. {{AbstractFTPFileSystem}} means an abstract base for FTP-like FileSystems. Sorry to interrupt you, but the FTP protocol is not like the SFTP protocol at all. The only thing the two have in common is that they use a username and password to connect to the ftp/sftp server and then perform a number of operations. I suggest using another name. 3. About the passwd and user, the code is like below; {{sftpFile}} is an LsEntry instance. {code}
String longName = sftpFile.getLongname();
String[] splitLongName = longName.split(" ");
String user = getUserOrGroup("user", splitLongName);
String group = getUserOrGroup("group", splitLongName);

/** Return the desPos-th non-empty token of the split long name. */
private String getUserOrGroup(String flag, String[] splitLongName) {
  int count = 0;
  int desPos = getPos(flag);
  for (String element : splitLongName) {
    if (count == desPos && !"".equals(element)) {
      return element;
    }
    if (!"".equals(element)) {
      count++;
    }
  }
  return null;
}

/** Position of the field in the long name: user is token 2, group is token 3. */
private int getPos(String flag) {
  if ("user".equals(flag)) {
    return 2;
  }
  return 3;
}
{code} 4. Shouldn't {{SFTPChannel}}#{{close}} close the session as well? {code}
client.getSession().disconnect();
{code} 5. I don't know if I can be seen as a reviewer. I'm just interested in your implementation. Good job. 
:D > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch > > > The current implementation of the FTP and SFTP filesystems has severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, thereby simplifying > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for connection pooling - a new connection is not created for every > single command but is reused from the pool. > For a huge number of files this shows an order of magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For FTP you always need to list the whole directory > whenever you ask for information about a particular file. > Again, for a huge number of files this shows an order of magnitude performance > improvement over non-cached connections. > * Support for keep-alive (NOOP) messages to avoid connection drops > * Support for Unix-style or regexp wildcard globs - useful for listing > particular files across a whole directory tree > * Support for re-establishing broken FTP data transfers - this can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14429) FTPFileSystem#getFsAction always returns FsAction.NONE
[ https://issues.apache.org/jira/browse/HADOOP-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056767#comment-16056767 ] Hongyuan Li edited comment on HADOOP-14429 at 7/4/17 11:58 AM: --- Thanks [~yzhangal], for your review and commit. Thanks [~brahmareddy] for your review. I filed HADOOP-14559 as an improvement to close the ftp. was (Author: hongyuan li): Thanks [~yzhangal], for your review and commit. Thanks [~brahmareddy] for your review. I file HADOOP-14559 as an improvement to close the ftp. > FTPFileSystem#getFsAction always returns FsAction.NONE > --- > > Key: HADOOP-14429 > URL: https://issues.apache.org/jira/browse/HADOOP-14429 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 3.0.0-alpha2 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: HADOOP-14429-001.patch, HADOOP-14429-002.patch, > HADOOP-14429-003.patch, HADOOP-14429-004.patch, HADOOP-14429-005.patch, > HADOOP-14429-006.patch, HADOOP-14429-007.patch, HADOOP-14429-008.patch, > HADOOP-14429-009.patch > > > > {code} > private FsAction getFsAction(int accessGroup, FTPFile ftpFile) { > FsAction action = FsAction.NONE; > if (ftpFile.hasPermission(accessGroup, FTPFile.READ_PERMISSION)) { > action.or(FsAction.READ); > } > if (ftpFile.hasPermission(accessGroup, FTPFile.WRITE_PERMISSION)) { > action.or(FsAction.WRITE); > } > if (ftpFile.hasPermission(accessGroup, FTPFile.EXECUTE_PERMISSION)) { > action.or(FsAction.EXECUTE); > } > return action; > } > {code} > From the code above, we can see that the getFsAction method does not modify the > action generated by FsAction action = FsAction.NONE (each action.or(...) result is > discarded), which means it returns FsAction.NONE all the time. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
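[Editor's note] The root cause of HADOOP-14429 generalizes: Hadoop's FsAction is an enum, so or() returns a new value rather than mutating the receiver, and the return value must be reassigned. A standalone sketch with a simplified stand-in enum (not Hadoop's actual class) showing the buggy and fixed patterns:

```java
// Simplified stand-in for Hadoop's FsAction to show the bug pattern:
// or() returns a new constant; ignoring the result leaves `action` unchanged.
enum Action {
    NONE(0), WRITE(2), READ(4), READ_WRITE(6);

    final int bits;
    Action(int bits) { this.bits = bits; }

    // Combine permission bits and return the matching constant.
    Action or(Action other) {
        int combined = this.bits | other.bits;
        for (Action a : values()) {
            if (a.bits == combined) {
                return a;
            }
        }
        return NONE;
    }

    static Action demoBuggy() { Action a = NONE; a.or(READ);     return a; } // result discarded -> still NONE
    static Action demoFixed() { Action a = NONE; a = a.or(READ); return a; } // reassigned -> READ
}
```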
[jira] [Updated] (HADOOP-14469) FTPFileSystem#listStatus get currentPath and parentPath at the same time, causing recursively list action endless
[ https://issues.apache.org/jira/browse/HADOOP-14469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongyuan Li updated HADOOP-14469: - Affects Version/s: 3.0.0-alpha2 > FTPFileSystem#listStatus get currentPath and parentPath at the same time, > causing recursively list action endless > - > > Key: HADOOP-14469 > URL: https://issues.apache.org/jira/browse/HADOOP-14469 > Project: Hadoop Common > Issue Type: Bug > Components: fs, tools/distcp >Affects Versions: 2.6.0, 3.0.0-alpha2 > Environment: ftp build by windows7 + Serv-U_64 12.1.0.8 > code runs any os >Reporter: Hongyuan Li >Assignee: Hongyuan Li >Priority: Critical > Attachments: HADOOP-14469-001.patch, HADOOP-14469-002.patch, > HADOOP-14469-003.patch, HADOOP-14469-004.patch, HADOOP-14469-005.patch, > HADOOP-14469-006.patch, HADOOP-14469-007.patch, HADOOP-14469-008.patch > > > For some FTP servers (for example, Serv-U), the listStatus method will return new Path(".") and new > Path(".."), thus causing the list operation to loop. > We can see the logic in the code below: > {code} > private FileStatus[] listStatus(FTPClient client, Path file) > throws IOException { > …… > FileStatus[] fileStats = new FileStatus[ftpFiles.length]; > for (int i = 0; i < ftpFiles.length; i++) { > fileStats[i] = getFileStatus(ftpFiles[i], absolute); > } > return fileStats; > } > {code} > {code} > public void test() throws Exception{ > FTPFileSystem ftpFileSystem = new FTPFileSystem(); > ftpFileSystem.initialize(new > Path("ftp://test:123456@192.168.44.1/").toUri(), > new Configuration()); > FileStatus[] fileStatus = ftpFileSystem.listStatus(new Path("/new")); > for(FileStatus fileStatus1 : fileStatus) > System.out.println(fileStatus1); > } > {code} > Running the test code above produces the results listed below: > {code} > FileStatus{path=ftp://test:123456@192.168.44.1/new; isDirectory=true; > modification_time=149671698; access_time=0; owner=user; group=group; > permission=-; isSymlink=false} > FileStatus{path=ftp://test:123456@192.168.44.1/; 
isDirectory=true; > modification_time=149671698; access_time=0; owner=user; group=group; > permission=-; isSymlink=false} > FileStatus{path=ftp://test:123456@192.168.44.1/new/hadoop; isDirectory=true; > modification_time=149671698; access_time=0; owner=user; group=group; > permission=-; isSymlink=false} > FileStatus{path=ftp://test:123456@192.168.44.1/new/HADOOP-14431-002.patch; > isDirectory=false; length=2036; replication=1; blocksize=4096; > modification_time=149579778; access_time=0; owner=user; group=group; > permission=-; isSymlink=false} > FileStatus{path=ftp://test:123456@192.168.44.1/new/HADOOP-14486-001.patch; > isDirectory=false; length=1322; replication=1; blocksize=4096; > modification_time=149671698; access_time=0; owner=user; group=group; > permission=-; isSymlink=false} > FileStatus{path=ftp://test:123456@192.168.44.1/new/hadoop-main; > isDirectory=true; modification_time=149579712; access_time=0; owner=user; > group=group; permission=-; isSymlink=false} > {code} > In the results above, {{FileStatus{path=ftp://test:123456@192.168.44.1/new; ……}} > is obviously the current path, and > {{FileStatus{path=ftp://test:123456@192.168.44.1/;……}} is obviously the > parent path. > So, if we want to walk the directory recursively, the walk will get stuck. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
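[Editor's note] The fix direction discussed in HADOOP-14469 amounts to filtering the server-supplied "." and ".." entries out of a listing before building FileStatus objects. A minimal standalone sketch of that filter (class and method names are illustrative, not from the patch):

```java
import java.util.ArrayList;
import java.util.List;

// Drop the "." (current directory) and ".." (parent directory) entries that
// some FTP servers (e.g. Serv-U) include in LIST replies, so that a recursive
// walk over the returned statuses cannot loop back onto the directory itself.
class ListingFilter {
    static List<String> dropDotEntries(String[] names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!".".equals(name) && !"..".equals(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```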