[jira] [Commented] (HADOOP-14559) FTPFileSystem instance in TestFTPFileSystem should be created before tests and closed after tests

2017-08-23 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139464#comment-16139464
 ] 

Hongyuan Li commented on HADOOP-14559:
--

Hi [~ste...@apache.org], [~yzhangal], sorry to interrupt you; should this 
issue be resolved or closed?

> FTPFileSystem instance in TestFTPFileSystem should be created before tests 
> and closed after tests 
> --
>
> Key: HADOOP-14559
> URL: https://issues.apache.org/jira/browse/HADOOP-14559
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.0.0-alpha2
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>Priority: Minor
> Attachments: HADOOP-14559-001.patch
>
>
> As an improvement, the FTPFileSystem instance used in TestFTPFileSystem 
> should be closed in each test case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13743) error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials has too many spaces

2017-08-09 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119883#comment-16119883
 ] 

Hongyuan Li edited comment on HADOOP-13743 at 8/9/17 1:49 PM:
--

Digging into the log4j source code, message formatting is implemented 
internally with a StringBuilder.


was (Author: hongyuan li):
Digging into the log4j source code, the message formatter uses a StringBuilder 
internally.

> error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials 
> has too many spaces
> 
>
> Key: HADOOP-13743
> URL: https://issues.apache.org/jira/browse/HADOOP-13743
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Trivial
> Attachments: HADOOP-13743-branch-2-001.patch, 
> HADOOP-14373-branch-2-002.patch
>
>
> The error message from a failed hadoop fs -ls command against an 
> unauthenticated Azure container has an extra space in {{" them  in"}}
> {code}
> ls: org.apache.hadoop.fs.azure.AzureException: Unable to access container 
> demo in account example.blob.core.windows.net using anonymous credentials, 
> and no credentials found for them  in the configuration.
> {code}






[jira] [Commented] (HADOOP-13743) error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials has too many spaces

2017-08-09 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119883#comment-16119883
 ] 

Hongyuan Li commented on HADOOP-13743:
--

Digging into the log4j source code, the message formatter uses a StringBuilder 
internally.
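
A double space like the one reported typically appears when a token that can be empty is concatenated between two literal spaces. A minimal sketch of the effect (class, method, and message wording here are hypothetical reconstructions, not the actual AzureNativeFileSystemStore code):

```java
// Sketch of how an extra space can appear when a possibly-empty token is
// concatenated between two literal spaces. Names and message text are
// hypothetical; this is not the actual Azure connector code.
public class MessageSpacingSketch {

    // "accountSuffix" may be empty; when it is, the two literal spaces around
    // it become adjacent and the message contains "them  in".
    static String buildMessage(String container, String account, String accountSuffix) {
        return "Unable to access container " + container + " in account " + account
                + " using anonymous credentials, and no credentials found for them "
                + accountSuffix + " in the configuration.";
    }

    public static void main(String[] args) {
        // With an empty middle token the rendered message has a double space.
        System.out.println(buildMessage("demo", "example.blob.core.windows.net", ""));
    }
}
```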

> error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials 
> has too many spaces
> 
>
> Key: HADOOP-13743
> URL: https://issues.apache.org/jira/browse/HADOOP-13743
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Trivial
> Attachments: HADOOP-13743-branch-2-001.patch, 
> HADOOP-14373-branch-2-002.patch
>
>
> The error message from a failed hadoop fs -ls command against an 
> unauthenticated Azure container has an extra space in {{" them  in"}}
> {code}
> ls: org.apache.hadoop.fs.azure.AzureException: Unable to access container 
> demo in account example.blob.core.windows.net using anonymous credentials, 
> and no credentials found for them  in the configuration.
> {code}






[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-25 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099702#comment-16099702
 ] 

Hongyuan Li commented on HADOOP-14623:
--

Hi [~bharatviswa], you are right, it does not matter. The value of the 
{{key.serializer}} param in the original code is {{ByteArraySerializer}}, 
which is not the serializer class corresponding to the Integer key type, so I 
changed the key type to {{byte[]}}, although the key is never used.
If this is confusing, I will remove that modification and resubmit a new 
patch.

Thanks anyway for your code review.
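
The configuration the discussion describes (acks set to 1, a key serializer that matches a {{byte[]}} key) can be sketched with plain string property keys, so the snippet is self-contained; in the patch the {{ProducerConfig}} constants resolve to the same strings. This mirrors the described fix, not the actual KafkaSink code:

```java
import java.util.Properties;

// Sketch of the producer configuration under discussion: acks=1 so the leader
// must acknowledge each write, and a key serializer consistent with a byte[]
// key type. String property keys are used here for self-containment; the
// ProducerConfig constants (e.g. KEY_SERIALIZER_CLASS_CONFIG) map to the
// same strings.
public class KafkaSinkConfigSketch {

    static Properties producerProps(String brokerList) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokerList);
        // acks=1: the send is acknowledged once the leader has written it.
        props.put("acks", "1");
        // byte[] keys pair with ByteArraySerializer.
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("localhost:9092"));
    }
}
```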

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch, 
> HADOOP-14623-003.patch, HADOOP-14623-004.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-23 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097550#comment-16097550
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/23/17 9:32 AM:
---

Hi [~bharatviswa], I resubmitted patch 003 according to the discussion above, 
with the modifications below:
1. Set {{acks}} to 1.
2. Used {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of the literal 
{{key.serializer}}.
3. Changed the key type from {{Integer}} to {{byte[]}}.

*Update* 
Patch 004 is the latest.


was (Author: hongyuan li):
Hi [~bharatviswa], I resubmitted patch 003 according to the discussion above, 
with the modifications below:
1. Set {{acks}} to 1.
2. Used {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of the literal 
{{key.serializer}}.
3. Changed the key type from {{Integer}} to {{byte[]}}.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch, 
> HADOOP-14623-003.patch, HADOOP-14623-004.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Commented] (HADOOP-14559) FTPFileSystem instance in TestFTPFileSystem should be created before tests and closed after tests

2017-07-23 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097566#comment-16097566
 ] 

Hongyuan Li commented on HADOOP-14559:
--

Hi [~yzhangal], sorry to ping you again; can this JIRA be resolved as you 
suggested?

> FTPFileSystem instance in TestFTPFileSystem should be created before tests 
> and closed after tests 
> --
>
> Key: HADOOP-14559
> URL: https://issues.apache.org/jira/browse/HADOOP-14559
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.0.0-alpha2
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>Priority: Minor
> Attachments: HADOOP-14559-001.patch
>
>
> As an improvement, the FTPFileSystem instance used in TestFTPFileSystem 
> should be closed in each test case.






[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-23 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Attachment: HADOOP-14623-004.patch

Fixed the compile error and the checkstyle warning.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch, 
> HADOOP-14623-003.patch, HADOOP-14623-004.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-23 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097550#comment-16097550
 ] 

Hongyuan Li commented on HADOOP-14623:
--

Hi [~bharatviswa], I resubmitted patch 003 according to the discussion above, 
with the modifications below:
1. Set {{acks}} to 1.
2. Used {{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of the literal 
{{key.serializer}}.
3. Changed the key type from {{Integer}} to {{byte[]}}.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch, 
> HADOOP-14623-003.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-23 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Attachment: HADOOP-14623-003.patch

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch, 
> HADOOP-14623-003.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-21 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095937#comment-16095937
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/21/17 8:54 AM:
---

Hi [~bharatviswa], I found that the Kafka client must match the Kafka server 
version, or the new producer API will not function well. The old 
{{kafka.javaapi.producer}} Producer still works, but it will be removed in a 
future Kafka version.
The following is the stack trace when using Kafka client 0.10.0 to write to a 
Kafka 0.9.0 server:
{code}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
    at java.lang.Thread.run(Thread.java:745)
{code}

Any ideas?

*Update*
Kafka client 0.9.x can write to Kafka servers from version 0.9.x through 0.10.x.

*Update*
It seems the key type can be set to {{byte[]}} instead of {{Integer}}.


was (Author: hongyuan li):
Hi [~bharatviswa], I found that the Kafka client must match the Kafka server 
version, or the new producer API will not function well. The old 
{{kafka.javaapi.producer}} Producer still works, but it will be removed in a 
future Kafka version.
The following is the stack trace when using Kafka client 0.10.0 to write to a 
Kafka 0.9.0 server:
{code}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
    at java.lang.Thread.run(Thread.java:745)
{code}

Any ideas?

*Update*
Kafka client 0.9.x can write to Kafka servers from version 0.9.x through 0.10.x.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-21 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095937#comment-16095937
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/21/17 8:50 AM:
---

Hi [~bharatviswa], I found that the Kafka client must match the Kafka server 
version, or the new producer API will not function well. The old 
{{kafka.javaapi.producer}} Producer still works, but it will be removed in a 
future Kafka version.
The following is the stack trace when using Kafka client 0.10.0 to write to a 
Kafka 0.9.0 server:
{code}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
    at java.lang.Thread.run(Thread.java:745)
{code}

Any ideas?

*Update*
Kafka client 0.9.x can write to Kafka servers from version 0.9.x through 0.10.x.


was (Author: hongyuan li):
Hi [~bharatviswa], I found that the Kafka client must match the Kafka server 
version, or the new producer API will not function well. The old 
{{kafka.javaapi.producer}} Producer still works, but it will be removed in a 
future Kafka version.
The following is the stack trace when using Kafka client 0.10.0 to write to a 
Kafka 0.9.0 server:
{code}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
    at java.lang.Thread.run(Thread.java:745)
{code}

Any ideas?

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-21 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095937#comment-16095937
 ] 

Hongyuan Li commented on HADOOP-14623:
--

Hi [~bharatviswa], I found that the Kafka client must match the Kafka server 
version, or the new producer API will not function well. The old 
{{kafka.javaapi.producer}} Producer still works, but it will be removed in a 
future Kafka version.
The following is the stack trace when using Kafka client 0.10.0 to write to a 
Kafka 0.9.0 server:
{code}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 1702065152, only 29 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
    at java.lang.Thread.run(Thread.java:745)
{code}

Any ideas?

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-20 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095632#comment-16095632
 ] 

Hongyuan Li commented on HADOOP-14623:
--

Hi [~bharatviswa], thanks for your kind comment. In my opinion, the former 
code uses an integer key only to partition the data, which I don't think is 
necessary; however, {{key.serializer}} must be set correctly in any case. Thanks.
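
The key's only effect here is which partition a record lands on; the idea can be illustrated with a simplified sketch. Kafka's actual DefaultPartitioner hashes the serialized key bytes with murmur2, so the hash function below is only a stand-in:

```java
import java.util.Arrays;

// Simplified illustration of key-based partitioning: the serialized key is
// hashed and mapped onto a partition index. Arrays.hashCode stands in for
// the murmur2 hash that Kafka's DefaultPartitioner actually uses.
public class KeyPartitionSketch {

    static int partitionFor(byte[] keyBytes, int numPartitions) {
        // Mask the sign bit so the resulting index is non-negative.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = {1, 2, 3};
        // The same key always maps to the same partition.
        System.out.println(partitionFor(key, 4));
    }
}
```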

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-10 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081509#comment-16081509
 ] 

Hongyuan Li commented on HADOOP-14623:
--

Hi [~aw], could you give me a code review?

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 11:55 AM:
---

Furthermore, the flush method is meant to confirm that the data has been written.

*Update/Correction*
Sorry, it is the {{putMetrics}} method.
In {{KafkaSink}}#{{putMetrics}}, the code listed below is what gives me a 
different opinion:
{code}
……
Future future = producer.send(data);
jsonLines.setLength(0);
try {
  future.get(); // which means synchronously
} catch (InterruptedException e) {
  throw new MetricsException("Error sending data", e);
} catch (ExecutionException e) {
  throw new MetricsException("Error sending data", e);
}

……
{code}
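
The {{future.get()}} call in the snippet above is what makes the send effectively synchronous: the calling thread blocks until the broker acknowledges or the send fails. The pattern can be sketched with a plain Java future (no Kafka dependency; the CompletableFuture stands in for the Future returned by KafkaProducer.send()):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Sketch of the synchronous-send pattern in putMetrics: send() returns a
// Future immediately, and get() blocks the caller until the result arrives.
public class SyncSendSketch {

    static String sendAndWait(CompletableFuture<String> pendingAck) {
        try {
            // Blocks until the "broker" acknowledges; this is why the
            // original code is effectively a synchronous send.
            return pendingAck.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("Error sending data", e);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> ack = CompletableFuture.supplyAsync(() -> "ack");
        System.out.println(sendAndWait(ack));
    }
}
```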


was (Author: hongyuan li):
Furthermore, the flush method is meant to confirm that the data has been written.

*Update/Correction*
Sorry, it is the {{putMetrics}} method.
In {{KafkaSink}}#{{putMetrics}}, the code listed below is what gives me a 
different opinion:
{code}
……
Future future = producer.send(data);
jsonLines.setLength(0);
try {
  future.get();
} catch (InterruptedException e) {
  throw new MetricsException("Error sending data", e);
} catch (ExecutionException e) {
  throw new MetricsException("Error sending data", e);
}

……
{code}

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}
> *Update*
> Found another bug in this class: {{key.serializer}} uses 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}; however, the 
> key type of the Producer is Integer. The code is listed below:
> {code}
> props.put("key.serializer",
>     "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
> producer = new KafkaProducer(props);
> {code}






[jira] [Commented] (HADOOP-10949) metrics2 sink plugin for Apache Kafka

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079110#comment-16079110
 ] 

Hongyuan Li commented on HADOOP-10949:
--

I filed HADOOP-14623 to update this module.

> metrics2 sink plugin for Apache Kafka
> -
>
> Key: HADOOP-10949
> URL: https://issues.apache.org/jira/browse/HADOOP-10949
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: metrics
>Reporter: Babak Behzad
>Assignee: Babak Behzad
> Fix For: 3.0.0-alpha1
>
> Attachments: HADOOP-10949-1.patch, HADOOP-10949-2.patch, 
> HADOOP-10949-4.patch, HADOOP-10949-5.patch, HADOOP-10949-6-1.patch, 
> HADOOP-10949-6.patch, HADOOP-10949.patch, HADOOP-10949.patch, 
> HADOOP-10949.patch, HADOOP-10949.patch, HADOOP-10949.patch, 
> HADOOP-10949.patch, HADOOP-10949.patch, HADOOP-10949.patch, 
> HADOOP-10949.patch, HADOOP-10949.patch, HADOOP-10949.patch
>
>
> Write a metrics2 sink plugin for Hadoop to send metrics directly to Apache 
> Kafka in addition to the current, Graphite 
> ([Hadoop-9704|https://issues.apache.org/jira/browse/HADOOP-9704]), Ganglia 
> and File sinks.






[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079108#comment-16079108
 ] 

Hongyuan Li commented on HADOOP-14623:
--

None of the test failures is related to this patch, and there are no 
checkstyle or findbugs warnings.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}
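For illustration, here is a minimal, self-contained sketch of the two property fixes described in this issue. It uses plain `java.util.Properties` with the new-producer config names from the 0.10.x Kafka client ({{acks}} replaces the old {{request.required.acks}}); the class name and broker address are hypothetical, and this is not the committed patch.

```java
import java.util.Properties;

// Sketch of the two KafkaSink property fixes: acks=1 instead of acks=0,
// and a key serializer that matches the Integer key type actually sent.
public class KafkaSinkProps {
    public static Properties buildProps(String brokerList) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokerList);
        // acks=1: the partition leader must write the record before
        // acknowledging; acks=0 is fire-and-forget and can silently lose data.
        props.put("acks", "1");
        // The producer key is an Integer, so use IntegerSerializer,
        // not ByteArraySerializer.
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.IntegerSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties p = buildProps("localhost:9092");
        System.out.println(p.getProperty("acks"));           // prints 1
        System.out.println(p.getProperty("key.serializer"));
    }
}
```

With the real client on the classpath, these properties would be passed to `new KafkaProducer<Integer, byte[]>(props)`.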






[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079075#comment-16079075
 ] 

Hongyuan Li commented on HADOOP-14632:
--

None of the test failures is related to this patch. No checkstyle or findbugs 
warnings. 
Pinging [~ste...@apache.org] and [~brahmareddy] for code review. A performance 
comparison will be submitted soon.


> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, 
> HADOOP-14632-003.patch, HADOOP-14632-004.patch, HADOOP-14632-005.patch
>
>
> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> Test example shows transfer  performance has improved a lot.
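The speed-up comes from a standard technique: wrapping a slow, per-call-expensive stream in a buffered stream so many small writes are coalesced into few large ones. The sketch below is illustrative only, not the actual patch; a `ByteArrayOutputStream` stands in for the remote SFTP stream, and the 32 KB buffer size is an assumption.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Demonstrates the buffering idea behind the SFTPFileSystem#create/#open
// change: wrap the raw stream so tiny writes are batched before they hit
// the (in the real case, network-backed) underlying stream.
public class BufferedCopy {
    static final int BUFFER_SIZE = 32 * 1024; // assumed buffer size

    public static long writeBuffered(OutputStream raw, byte[] data)
            throws IOException {
        // try-with-resources flushes and closes the buffer when done.
        try (OutputStream out = new BufferedOutputStream(raw, BUFFER_SIZE)) {
            for (byte b : data) {
                out.write(b); // 1-byte writes, coalesced by the buffer
            }
        }
        return data.length;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = writeBuffered(sink, new byte[]{1, 2, 3});
        System.out.println(n);           // prints 3
        System.out.println(sink.size()); // prints 3
    }
}
```

For an SFTP connection, each unbuffered write can cost a network round trip, which is why buffering improves transfer speed so noticeably.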






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 9:20 AM:
--

I highly recommend four steps:
1. Use {{acks}} = {{1}}.
2. Add the {{https://repository.apache.org/content/repositories/releases}} repo; 
the Apache snapshot repo does not have a Kafka module newer than {{0.8.2}}.
3. Update the Kafka client version to at least {{0.10.1.0}}, which provides an 
IntegerSerializer class, required if KafkaSink is to create a producer whose 
key type is Integer.
4. Use ProducerConfig constants instead of raw string values; for example, use 
{{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}}.

Thanks for any advice. The latest patch implements all of the above.
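Point 4 can be sketched without the Kafka client on the classpath by mirroring the relevant {{ProducerConfig}} constants in a local stand-in class (the names and string values below match the real `org.apache.kafka.clients.producer.ProducerConfig`; the stand-in class itself is hypothetical):

```java
import java.util.Properties;

// Stand-in mirroring the real ProducerConfig constants, so the
// constants-over-literals idea is runnable here on its own.
final class ProducerConfigStandIn {
    static final String ACKS_CONFIG = "acks";
    static final String KEY_SERIALIZER_CLASS_CONFIG = "key.serializer";
}

public class ConstantsOverLiterals {
    public static Properties configure() {
        Properties props = new Properties();
        // A misspelled constant fails at compile time; a misspelled string
        // literal like "key.serialzer" would only fail at runtime.
        props.put(ProducerConfigStandIn.ACKS_CONFIG, "1");
        props.put(ProducerConfigStandIn.KEY_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.IntegerSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(configure().getProperty("acks")); // prints 1
    }
}
```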


was (Author: hongyuan li):
I highly recommend Four steps:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
3、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
4、 Use ProducerConfig.XXX instead of using string value  directly. For example, 
use 
{{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}}


 Thanks for any advice.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Attachment: HADOOP-14623-002.patch

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch, HADOOP-14623-002.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 9:15 AM:
--

I highly recommend Four steps:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
3、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
4、 Use ProducerConfig.XXX instead of using string value  directly. For example, 
use 
{{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}}


 Thanks for any advice.


was (Author: hongyuan li):
I highly recommend Five steps:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
3、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
4、add {{callback}} when using new {{KafkaProducer}}#{{send}}
5、 Use ProducerConfig.XXX instead of using string value  directly. For example, 
use 
{{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}}


 Thanks for any advice.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:51 AM:
--

I highly recommend Five steps:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
3、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
4、add {{callback}} when using new {{KafkaProducer}}#{{send}}
5、 Use ProducerConfig.XXX instead of using string value  directly. For example, 
use 
{{ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG}} instead of {{key.serializer}}


 Thanks for any advice.


was (Author: hongyuan li):
I highly recommend Four points:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
3、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
4、add {{callback}} when using new {{KafkaProducer}}#{{send}}


 Thanks for any advice.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:46 AM:
--

I highly recommend Four points:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
3、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
4、add {{callback}} when using new {{KafkaProducer}}#{{send}}


 Thanks for any advice.


was (Author: hongyuan li):
I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
2、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
3、add {{callback}} when using new {{KafkaProducer}}#{{send}}


 Thanks for any advice.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Attachment: HADOOP-14632-005.patch

Fixed the JUnit test.

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, 
> HADOOP-14632-003.patch, HADOOP-14632-004.patch, HADOOP-14632-005.patch
>
>
> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> Test example shows transfer  performance has improved a lot.






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:24 AM:
--

I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、add  {{https://repository.apache.org/content/repositories/releases}} repo,  
the {{apache snapshot rep}} doesnot have a higher version kafka module, the 
version of which is less than {{0.8.2}}
2、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
3、add {{callback}} when using new {{KafkaProducer}}#{{send}}


 Thanks for any advice.


was (Author: hongyuan li):
I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer, but what blocks it that the {{apache 
snapshot rep}} doesnot have a kafka, the version of which is higher than 
{{0.8.2}}
3、add {{callback}} when using new {{KafkaProducer}}#{{send}}


 Thanks for any advice.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:20 AM:
--

I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer, but what blocks it that the {{apache 
snapshot rep}} doesnot have a kafka, the version of which is higher than 
{{0.8.2}}
3、add {{callback}} when using new {{KafkaProducer}}#{{send}}


 Thanks for any advice.


was (Author: hongyuan li):
I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
The last patch will fix the two, If you don't think so, close the jira. Thanks 
for any advice.
3、add {{callback}} when using new {{KafkaProducer}}#{{send}}

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:11 AM:
--

I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、update kafka client version to at least {{0.10.1.0}},  which has a 
IntegerSerializer class If kafka sink want to generate a kafka producer with 
the the type of key being Integer.
The last patch will fix the two, If you don't think so, close the jira. Thanks 
for any advice.
3、add {{callback}} when using new {{KafkaProducer}}#{{send}}


was (Author: hongyuan li):
I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、update kafka client version to {{0.10.1.0}},  which has a IntegerSerializer 
class If kafka sink want to generate a kafka producer with the the type of key 
being Integer.
The last patch will fix the two, If you don't think so, close the jira. Thanks 
for any advice.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Comment Edited] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/8/17 8:06 AM:
--

I highly recommend Two points:
1、should use {{acks}} = {{1}}.
2、update kafka client version to {{0.10.1.0}},  which has a IntegerSerializer 
class If kafka sink want to generate a kafka producer with the the type of key 
being Integer.
The last patch will fix the two, If you don't think so, close the jira. Thanks 
for any advice.


was (Author: hongyuan li):
I highly recommend Two points:
1、should use acks = 1.
2、update kafka client version to 0.10.1. which has a IntegerSerializer class If 
kafka sink want to generate a kafka producer with the the type of key being 
Integer.
The last patch will fix the two, If you don't think so, close the jira.Thanks.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Commented] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079033#comment-16079033
 ] 

Hongyuan Li commented on HADOOP-14623:
--

I highly recommend Two points:
1、should use acks = 1.
2、update kafka client version to 0.10.1. which has a IntegerSerializer class If 
kafka sink want to generate a kafka producer with the the type of key being 
Integer.
The last patch will fix the two, If you don't think so, close the jira.Thanks.

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Updated] (HADOOP-14623) fixed some bugs in KafkaSink

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Summary: fixed some bugs in KafkaSink   (was: KafkaSink#init should set 
acks to 1,not 0 and key.serializer is wrong however key not used)

> fixed some bugs in KafkaSink 
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Description: 
{{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has been 
written to the broker at least.

current code list below:

{code}
  
props.put("request.required.acks", "0");

{code}

*Update*

find another bug about this class, {{key.serializer}} used 
{{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the key 
properties of Producer is Integer, codes list below:
{code}
props.put("key.serializer",
"org.apache.kafka.common.serialization.ByteArraySerializer");
…
 producer = new KafkaProducer(props);
{code}

  was:
{{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has been 
written to the broker at least.

current code list below:

{code}
  
props.put("request.required.acks", "0");

{code}

*Update*

find another bug about this class, key.serializer used 
{{org.apache.kafka.common.serialization.ByteArraySerializer}}


> KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however 
> key not used
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, {{key.serializer}} used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}, however, the 
> key properties of Producer is Integer, codes list below:
> {code}
> props.put("key.serializer",
> "org.apache.kafka.common.serialization.ByteArraySerializer");
> …
>  producer = new KafkaProducer(props);
> {code}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Description: 
{{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has been 
written to the broker at least.

current code list below:

{code}
  
props.put("request.required.acks", "0");

{code}

*Update*

find another bug about this class, key.serializer used 
{{org.apache.kafka.common.serialization.ByteArraySerializer}}

  was:
{{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has been 
written to the broker at least.

current code list below:

{code}
  
props.put("request.required.acks", "0");

{code}


> KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however 
> key not used
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}}  should set ack to *1* to make sure the message has 
> been written to the broker at least.
> current code list below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
> *Update*
> find another bug about this class, key.serializer used 
> {{org.apache.kafka.common.serialization.ByteArraySerializer}}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however key not used

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Summary: KafkaSink#init should set acks to 1,not 0 and key.serializer is 
wrong however key not used  (was: KafkaSink#init should set acks to 1,not 0)

> KafkaSink#init should set acks to 1,not 0 and key.serializer is wrong however 
> key not used
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has at
> least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Comment Edited] (HADOOP-14567) DistCP NullPointerException when -atomic is set but -tmp is not

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071185#comment-16071185
 ] 

Hongyuan Li edited comment on HADOOP-14567 at 7/8/17 7:35 AM:
--

I think we should add a default tmp workPath for distcp. [~yzhangal]

I filed a new JIRA, HADOOP-14631, to state it more obviously.


was (Author: hongyuan li):
i think we should add a default tmp workPath for distcp. [~yzhangal]

> DistCP NullPointerException when -atomic is set but -tmp is not
> ---
>
> Key: HADOOP-14567
> URL: https://issues.apache.org/jira/browse/HADOOP-14567
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.7.3
> Environment: HDP 2.5.0 kerberized cluster -> HDP 2.6.0 kerberized 
> cluster
>Reporter: Hari Sekhon
>Assignee: Hongyuan Li
>Priority: Minor
>
> When running distcp if using -atomic but not specifying -tmp then the 
> following NullPointerException is encountered - removing -atomic avoids this 
> bug:
> {code}
> 17/06/21 16:50:59 ERROR tools.DistCp: Exception encountered
> java.lang.NullPointerException
> at org.apache.hadoop.fs.Path.(Path.java:104)
> at org.apache.hadoop.fs.Path.(Path.java:93)
> at 
> org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:363)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:247)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:176)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
> {code}






[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Fix Version/s: 2.7.3

> Distcp should add a default  atomicWorkPath properties when using atomic
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Fix For: 2.7.3
>
>
> Distcp should add a default AtomicWorkPath property when using atomic.
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the
> atomic work path:
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent
> of the current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent
> will be {{null}}, which means
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() +
> rand.nextInt());}} will throw a NullPointerException.
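The null-parent failure mode quoted above can be sketched with `java.nio.file.Path`, whose `getParent()` likewise returns null at the root. This is an analogy only: the Hadoop `Path` class behaves similarly, but the method name `resolveWorkDir`, the `._WIP_` prefix literal, and the `/tmp` fallback below are hypothetical illustrations, not DistCp's actual code or its chosen default.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AtomicWorkPathSketch {
    // Return a usable work dir even when targetPath sits at the filesystem
    // root, where getParent() returns null (the case that NPEs in DistCp).
    static Path resolveWorkDir(Path atomicWorkPath, Path targetPath) {
        Path workDir = (atomicWorkPath != null)
            ? atomicWorkPath
            : targetPath.getParent();
        if (workDir == null) {
            // Hypothetical fallback; the JIRA proposes adding such a default.
            workDir = Paths.get("/tmp");
        }
        String name = (targetPath.getFileName() != null)
            ? targetPath.getFileName().toString()
            : "root";
        return workDir.resolve("._WIP_" + name);
    }

    public static void main(String[] args) {
        // Parent of the root is null -- the condition that triggers the NPE.
        System.out.println(Paths.get("/").getParent());           // prints null
        System.out.println(resolveWorkDir(null, Paths.get("/"))); // prints /tmp/._WIP_root
    }
}
```

With the guard in place, a root target path degrades to the fallback directory instead of dereferencing a null parent.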






[jira] [Assigned] (HADOOP-14567) DistCP NullPointerException when -atomic is set but -tmp is not

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li reassigned HADOOP-14567:


Assignee: Hongyuan Li

> DistCP NullPointerException when -atomic is set but -tmp is not
> ---
>
> Key: HADOOP-14567
> URL: https://issues.apache.org/jira/browse/HADOOP-14567
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.7.3
> Environment: HDP 2.5.0 kerberized cluster -> HDP 2.6.0 kerberized 
> cluster
>Reporter: Hari Sekhon
>Assignee: Hongyuan Li
>Priority: Minor
>
> When running distcp if using -atomic but not specifying -tmp then the 
> following NullPointerException is encountered - removing -atomic avoids this 
> bug:
> {code}
> 17/06/21 16:50:59 ERROR tools.DistCp: Exception encountered
> java.lang.NullPointerException
> at org.apache.hadoop.fs.Path.(Path.java:104)
> at org.apache.hadoop.fs.Path.(Path.java:93)
> at 
> org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:363)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:247)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:176)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
> {code}






[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079022#comment-16079022
 ] 

Hongyuan Li commented on HADOOP-14632:
--

Attached a new patch.

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, 
> HADOOP-14632-003.patch, HADOOP-14632-004.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Attachment: HADOOP-14632-004.patch

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, 
> HADOOP-14632-003.patch, HADOOP-14632-004.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-08 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079020#comment-16079020
 ] 

Hongyuan Li commented on HADOOP-14632:
--

One of the JUnit tests is related to the patch; will work on it.

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, 
> HADOOP-14632-003.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Summary: Distcp should add a default  atomicWorkPath properties when using 
atomic  (was: Distcp should add a default  atomicWorkPath properties when using 
atomic or throw obvious Exception)

> Distcp should add a default  atomicWorkPath properties when using atomic
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic.
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the
> atomic work path:
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent
> of the current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent
> will be {{null}}, which means
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() +
> rand.nextInt());}} will throw a NullPointerException.






[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-08 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Attachment: HADOOP-14632-003.patch

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch, 
> HADOOP-14632-003.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Comment Edited] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078964#comment-16078964
 ] 

Hongyuan Li edited comment on HADOOP-14632 at 7/8/17 5:46 AM:
--

Will add functional test units in the next patch.


was (Author: hongyuan li):
Will add funtionational test units in next patch.

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Comment Edited] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078964#comment-16078964
 ] 

Hongyuan Li edited comment on HADOOP-14632 at 7/8/17 5:45 AM:
--

Will add funtionational test units in next patch.


was (Author: hongyuan li):
Will add fntionational test units in next patch.

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Updated] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Summary: add buffer to SFTPFileSystem#create and SFTPFileSystem#open 
method, which can improve the  transfer speed.  (was: add buffersize to 
SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the  
transfer speed.)

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Commented] (HADOOP-14632) add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078964#comment-16078964
 ] 

Hongyuan Li commented on HADOOP-14632:
--

Will add fntionational test units in next patch.

> add buffer to SFTPFileSystem#create and SFTPFileSystem#open method, which can 
> improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Commented] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078963#comment-16078963
 ] 

Hongyuan Li commented on HADOOP-14631:
--

OK, thanks for your advice. Will work on it.

> Distcp should add a default  atomicWorkPath properties when using atomic or 
> throw obvious Exception
> ---
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic.
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the
> atomic work path:
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent
> of the current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent
> will be {{null}}, which means
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() +
> rand.nextInt());}} will throw a NullPointerException.






[jira] [Commented] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078563#comment-16078563
 ] 

Hongyuan Li commented on HADOOP-14631:
--

ping [~liuml07] for a suggestion.

> Distcp should add a default  atomicWorkPath properties when using atomic or 
> throw obvious Exception
> ---
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic.
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the
> atomic work path:
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent
> of the current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent
> will be {{null}}, which means
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() +
> rand.nextInt());}} will throw a NullPointerException.






[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Attachment: HADOOP-14632-002.patch

Fixed findbugs warnings.

> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch, HADOOP-14632-002.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Comment Edited] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078250#comment-16078250
 ] 

Hongyuan Li edited comment on HADOOP-14631 at 7/7/17 6:02 PM:
--

I don't know which is the better choice: adding a default atomic workPath or
throwing an obvious exception.


was (Author: hongyuan li):
i don't know which is a goos idea ? to adding a default atomic workPath or  
throwing obvious Exception.

> Distcp should add a default  atomicWorkPath properties when using atomic or 
> throw obvious Exception
> ---
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic.
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the
> atomic work path:
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent
> of the current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent
> will be {{null}}, which means
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() +
> rand.nextInt());}} will throw a NullPointerException.






[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Status: Patch Available  (was: Open)

> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Comment Edited] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078215#comment-16078215
 ] 

Hongyuan Li edited comment on HADOOP-14632 at 7/7/17 3:37 PM:
--

Attached a patch.


was (Author: hongyuan li):
attach a file

> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Affects Version/s: 3.0.0-alpha1

> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Commented] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078250#comment-16078250
 ] 

Hongyuan Li commented on HADOOP-14631:
--

i don't know which is a goos idea ? to adding a default atomic workPath or  
throwing obvious Exception.

> Distcp should add a default  atomicWorkPath properties when using atomic or 
> throw obvious Exception
> ---
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic.
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the
> atomic work path:
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent
> of the current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent
> will be {{null}}, which means
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() +
> rand.nextInt());}} will throw a NullPointerException.






[jira] [Commented] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078227#comment-16078227
 ] 

Hongyuan Li commented on HADOOP-14632:
--

The performance test results will be added soon.

> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Updated] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14632:
-
Attachment: HADOOP-14632-001.patch

attach a file

> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the  transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14632-001.patch
>
>
> Add a buffer size to the SFTPFileSystem#create and SFTPFileSystem#open
> methods, which can improve the transfer speed.
> A test example shows transfer performance has improved a lot.






[jira] [Assigned] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li reassigned HADOOP-14632:


Assignee: Hongyuan Li

> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the transfer speed.
> --
>
> Key: HADOOP-14632
> URL: https://issues.apache.org/jira/browse/HADOOP-14632
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
> can improve the transfer speed.
> A test example shows that transfer performance has improved a lot.






[jira] [Created] (HADOOP-14632) add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which can improve the transfer speed.

2017-07-07 Thread Hongyuan Li (JIRA)
Hongyuan Li created HADOOP-14632:


 Summary: add buffersize to SFTPFileSystem#create and 
SFTPFileSystem#open method, which can improve the transfer speed.
 Key: HADOOP-14632
 URL: https://issues.apache.org/jira/browse/HADOOP-14632
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Hongyuan Li


add buffersize to SFTPFileSystem#create and SFTPFileSystem#open method, which 
can improve the transfer speed.
A test example shows that transfer performance has improved a lot.
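The improvement boils down to copying through an explicitly sized buffer, so each network round trip moves more bytes. A minimal, library-agnostic sketch of that idea (the {{BufferedCopy}} name and {{copy}} signature are illustrative, not from the patch):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class BufferedCopy {
    // Copy everything from in to out through a buffer of the given size.
    // Over a network channel such as SFTP, a larger buffer means fewer
    // read/write round trips, which is where the speedup comes from.
    public static long copy(InputStream in, OutputStream out, int bufferSize) {
        try {
            byte[] buffer = new byte[bufferSize];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 20]; // 1 MiB sample payload
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink, 32 * 1024);
        System.out.println("copied " + copied + " bytes"); // copied 1048576 bytes
    }
}
```

In the real filesystem the buffer size would come from a configuration key rather than a constant; the sketch only illustrates why the knob matters.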






[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw obvious Exception

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Summary: Distcp should add a default  atomicWorkPath properties when using 
atomic or throw obvious Exception  (was: Distcp should add a default  
atomicWorkPath properties when using atomic or throw detailed Exception)

> Distcp should add a default  atomicWorkPath properties when using atomic or 
> throw obvious Exception
> ---
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
> work path,
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
> parent of the target path. In this case, if the target path is {{"/"}}, the 
> parent will be {{null}}, which means 
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
> rand.nextInt());}} will throw a {{NullPointerException}}.
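As the description notes, the parent of the root path is null. That shape is easy to reproduce with the JDK's own path type (a sketch for illustration only, POSIX-style paths assumed; Hadoop's {{Path}} is a different class with the same null-at-root behavior):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RootParent {
    // Mirrors the fallback in the distcp snippet: take the parent of the
    // target path. For the root there is no parent, so this returns null.
    public static Path parentOf(String target) {
        return Paths.get(target).getParent();
    }

    public static void main(String[] args) {
        System.out.println(parentOf("/tmp/out")); // /tmp
        // Using this null as a work directory is what triggers the
        // NullPointerException described in the issue.
        System.out.println(parentOf("/"));        // null
    }
}
```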






[jira] [Updated] (HADOOP-14631) Distcp should add a default atomicWorkPath properties when using atomic or throw detailed Exception

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Summary: Distcp should add a default  atomicWorkPath properties when using 
atomic or throw detailed Exception  (was: Distcp should add a default  
AtomicWorkPath properties when using atomic)

> Distcp should add a default  atomicWorkPath properties when using atomic or 
> throw detailed Exception
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
> work path,
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
> parent of the target path. In this case, if the target path is {{"/"}}, the 
> parent will be {{null}}, which means 
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
> rand.nextInt());}} will throw a {{NullPointerException}}.






[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Description: 
Distcp should add a default AtomicWorkPath property when using atomic

{{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
work path,

{code}
if (context.shouldAtomicCommit()) {
  Path workDir = context.getAtomicWorkPath();
  if (workDir == null) {
workDir = targetPath.getParent();
  }
  workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
+ rand.nextInt());
[code}

When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
parent of the target path. In this case, if the target path is {{"/"}}, the 
parent will be {{null}}, which means 
{{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
rand.nextInt());}} will throw a {{NullPointerException}}.

  was:
Distcp should add a default AtomicWorkPath property when using atomic

{{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
work path

{code}
if (context.shouldAtomicCommit()) {
  Path workDir = context.getAtomicWorkPath();
  if (workDir == null) {
workDir = targetPath.getParent();
  }
  workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
+ rand.nextInt());
[code}

When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
parent of the target path. In this case, if the target path is {{"/"}}, the 
parent will be {{null}}, which means 
{{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
rand.nextInt());}} will throw a {{NullPointerException}}.


> Distcp should add a default  AtomicWorkPath properties when using atomic
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
> work path,
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> [code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
> parent of the target path. In this case, if the target path is {{"/"}}, the 
> parent will be {{null}}, which means 
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
> rand.nextInt());}} will throw a {{NullPointerException}}.






[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Description: 
Distcp should add a default AtomicWorkPath property when using atomic

{{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
work path,

{code}
if (context.shouldAtomicCommit()) {
  Path workDir = context.getAtomicWorkPath();
  if (workDir == null) {
workDir = targetPath.getParent();
  }
  workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
+ rand.nextInt());
{code}

When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
parent of the target path. In this case, if the target path is {{"/"}}, the 
parent will be {{null}}, which means 
{{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
rand.nextInt());}} will throw a {{NullPointerException}}.

  was:
Distcp should add a default AtomicWorkPath property when using atomic

{{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
work path,

{code}
if (context.shouldAtomicCommit()) {
  Path workDir = context.getAtomicWorkPath();
  if (workDir == null) {
workDir = targetPath.getParent();
  }
  workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
+ rand.nextInt());
[code}

When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
parent of the target path. In this case, if the target path is {{"/"}}, the 
parent will be {{null}}, which means 
{{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
rand.nextInt());}} will throw a {{NullPointerException}}.


> Distcp should add a default  AtomicWorkPath properties when using atomic
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
> work path,
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> {code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
> parent of the target path. In this case, if the target path is {{"/"}}, the 
> parent will be {{null}}, which means 
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
> rand.nextInt());}} will throw a {{NullPointerException}}.






[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Affects Version/s: 2.7.3
   3.0.0-alpha3

> Distcp should add a default  AtomicWorkPath properties when using atomic
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.3, 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
> work path,
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> [code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
> parent of the target path. In this case, if the target path is {{"/"}}, the 
> parent will be {{null}}, which means 
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
> rand.nextInt());}} will throw a {{NullPointerException}}.






[jira] [Updated] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14631:
-
Description: 
Distcp should add a default AtomicWorkPath property when using atomic

{{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
work path

{code}
if (context.shouldAtomicCommit()) {
  Path workDir = context.getAtomicWorkPath();
  if (workDir == null) {
workDir = targetPath.getParent();
  }
  workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
+ rand.nextInt());
[code}

When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
parent of the target path. In this case, if the target path is {{"/"}}, the 
parent will be {{null}}, which means 
{{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
rand.nextInt());}} will throw a {{NullPointerException}}.

  was:
Distcp should add a default AtomicWorkPath property when using atomic

{{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
work path

{code}
if (context.shouldAtomicCommit()) {
  Path workDir = context.getAtomicWorkPath();
  if (workDir == null) {
workDir = targetPath.getParent();
  }
  workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
+ rand.nextInt());
[code}

When atomic is set and tAtomicWorkPath == null, distcp falls back to the parent 
of the target path. In this case, if the target path is {{"/"}}, the parent 
will be {{null}}, which means 
{{ workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
rand.nextInt());}} will throw a {{NullPointerException}}.


> Distcp should add a default  AtomicWorkPath properties when using atomic
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
> work path
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> [code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp will get the parent 
> of current {{WorkDir}}. In this case, if {{workdir}} is {{"/"}}, the parent 
> will be {{null}}, wich means 
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
> rand.nextInt());}} will throw a nullpoint exception.






[jira] [Assigned] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic

2017-07-07 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li reassigned HADOOP-14631:


Assignee: Hongyuan Li

> Distcp should add a default  AtomicWorkPath properties when using atomic
> 
>
> Key: HADOOP-14631
> URL: https://issues.apache.org/jira/browse/HADOOP-14631
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> Distcp should add a default AtomicWorkPath property when using atomic
> {{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
> work path
> {code}
> if (context.shouldAtomicCommit()) {
>   Path workDir = context.getAtomicWorkPath();
>   if (workDir == null) {
> workDir = targetPath.getParent();
>   }
>   workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
> + rand.nextInt());
> [code}
> When atomic is set and {{AtomicWorkPath}} == null, distcp falls back to the 
> parent of the target path. In this case, if the target path is {{"/"}}, the 
> parent will be {{null}}, which means 
> {{workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
> rand.nextInt());}} will throw a {{NullPointerException}}.






[jira] [Created] (HADOOP-14631) Distcp should add a default AtomicWorkPath properties when using atomic

2017-07-07 Thread Hongyuan Li (JIRA)
Hongyuan Li created HADOOP-14631:


 Summary: Distcp should add a default  AtomicWorkPath properties 
when using atomic
 Key: HADOOP-14631
 URL: https://issues.apache.org/jira/browse/HADOOP-14631
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Hongyuan Li


Distcp should add a default AtomicWorkPath property when using atomic

{{Distcp}}#{{configureOutputFormat}} uses the code below to generate the atomic 
work path

{code}
if (context.shouldAtomicCommit()) {
  Path workDir = context.getAtomicWorkPath();
  if (workDir == null) {
workDir = targetPath.getParent();
  }
  workDir = new Path(workDir, WIP_PREFIX + targetPath.getName()
+ rand.nextInt());
[code}

When atomic is set and tAtomicWorkPath == null, distcp falls back to the parent 
of the target path. In this case, if the target path is {{"/"}}, the parent 
will be {{null}}, which means 
{{ workDir = new Path(workDir, WIP_PREFIX + targetPath.getName() + 
rand.nextInt());}} will throw a {{NullPointerException}}.






[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727
 ] 

Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:22 AM:
--

[~boky01], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}.

Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 

*Update*

Forgot this; I mistook the version of Hadoop.


was (Author: hongyuan li):
[~boky01], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}.

Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 

*Update*

Forgot this; the latest patch seems to have solved this.

> local FileContext does not rename .crc file
> ---
>
> Key: HADOOP-12802
> URL: https://issues.apache.org/jira/browse/HADOOP-12802
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Youngjoon Kim
>Assignee: Andras Bokor
> Attachments: HADOOP-12802.01.patch
>
>
> After running the following code, the "old" file is renamed to "new", but 
> ".old.crc" is not renamed to ".new.crc":
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileContext fc = FileContext.getLocalFSFileContext(conf);
> FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE));
> out.close();
> fc.rename(oldPath, newPath);
> {code}
> On the other hand, the local FileSystem successfully renames the .crc file:
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.getLocal(conf);
> FSDataOutputStream out = fs.create(oldPath);
> out.close();
> fs.rename(oldPath, newPath);
> {code}
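A fix along the lines the description implies has to move the hidden checksum sidecar together with the data file. A rough JDK-only sketch of that behavior (the {{.name.crc}} naming follows ChecksumFileSystem's convention; the class and method names here are illustrative, not from the patch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameWithCrc {
    // ChecksumFileSystem keeps the checksum for "name" in ".name.crc"
    // in the same directory, so a rename must move both files.
    public static Path crcOf(Path file) {
        return file.resolveSibling("." + file.getFileName() + ".crc");
    }

    public static void rename(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        Path srcCrc = crcOf(src);
        if (Files.exists(srcCrc)) { // the sidecar may legitimately be absent
            Files.move(srcCrc, crcOf(dst), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crc-demo");
        Files.createFile(dir.resolve("old"));
        Files.createFile(dir.resolve(".old.crc"));
        rename(dir.resolve("old"), dir.resolve("new"));
        System.out.println(Files.exists(dir.resolve(".new.crc"))); // true
    }
}
```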






[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727
 ] 

Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:21 AM:
--

[~boky01], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}.

Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 

*Update*

Forgot this; the latest patch seems to have solved this.


was (Author: hongyuan li):
[~boky01], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}. Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 

> local FileContext does not rename .crc file
> ---
>
> Key: HADOOP-12802
> URL: https://issues.apache.org/jira/browse/HADOOP-12802
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Youngjoon Kim
>Assignee: Andras Bokor
> Attachments: HADOOP-12802.01.patch
>
>
> After running the following code, the "old" file is renamed to "new", but 
> ".old.crc" is not renamed to ".new.crc":
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileContext fc = FileContext.getLocalFSFileContext(conf);
> FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE));
> out.close();
> fc.rename(oldPath, newPath);
> {code}
> On the other hand, the local FileSystem successfully renames the .crc file:
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.getLocal(conf);
> FSDataOutputStream out = fs.create(oldPath);
> out.close();
> fs.rename(oldPath, newPath);
> {code}






[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727
 ] 

Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:17 AM:
--

[~boky01], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}. Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 


was (Author: hongyuan li):
[~boky01]], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}. Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 

> local FileContext does not rename .crc file
> ---
>
> Key: HADOOP-12802
> URL: https://issues.apache.org/jira/browse/HADOOP-12802
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Youngjoon Kim
>Assignee: Andras Bokor
> Attachments: HADOOP-12802.01.patch
>
>
> After running the following code, the "old" file is renamed to "new", but 
> ".old.crc" is not renamed to ".new.crc":
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileContext fc = FileContext.getLocalFSFileContext(conf);
> FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE));
> out.close();
> fc.rename(oldPath, newPath);
> {code}
> On the other hand, the local FileSystem successfully renames the .crc file:
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.getLocal(conf);
> FSDataOutputStream out = fs.create(oldPath);
> out.close();
> fs.rename(oldPath, newPath);
> {code}






[jira] [Comment Edited] (HADOOP-12802) local FileContext does not rename .crc file

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727
 ] 

Hongyuan Li edited comment on HADOOP-12802 at 7/7/17 7:17 AM:
--

[~boky01]], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}. Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 


was (Author: hongyuan li):
[~Andras Bokor], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}. Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 

> local FileContext does not rename .crc file
> ---
>
> Key: HADOOP-12802
> URL: https://issues.apache.org/jira/browse/HADOOP-12802
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Youngjoon Kim
>Assignee: Andras Bokor
> Attachments: HADOOP-12802.01.patch
>
>
> After running the following code, the "old" file is renamed to "new", but 
> ".old.crc" is not renamed to ".new.crc":
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileContext fc = FileContext.getLocalFSFileContext(conf);
> FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE));
> out.close();
> fc.rename(oldPath, newPath);
> {code}
> On the other hand, the local FileSystem successfully renames the .crc file:
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.getLocal(conf);
> FSDataOutputStream out = fs.create(oldPath);
> out.close();
> fs.rename(oldPath, newPath);
> {code}






[jira] [Commented] (HADOOP-12802) local FileContext does not rename .crc file

2017-07-07 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077727#comment-16077727
 ] 

Hongyuan Li commented on HADOOP-12802:
--

[~Andras Bokor], I don't think so; the implementation of {{rename}} in 
LocalFileContext is different from {{LocalFileSystem}}#{{rename}}. The 
reporter's JUnit test failure is caused by the nonexistence of the file 
{{newCrcPath}}. Setting {{fs.AbstractFileSystem.file.impl}} to 
{{org.apache.hadoop.fs.LocalFileSystem}} may solve this problem. [~y0un5] 

> local FileContext does not rename .crc file
> ---
>
> Key: HADOOP-12802
> URL: https://issues.apache.org/jira/browse/HADOOP-12802
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Youngjoon Kim
>Assignee: Andras Bokor
> Attachments: HADOOP-12802.01.patch
>
>
> After running the following code, the "old" file is renamed to "new", but 
> ".old.crc" is not renamed to ".new.crc":
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileContext fc = FileContext.getLocalFSFileContext(conf);
> FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE));
> out.close();
> fc.rename(oldPath, newPath);
> {code}
> On the other hand, the local FileSystem successfully renames the .crc file:
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.getLocal(conf);
> FSDataOutputStream out = fs.create(oldPath);
> out.close();
> fs.rename(oldPath, newPath);
> {code}






[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-06 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520
 ] 

Hongyuan Li edited comment on HADOOP-1 at 7/7/17 3:24 AM:
--

Sockets are complex; I don't like to open a new socket just to seek.
JSch and commons-net have plenty of examples, so if you want to make full use 
of them, you should dig into their implementations. Also, commons-net's 
setTimeout-style methods may get stuck in some situations when the network 
environment is very poor.


was (Author: hongyuan li):
Sockets are complex; I don't like to open a new socket just to seek.

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-1
> URL: https://issues.apache.org/jira/browse/HADOOP-1
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-1.2.patch, HADOOP-1.3.patch, 
> HADOOP-1.4.patch, HADOOP-1.5.patch, HADOOP-1.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but is reused from the pool.
> For a huge number of files this shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files this shows an order-of-magnitude 
> performance improvement over non-cached listings.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
> * Support for re-establishing broken FTP data transfers - these can happen 
> surprisingly often






[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-06 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520
 ] 

Hongyuan Li commented on HADOOP-1:
--

Sockets are complex; I don't like opening a new socket just to seek.

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-1
> URL: https://issues.apache.org/jira/browse/HADOOP-1
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-1.2.patch, HADOOP-1.3.patch, 
> HADOOP-1.4.patch, HADOOP-1.5.patch, HADOOP-1.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
> For a huge number of files this shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files this shows an order-of-magnitude 
> performance improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
> * Support for reestablishing broken FTP data transfers - which can happen 
> surprisingly often






[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-06 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/7/17 3:19 AM:
--

Furthermore, the flush method is meant to confirm that the data has been 
written.

*Update/Correction*
Sorry, it is the {{putMetrics}} method.
In {{KafkaSink}}#{{putMetrics}}, the code listed below makes me hold a 
different opinion:
{code}
……
Future future = producer.send(data);
jsonLines.setLength(0);
try {
  future.get();
} catch (InterruptedException e) {
  throw new MetricsException("Error sending data", e);
} catch (ExecutionException e) {
  throw new MetricsException("Error sending data", e);
}

……
{code}
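The point the excerpt makes can be illustrated with a plain java.util.concurrent Future (a minimal stdlib sketch; the class name {{FutureGetDemo}} is made up and Kafka itself is not involved): {{future.get()}} blocks the caller until the task completes, so each {{putMetrics}} call is effectively synchronous regardless of the acks setting.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureGetDemo {
    // Analogous to producer.send(data) followed by future.get() in putMetrics:
    // submit() returns immediately with a Future, but get() blocks until the
    // task finishes, turning the asynchronous send into a synchronous call.
    static String sendAndWait() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> pending = pool.submit(() -> {
                Thread.sleep(50);   // simulate a network round-trip
                return "ack";
            });
            return pending.get();   // blocks here until "ack" is available
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAndWait());   // prints "ack"
    }
}
```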


was (Author: hongyuan li):
Furthermore, the flush method is meant to confirm that the data has been 
written.

> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-06 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628
 ] 

Hongyuan Li commented on HADOOP-14623:
--

Furthermore, the flush method is meant to confirm that the data has been 
written.

> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-06 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076154#comment-16076154
 ] 

Hongyuan Li commented on HADOOP-14623:
--

I don't think so; setting it to 1 does not mean that it will block. However, I 
think Ganglia knows the frequency of lost data, but Kafka does not. What you 
said underestimates Kafka; it is more capable than that. Compared to the full 
sync of setting acks to -1, setting acks to 1 is the better choice.
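For reference, the setting under discussion is a one-line producer property; a minimal sketch using only java.util.Properties (the class name {{AcksConfigDemo}} is made up; {{request.required.acks}} is the old-producer key quoted in this issue, and {{acks}} is its new-producer equivalent):

```java
import java.util.Properties;

public class AcksConfigDemo {
    // 0 = fire-and-forget (what KafkaSink#init sets today), 1 = wait for the
    // leader's ack, -1/"all" = wait for all in-sync replicas to ack.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("request.required.acks", "1");   // proposed change (old API)
        props.put("acks", "1");                    // new-producer equivalent
        return props;
    }

    public static void main(String[] args) {
        System.out.println(
            producerProps().getProperty("request.required.acks"));   // prints "1"
    }
}
```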

> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Commented] (HADOOP-13743) error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials has too many spaces

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074908#comment-16074908
 ] 

Hongyuan Li commented on HADOOP-13743:
--

Why not use String.format or a StringBuilder?
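Either approach would avoid the doubled space; here is a small sketch of the String.format variant (the class and method names are made up, the message text is taken from the issue description, and the container/account values are placeholders):

```java
public class ErrorMessageDemo {
    // Format placeholders keep the spacing fixed, unlike hand-concatenated
    // fragments where a stray trailing space can double up with the next piece.
    static String buildMessage(String container, String account) {
        return String.format(
            "Unable to access container %s in account %s using anonymous "
                + "credentials, and no credentials found for them in the "
                + "configuration.",
            container, account);
    }

    public static void main(String[] args) {
        String msg = buildMessage("demo", "example.blob.core.windows.net");
        System.out.println(msg);
        System.out.println(msg.contains("  "));   // prints "false": no double space
    }
}
```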

> error message in AzureNativeFileSystemStore.connectUsingAnonymousCredentials 
> has too many spaces
> 
>
> Key: HADOOP-13743
> URL: https://issues.apache.org/jira/browse/HADOOP-13743
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Trivial
> Attachments: HADOOP-13743-branch-2-001.patch, 
> HADOOP-14373-branch-2-002.patch
>
>
> The error message on a failed hadoop fs -ls command against an unauthed azure 
> container has an extra space in {{" them  in"}}
> {code}
> ls: org.apache.hadoop.fs.azure.AzureException: Unable to access container 
> demo in account example.blob.core.windows.net using anonymous credentials, 
> and no credentials found for them  in the configuration.
> {code}






[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074831#comment-16074831
 ] 

Hongyuan Li edited comment on HADOOP-1 at 7/5/17 2:46 PM:
--

1. If you renamed {{AbstractFTPFileSystem}} to anything else, I would be 
happier. The class name may make users think SFTP is a kind of FTP; however, it 
isn't. So please rename it to something without "FTP" in it.
2. {{FTPClient}}#{{retrieveFileStream}} will open a new data connection, which 
is why I dislike the seek operation.

*Update*
You said {{SFTPChannel}}#{{close}} exists just to reuse the session, but then 
why do you disconnect the channelSftp?


was (Author: hongyuan li):
1. If you renamed {{AbstractFTPFileSystem}} to anything else, I would be 
happier. The class name may make users think SFTP is a kind of FTP; however, it 
isn't. So please rename it to something without "FTP" in it.
2. {{FTPClient}}#{{retrieveFileStream}} will open a new data connection, which 
is why I dislike the seek operation.


> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-1
> URL: https://issues.apache.org/jira/browse/HADOOP-1
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-1.2.patch, HADOOP-1.3.patch, 
> HADOOP-1.4.patch, HADOOP-1.5.patch, HADOOP-1.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
> For a huge number of files this shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files this shows an order-of-magnitude 
> performance improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
> * Support for reestablishing broken FTP data transfers - which can happen 
> surprisingly often






[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074831#comment-16074831
 ] 

Hongyuan Li edited comment on HADOOP-1 at 7/5/17 2:36 PM:
--

1. If you renamed {{AbstractFTPFileSystem}} to anything else, I would be 
happier. The class name may make users think SFTP is a kind of FTP; however, it 
isn't. So please rename it to something without "FTP" in it.
2. {{FTPClient}}#{{retrieveFileStream}} will open a new data connection, which 
is why I dislike the seek operation.



was (Author: hongyuan li):
1. If you renamed "AbstractFTPFileSystem" to anything else, I would be happier. 
The class name may make users think SFTP is a kind of FTP; however, it isn't.
2. FTPClient#retrieveFileStream will open a new data connection, which is why I 
dislike the seek operation.


> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-1
> URL: https://issues.apache.org/jira/browse/HADOOP-1
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-1.2.patch, HADOOP-1.3.patch, 
> HADOOP-1.4.patch, HADOOP-1.5.patch, HADOOP-1.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
> For a huge number of files this shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files this shows an order-of-magnitude 
> performance improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
> * Support for reestablishing broken FTP data transfers - which can happen 
> surprisingly often






[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:35 PM:
--

[~jojochuang] It is hard to write a standalone JUnit test for this.

the infos about acks is from kafka document :
{code}
*request.required.acks* // using old Producer api or the version of kafka is 
less  than 0.9.x
or
*acks* // using new Producer api and kafka version more than 0.9.x

This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]
[Documentation Kafka 0.9.0|http://kafka.apache.org/090/documentation.html]
From the links above: if you use Kafka below 0.9.x, you should set 
{{request.required.acks = 1}} at least. When using the new Producer on 0.9.x or 
above, you should set {{acks = 1}} at least.



was (Author: hongyuan li):
[~jojochuang] It is hard to write a standalone JUnit test for this.

the infos about acks is from kafka document :
{code}
{{request.required.acks}} // using old Producer api or the version of kafka is 
less  than 0.9.x
or
{{acks}} // using new Producer api and kafka version more than 0.9.x

This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]
[Documentation Kafka 0.9.0|http://kafka.apache.org/090/documentation.html]
From the links above: if you use Kafka below 0.9.x, you should set 
{{request.required.acks = 1}} at least. When using the new Producer on 0.9.x or 
above, you should set {{acks = 1}} at least.


> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:34 PM:
--

[~jojochuang] It is hard to write a standalone JUnit test for this.

the infos about acks is from kafka document :
{code}
{{request.required.acks}} // using old Producer api or the version of kafka is 
less  than 0.9.x
or
{{acks}} // using new Producer api and kafka version more than 0.9.x

This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]
[Documentation Kafka 0.9.0|http://kafka.apache.org/090/documentation.html]
From the links above: if you use Kafka below 0.9.x, you should set 
{{request.required.acks = 1}} at least. When using the new Producer on 0.9.x or 
above, you should set {{acks = 1}} at least.



was (Author: hongyuan li):
[~jojochuang] It is hard to write a standalone JUnit test for this.

the infos about acks is from kafka document :
{code}
request.required.acks   
This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]

From the link above: if you use Kafka below 0.9.x, you should set 
request.required.acks = 1 at least. When using the new Producer on 0.9.x or 
above, you should set acks = 1 at least.


> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:30 PM:
--

[~jojochuang] It is hard to write a standalone JUnit test for this.

the infos about acks is from kafka document :
{code}
request.required.acks   
This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]

From the link above: if you use Kafka below 0.9.x, you should set 
request.required.acks = 1 at least. When using the new Producer on 0.9.x or 
above, you should set acks = 1 at least.



was (Author: hongyuan li):
[~jojochuang] I will try to test it, but I am not sure it can be realized.

the infos about acks is from kafka document :
{code}
request.required.acks   
This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]

From the link above: if you use Kafka below 0.9.x, you should set 
request.required.acks = 1 at least. When using the new Producer on 0.9.x or 
above, you should set acks = 1 at least.


> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 2:30 PM:
--

[~jojochuang] I will try to test it, but I am not sure it can be realized.

the infos about acks is from kafka document :
{code}
request.required.acks   
This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]

From the link above: if you use Kafka below 0.9.x, you should set 
request.required.acks = 1 at least. When using the new Producer on 0.9.x or 
above, you should set acks = 1 at least.



was (Author: hongyuan li):
[~jojochuang] I will try to test it, but I am not sure it can be realized.

the infos about acks is from kafka document :
{code}
request.required.acks   
This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Documentation Kafka 0.8.2|http://kafka.apache.org/082/documentation.html]

From the link above: if you use Kafka below 0.9.x, you should set 
request.required.acks = 1 at least. When using the new Producer on 0.9.x or 
above, you should set acks = 1 at least.


> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074856#comment-16074856
 ] 

Hongyuan Li commented on HADOOP-14623:
--

[~jojochuang] I will try to test it, but I am not sure it can be realized.

the infos about acks is from kafka document :
{code}
request.required.acks   
This value controls when a produce request is considered completed. 
Specifically, how many other brokers must have committed the data to their log 
and acknowledged this to the leader? Typical values are
0, which means that the producer never waits for an acknowledgement from the 
broker (the same behavior as 0.7). This option provides the lowest latency but 
the weakest durability guarantees (some data will be lost when a server fails).
1, which means that the producer gets an acknowledgement after the leader 
replica has received the data. This option provides better durability as the 
client waits until the server acknowledges the request as successful (only 
messages that were written to the now-dead leader but not yet replicated will 
be lost).
-1, which means that the producer gets an acknowledgement after all in-sync 
replicas have received the data. This option provides the best durability, we 
guarantee that no messages will be lost as long as at least one in sync replica 
remains.
{code}

[Kafka Documentation|http://kafka.apache.org/082/documentation.html]

From the link above: if you use Kafka below 0.9.x, you should set 
request.required.acks = 1 at least. When using the new producer (0.9.x and 
above), you should set acks = 1 at least.
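A minimal sketch of the proposed change, using plain java.util.Properties only (the class name is hypothetical and no Kafka client is on the classpath; the property keys follow the documentation quoted above):

```java
import java.util.Properties;

public class KafkaSinkAcksSketch {
    // Builds producer properties with acks = 1: the leader must acknowledge
    // the write before the send is considered successful, instead of 0
    // (fire-and-forget, data lost on leader failure).
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("request.required.acks", "1"); // old producer (< 0.9.x)
        props.put("acks", "1");                  // new producer (>= 0.9.x)
        return props;
    }
}
```

In a real KafkaSink these properties would be passed to the producer constructor; the sketch only shows the configuration keys.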


> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074831#comment-16074831
 ] 

Hongyuan Li commented on HADOOP-14444:
--

1. If you rename "AbstractFTPFileSystem" to anything else, I would be 
happier. This class name may make users think SFTP is a kind of FTP; however, it 
isn't.
2. FTPClient#retrieveFileStream will open a new data connection, which is the 
reason why I dislike the seek ops.


> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
> For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across a whole directory tree
> * Support for re-establishing broken FTP data transfers - which can happen 
> surprisingly often






[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074526#comment-16074526
 ] 

Hongyuan Li commented on HADOOP-14444:
--

I have implemented it with some necessary features.





On 2017-07-05 18:04, "Steve Loughran (JIRA)" wrote:





    [ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074523#comment-16074523
 ] 

Steve Loughran commented on HADOOP-14444:
-

I am watching this, but not putting any effort into looking at the code right 
now. Happy that the two of you are working together to come up with something 
which addresses your needs.

# You don't need to have every feature in immediately, have one up to the level 
where it works slightly better than the current one, enough for it to be 
alongside the older version for one release, then cut the other version once 
stable (s3a, wasb, ADL, all have a one-release-to-stabilise experience).
# regarding caching, I'd go for a name like {{fs.ftp.cache.host}}, with the 
host value coming last. Otherwise you get into trouble with other options in 
future if a hostname matches it.

Now, a quick scan through the latest patch



h2. Build

* all settings for things like java versions, artifact versions should be 
picked up from the base hadoop-project/pom.xml ... we need to manage 
everything in one place

h2. Tests

I like the tests; these are a key part of any new feature

* Use {{GenericTestUtils}} to work with logs; there are ongoing changes there 
for better SLF4J integration & log capture. Please avoid using log4j API calls 
directly
* Add a test timeout rule to {{TestAbstractFTPFileSystem}}, name it 
{{AbstractFTPFileSystemTest}}. 
* Every test suite starting Test* should be able to be executed by 
yetus/jenkins, without any ftp server
* Everything with Test* can be started without any endpoint configured, right?
* Use {{ContractTestUtils}} to work with filesystems and assert about them 
(more diags on failure), especially for the {{assertPathExists()}} kind of 
assertion, which you can move to for things like {{testFileExists()}}
* and use SLF4J logging, not {{System.err}}
* All assertTrue/assertFalse asserts should have a meaningful string, ideally 
even assertEquals. One trick: have the toString() value of the fs provide some 
details on the connection, so you can include it in the asserts. Another, pull 
out things like {{assertChannelConnected()}} and have the text in one place
* {{TestConnectionPool.testGetChannelFromClosedFS}}. If the unexpected IOE 
is caught, make it the inner cause of the AssertionError raised. 
* Lot of duplication in the contract test createContract() calls...could that 
be shared somehow?
* Have some isolated tests for the cache









> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
> For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files it shows an order of 

[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074478#comment-16074478
 ] 

Hongyuan Li commented on HADOOP-1:
--

1) FTP is very different from SFTP. SFTP relies on the SSH protocol, which 
means it can get more accurate info than FTP. FTP allows more connections than 
SFTP.
2) I read your code just because I planned to implement this myself when I 
had time.

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
> For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across a whole directory tree
> * Support for re-establishing broken FTP data transfers - which can happen 
> surprisingly often






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-05 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Summary: KafkaSink#init should set acks to 1,not 0  (was: KafkaSink#init 
should set ack to 1)

> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Status: Patch Available  (was: Open)

> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074394#comment-16074394
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/5/17 8:07 AM:
--

ping [~ajisakaa] 、 [~jojochuang] for code review.


was (Author: hongyuan li):
ping [~ajisakaa] [~jojochuang] for code review.

> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Commented] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074394#comment-16074394
 ] 

Hongyuan Li commented on HADOOP-14623:
--

ping [~ajisakaa] [~jojochuang] for code review.

> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Description: 
{{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has at 
least been written to the leader broker.

The current code is listed below:

{code}
  
props.put("request.required.acks", "0");

{code}

  was:
{{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has at 
least been written to the leader broker.
The current code is listed below:

{code}
  
props.put("request.required.acks", "0");

{code}


> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Component/s: tools
 common

> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has at 
> least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Affects Version/s: 3.0.0-alpha3

> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has at 
> least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Updated] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14623:
-
Attachment: HADOOP-14623-001.patch

> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has at 
> least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Assigned] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li reassigned HADOOP-14623:


Assignee: Hongyuan Li

> KafkaSink#init should set ack to 1
> --
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>
> {{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has at 
> least been written to the leader broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}






[jira] [Created] (HADOOP-14623) KafkaSink#init should set ack to 1

2017-07-05 Thread Hongyuan Li (JIRA)
Hongyuan Li created HADOOP-14623:


 Summary: KafkaSink#init should set ack to 1
 Key: HADOOP-14623
 URL: https://issues.apache.org/jira/browse/HADOOP-14623
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Hongyuan Li


{{KafkaSink}}#{{init}} should set ack to 1 to make sure the message has at 
least been written to the leader broker.
The current code is listed below:

{code}
  
props.put("request.required.acks", "0");

{code}






[jira] [Commented] (HADOOP-14622) Test failure in TestFilterFileSystem and TestHarFileSystem

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074351#comment-16074351
 ] 

Hongyuan Li commented on HADOOP-14622:
--

{{HarFileSystem}}#{{appendFile}} has been implemented in the latest code on 
the trunk branch

> Test failure in TestFilterFileSystem and TestHarFileSystem
> --
>
> Key: HADOOP-14622
> URL: https://issues.apache.org/jira/browse/HADOOP-14622
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.0.0-alpha3
>Reporter: Jichao Zhang
>Priority: Trivial
>
> Root Cause:
> Maybe a regression introduced by HADOOP-14395: a new method, appendFile, was 
> added to FileSystem, but the related unit tests in TestHarFileSystem and 
> TestFilterFileSystem were not updated.
> Errors:
> 1. org.apache.hadoop.fs.TestHarFileSystem-output.txt
>  checkInvalidPath: har://127.0.0.1/foo.har
>   2017-07-03 13:37:08,191 ERROR fs.TestHarFileSystem 
> (TestHarFileSystem.java:testInheritedMethodsImplemented(365)) - HarFileSystem 
> MUST implement protected org.apache.hadoop.fs.FSDataOutputStreamBuilder 
> org.apache.hadoop.fs.FileSystem.appendFile(org.apache.hadoop.fs.Path)
> 2. org.apache.hadoop.fs.TestFilterFileSystem-output.txt
> 2017-07-03 13:36:18,217 ERROR fs.FileSystem 
> (TestFilterFileSystem.java:testFilterFileSystem(161)) - FilterFileSystem MUST 
> implement protected org.apache.hadoop.fs.FSDataOutputStreamBuilder 
> org.apache.hadoop.fs.FileSystem.appendFile(org.apache.hadoop.fs.Path)






[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-05 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074285#comment-16074285
 ] 

Hongyuan Li commented on HADOOP-14444:
--

1. Seek causes the client to disconnect and connect again; I don't think it is 
a good idea to implement it.
2. {{AbstractFTPFileSystem}} means an abstract base for FTP-like filesystems. 
Sorry to interrupt you, but the FTP protocol is not like the SFTP protocol at 
all. The only thing the two have in common is that they use a username and 
password to connect to the FTP/SFTP server and then perform operations. I 
suggest using another name.
3. About parsing the user and group,
the code is like below:
{{sftpFile}} is an LsEntry instance.
{code}
{
  String longName = sftpFile.getLongname();
  String[] splitLongName = longName.split(" ");
  String user = getUserOrGroup("user", splitLongName);
  String group = getUserOrGroup("group", splitLongName);
}

  private String getUserOrGroup(String flag, String[] splitLongName) {
    int count = 0;
    int desPos = getPos(flag);
    // Walk the fields, skipping the empty strings produced by consecutive
    // spaces, and return the desPos-th non-empty field.
    for (String element : splitLongName) {
      if (count == desPos && !"".equals(element)) {
        return element;
      }
      if (!"".equals(element)) {
        count++;
      }
    }
    return null;
  }

  /**
   * Returns the zero-based non-empty-field index for the given flag: the
   * user name is field 2 of the "ls -l" style long name, the group name
   * is field 3.
   *
   * @param flag either "user" or "group"
   * @return the field index for that flag
   */
  private int getPos(String flag) {
    if ("user".equals(flag)) {
      return 2;
    } else {
      return 3;
    }
  }
{code}
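The field-skipping loop above is equivalent to splitting the long name on runs of whitespace and indexing the result — a runnable sketch under that assumption (the class name and sample long name are hypothetical, not part of the patch):

```java
public class LongNameFields {
    // Split an "ls -l" style long name on runs of whitespace and pick a
    // non-empty field by index: field 2 is the user, field 3 the group.
    static String field(String longName, int index) {
        String[] fields = longName.trim().split("\\s+");
        return index < fields.length ? fields[index] : null;
    }
}
```

For example, for the long name "drwxr-xr-x    2 hongyuan staff      4096 Jul  5 08:07 dir", field 2 is the user and field 3 is the group.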

4. Shouldn't {{SFTPChannel}}#{{close}} close the session as well?
{code}
client.getSession().disconnect();
{code}

5. I don't know if I can be seen as a reviewer; I'm just interested in your 
implementation.
Good job. :D 

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but reused from the pool.
> For a huge number of files it shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across a whole directory tree
> * Support for re-establishing broken FTP data transfers - which can happen 
> surprisingly often






[jira] [Comment Edited] (HADOOP-14429) FTPFileSystem#getFsAction always returns FsAction.NONE

2017-07-04 Thread Hongyuan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056767#comment-16056767
 ] 

Hongyuan Li edited comment on HADOOP-14429 at 7/4/17 11:58 AM:
---

Thanks [~yzhangal], for your review and commit. Thanks [~brahmareddy] for your 
review. I filed HADOOP-14559 as an improvement to close the ftp.


was (Author: hongyuan li):
Thanks [~yzhangal], for your review and commit. Thanks [~brahmareddy] for your 
review. I file HADOOP-14559 as an improvement to close the ftp.

> FTPFileSystem#getFsAction  always returns FsAction.NONE
> ---
>
> Key: HADOOP-14429
> URL: https://issues.apache.org/jira/browse/HADOOP-14429
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.0.0-alpha2
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: HADOOP-14429-001.patch, HADOOP-14429-002.patch, 
> HADOOP-14429-003.patch, HADOOP-14429-004.patch, HADOOP-14429-005.patch, 
> HADOOP-14429-006.patch, HADOOP-14429-007.patch, HADOOP-14429-008.patch, 
> HADOOP-14429-009.patch
>
>
>   
> {code}
> private FsAction getFsAction(int accessGroup, FTPFile ftpFile) {
>   FsAction action = FsAction.NONE;
>   if (ftpFile.hasPermission(accessGroup, FTPFile.READ_PERMISSION)) {
>   action.or(FsAction.READ);
>   }
> if (ftpFile.hasPermission(accessGroup, FTPFile.WRITE_PERMISSION)) {
>   action.or(FsAction.WRITE);
> }
> if (ftpFile.hasPermission(accessGroup, FTPFile.EXECUTE_PERMISSION)) {
>   action.or(FsAction.EXECUTE);
> }
> return action;
>   }
> {code}
> From the code above, we can see that the getFsAction method does not modify 
> the action generated by FsAction action = FsAction.NONE, which means it 
> returns FsAction.NONE all the time.
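The root cause is that {{FsAction.or}} returns a new value rather than mutating the receiver, so the result must be reassigned. A runnable sketch modeling the permissions as immutable int bits (a hypothetical stand-in, not the actual Hadoop patch):

```java
public class FsActionOrFix {
    // Immutable permission bits standing in for Hadoop's FsAction enum.
    static final int NONE = 0, READ = 4, WRITE = 2, EXECUTE = 1;

    // Like FsAction.or: returns a combined value, mutates nothing.
    static int or(int a, int b) {
        return a | b;
    }

    static int getFsAction(boolean canRead, boolean canWrite, boolean canExec) {
        int action = NONE;
        // Buggy version: or(action, READ); — the returned value is discarded,
        // so the method always returns NONE. Corrected: reassign the result.
        if (canRead)  { action = or(action, READ); }
        if (canWrite) { action = or(action, WRITE); }
        if (canExec)  { action = or(action, EXECUTE); }
        return action;
    }
}
```

With the reassignment in place, read + execute permissions combine to READ|EXECUTE instead of collapsing to NONE.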






[jira] [Updated] (HADOOP-14469) FTPFileSystem#listStatus get currentPath and parentPath at the same time, causing recursively list action endless

2017-07-04 Thread Hongyuan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongyuan Li updated HADOOP-14469:
-
Affects Version/s: 3.0.0-alpha2

> FTPFileSystem#listStatus get currentPath and parentPath at the same time, 
> causing recursively list action endless
> -
>
> Key: HADOOP-14469
> URL: https://issues.apache.org/jira/browse/HADOOP-14469
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs, tools/distcp
>Affects Versions: 2.6.0, 3.0.0-alpha2
> Environment: FTP server built on Windows 7 + Serv-U_64 12.1.0.8; 
> the client code runs on any OS
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
>Priority: Critical
> Attachments: HADOOP-14469-001.patch, HADOOP-14469-002.patch, 
> HADOOP-14469-003.patch, HADOOP-14469-004.patch, HADOOP-14469-005.patch, 
> HADOOP-14469-006.patch, HADOOP-14469-007.patch, HADOOP-14469-008.patch
>
>
> For some FTP systems, the listStatus method will return new Path(".") and new 
> Path(".."), thus causing the list op to loop - for example, Serv-U.
> We can see the logic in the code below:
> {code}
>   private FileStatus[] listStatus(FTPClient client, Path file)
>   throws IOException {
> ……
> FileStatus[] fileStats = new FileStatus[ftpFiles.length];
> for (int i = 0; i < ftpFiles.length; i++) {
>   fileStats[i] = getFileStatus(ftpFiles[i], absolute);
> }
> return fileStats;
>   }
> {code}
> {code}
> public void test() throws Exception{
> FTPFileSystem ftpFileSystem = new FTPFileSystem();
> ftpFileSystem.initialize(new 
> Path("ftp://test:123456@192.168.44.1/").toUri(),
> new Configuration());
> FileStatus[] fileStatus  = ftpFileSystem.listStatus(new Path("/new"));
> for(FileStatus fileStatus1 : fileStatus)
>   System.out.println(fileStatus1);
> }
> {code}
> Using the test code above, the results are listed below:
> {code}
> FileStatus{path=ftp://test:123456@192.168.44.1/new; isDirectory=true; 
> modification_time=149671698; access_time=0; owner=user; group=group; 
> permission=-; isSymlink=false}
> FileStatus{path=ftp://test:123456@192.168.44.1/; isDirectory=true; 
> modification_time=149671698; access_time=0; owner=user; group=group; 
> permission=-; isSymlink=false}
> FileStatus{path=ftp://test:123456@192.168.44.1/new/hadoop; isDirectory=true; 
> modification_time=149671698; access_time=0; owner=user; group=group; 
> permission=-; isSymlink=false}
> FileStatus{path=ftp://test:123456@192.168.44.1/new/HADOOP-14431-002.patch; 
> isDirectory=false; length=2036; replication=1; blocksize=4096; 
> modification_time=149579778; access_time=0; owner=user; group=group; 
> permission=-; isSymlink=false}
> FileStatus{path=ftp://test:123456@192.168.44.1/new/HADOOP-14486-001.patch; 
> isDirectory=false; length=1322; replication=1; blocksize=4096; 
> modification_time=149671698; access_time=0; owner=user; group=group; 
> permission=-; isSymlink=false}
> FileStatus{path=ftp://test:123456@192.168.44.1/new/hadoop-main; 
> isDirectory=true; modification_time=149579712; access_time=0; owner=user; 
> group=group; permission=-; isSymlink=false}
> {code}
> In results above, {{FileStatus{path=ftp://test:123456@192.168.44.1/new; ……}} 
> is obviously the current Path, and  
> {{FileStatus{path=ftp://test:123456@192.168.44.1/;……}}  is obviously the 
> parent Path.
> So, if we want to walk the directory recursively, it will get stuck in an 
> endless loop.
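The endless recursion described above can be avoided by skipping the "." and ".." entries that such servers return before building the FileStatus array. A minimal runnable sketch of the filtering step (hypothetical helper names operating on plain strings; the real patch works on FTPFile objects):

```java
import java.util.ArrayList;
import java.util.List;

public class DotEntryFilter {
    // True for the "." (current dir) and ".." (parent dir) pseudo-entries
    // that some FTP servers (e.g. Serv-U) include in LIST replies.
    static boolean isDotEntry(String name) {
        return ".".equals(name) || "..".equals(name);
    }

    // Keep only real children, so a recursive walk cannot revisit the
    // current or parent directory.
    static List<String> filterChildren(List<String> names) {
        List<String> children = new ArrayList<>();
        for (String name : names) {
            if (!isDotEntry(name)) {
                children.add(name);
            }
        }
        return children;
    }
}
```

In FTPFileSystem#listStatus this filter would run over the FTPFile names before the FileStatus array is built, so the current and parent directory entries never enter the recursion.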





