[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable

2019-09-19 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933541#comment-16933541
 ] 

Sahil Takiar commented on HDFS-14564:
-

Looks like the libhdfs tests are working now. All the failed JUnit tests look 
flaky and pass when I run them locally. [~smeng], [~jojochuang], any other 
comments?

> Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
> -
>
> Key: HDFS-14564
> URL: https://issues.apache.org/jira/browse/HDFS-14564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Splitting this out from HDFS-14478
> The {{PositionedReadable#readFully}} APIs have existed for a while, but have 
> never been exposed via libhdfs.
> HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that 
> provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not 
> contain a {{readFully}} method.
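
For illustration, a libhdfs readFully counterpart could mirror the existing 
{{hdfsPread}} call; the name {{hdfsPreadFully}} follows the suggestion in 
HDFS-14478, but the exact signature below is an assumption, not a committed 
API:

{code}
/* Hypothetical sketch: like hdfsPread, but loops internally until `length`
 * bytes have been read, so callers never observe a short read. Returns 0 on
 * success, -1 on error (with errno set). */
int hdfsPreadFully(hdfsFS fs, hdfsFile file, tOffset position,
                   void *buffer, tSize length);
{code}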



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs

2019-09-18 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved HDFS-14846.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> libhdfs tests are failing on trunk due to jni usage bugs
> 
>
> Key: HDFS-14846
> URL: https://issues.apache.org/jira/browse/HDFS-14846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
>
> While working on HDFS-14564, I noticed that the libhdfs tests are failing on 
> trunk (both on Hadoop QA and locally). I did some digging and found that the 
> {{-Xcheck:jni}} flag is causing a number of crashes. I haven't been able to 
> pinpoint what caused this regression, but my best guess is that an upgrade 
> of the JDK used on Hadoop QA started triggering these failures. Looking back 
> at some old JIRAs, the tests appear to work on Java 1.8.0_212, but Hadoop QA 
> is running 1.8.0_222 (as is my local env). I couldn't confirm this theory 
> because I'm having trouble getting Java 1.8.0_212 installed next to 
> 1.8.0_222 on my Ubuntu machine. Even after rewinding the commit history back 
> to a known good commit where the libhdfs tests passed, the tests still fail, 
> so I don't think a code change caused the regressions.
> The failures are a series of "FATAL ERROR in native method: Bad global or 
> local ref passed to JNI" errors. After some debugging, it looks like 
> {{-Xcheck:jni}} now errors out if any code passes the same local ref to 
> {{DeleteLocalRef}} twice, whereas previously it did not complain. We have 
> some checks to avoid this, but they don't appear to work as expected.
> There are a few places in the libhdfs code where this pattern causes a 
> crash, as well as one place in {{JniBasedUnixGroupsMapping}}.
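
To make the failure mode concrete: the pattern that newer JDKs abort on under 
{{-Xcheck:jni}} is deleting the same local reference twice. Below is a minimal 
sketch of a guard that makes a repeated delete harmless; the helper name is 
hypothetical, and the real libhdfs checks differ:

{code}
/* Hypothetical helper: delete a local ref exactly once. NULLing the
 * caller's variable turns any later delete into a no-op, so an error path
 * and a common cleanup path cannot both delete the same ref. */
static void deleteLocalRefOnce(JNIEnv *env, jobject *ref)
{
    if (ref != NULL && *ref != NULL) {
        (*env)->DeleteLocalRef(env, *ref);
        *ref = NULL;
    }
}
{code}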



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs

2019-09-17 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14846:

Description: 
While working on HDFS-14564, I noticed that the libhdfs tests are failing on 
trunk (both on Hadoop QA and locally). I did some digging and found that the 
{{-Xcheck:jni}} flag is causing a number of crashes. I haven't been able to 
pinpoint what caused this regression, but my best guess is that an upgrade of 
the JDK used on Hadoop QA started triggering these failures. Looking back at 
some old JIRAs, the tests appear to work on Java 1.8.0_212, but Hadoop QA is 
running 1.8.0_222 (as is my local env). I couldn't confirm this theory because 
I'm having trouble getting Java 1.8.0_212 installed next to 1.8.0_222 on my 
Ubuntu machine. Even after rewinding the commit history back to a known good 
commit where the libhdfs tests passed, the tests still fail, so I don't think 
a code change caused the regressions.

The failures are a series of "FATAL ERROR in native method: Bad global or local 
ref passed to JNI" errors. After some debugging, it looks like {{-Xcheck:jni}} 
now errors out if any code passes the same local ref to {{DeleteLocalRef}} 
twice, whereas previously it did not complain. We have some checks to avoid 
this, but they don't appear to work as expected.

There are a few places in the libhdfs code where this pattern causes a crash, 
as well as one place in {{JniBasedUnixGroupsMapping}}.

  was:
While working on HDFS-14564, I noticed that the libhdfs tests are failing on 
trunk (both on hadoop-yetus and locally). I dig some digging and found out that 
the {{-Xcheck:jni}} flag is causing a bunch of crashes. I haven't been able to 
pinpoint what caused this regression, but my best guess is that an upgrade in 
the JDK we use in hadoop-yetus started causing these failures. I looked back at 
some old JIRAs and it looks like the tests work on Java 1.8.0_212, but yetus is 
running 1.8.0_222 (as is my local env) (I couldn't confirm this theory because 
I'm having trouble getting install 1.8.0_212 next to 1.8.0_222 on my Ubuntu 
machine) (even after re-winding the commit history back to a known good commit 
where the libhdfs passed, the tests still fail, so I don't think a code change 
caused the regressions).

The failures are a bunch of "FATAL ERROR in native method: Bad global or local 
ref passed to JNI" errors. After doing some debugging, it looks like 
{{-Xcheck:jni}} now errors out if any code tries to pass a local ref to 
{{DeleteLocalRef}} twice (previously it looked like it didn't complain) (we 
have some checks to avoid this, but it looks like they don't work as expected).

There are a few places in the libhdfs code where this pattern causes a crash, 
as well as one place in {{JniBasedUnixGroupsMapping}}.


> libhdfs tests are failing on trunk due to jni usage bugs
> 
>
> Key: HDFS-14846
> URL: https://issues.apache.org/jira/browse/HDFS-14846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While working on HDFS-14564, I noticed that the libhdfs tests are failing on 
> trunk (both on Hadoop QA and locally). I did some digging and found that the 
> {{-Xcheck:jni}} flag is causing a number of crashes. I haven't been able to 
> pinpoint what caused this regression, but my best guess is that an upgrade 
> of the JDK used on Hadoop QA started triggering these failures. Looking back 
> at some old JIRAs, the tests appear to work on Java 1.8.0_212, but Hadoop QA 
> is running 1.8.0_222 (as is my local env). I couldn't confirm this theory 
> because I'm having trouble getting Java 1.8.0_212 installed next to 
> 1.8.0_222 on my Ubuntu machine. Even after rewinding the commit history back 
> to a known good commit where the libhdfs tests passed, the tests still fail, 
> so I don't think a code change caused the regressions.
> The failures are a series of "FATAL ERROR in native method: Bad global or 
> local ref passed to JNI" errors. After some debugging, it looks like 
> {{-Xcheck:jni}} now errors out if any code passes the same local ref to 
> {{DeleteLocalRef}} twice, whereas previously it did not complain. We have 
> some checks to avoid this, but they don't appear to work as expected.
> There are a few places in the libhdfs code where this pattern causes a 
> crash, as well as one place in {{JniBasedUnixGroupsMapping}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs

2019-09-13 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929293#comment-16929293
 ] 

Sahil Takiar commented on HDFS-14846:
-

Hadoop QA looks good. [~jojochuang], [~smeng] could you take a look?

> libhdfs tests are failing on trunk due to jni usage bugs
> 
>
> Key: HDFS-14846
> URL: https://issues.apache.org/jira/browse/HDFS-14846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While working on HDFS-14564, I noticed that the libhdfs tests are failing on 
> trunk (both on hadoop-yetus and locally). I did some digging and found that 
> the {{-Xcheck:jni}} flag is causing a number of crashes. I haven't been able 
> to pinpoint what caused this regression, but my best guess is that an 
> upgrade of the JDK used on hadoop-yetus started triggering these failures. 
> Looking back at some old JIRAs, the tests appear to work on Java 1.8.0_212, 
> but yetus is running 1.8.0_222 (as is my local env). I couldn't confirm this 
> theory because I'm having trouble installing 1.8.0_212 next to 1.8.0_222 on 
> my Ubuntu machine. Even after rewinding the commit history back to a known 
> good commit where the libhdfs tests passed, the tests still fail, so I don't 
> think a code change caused the regressions.
> The failures are a series of "FATAL ERROR in native method: Bad global or 
> local ref passed to JNI" errors. After some debugging, it looks like 
> {{-Xcheck:jni}} now errors out if any code passes the same local ref to 
> {{DeleteLocalRef}} twice, whereas previously it did not complain. We have 
> some checks to avoid this, but they don't appear to work as expected.
> There are a few places in the libhdfs code where this pattern causes a 
> crash, as well as one place in {{JniBasedUnixGroupsMapping}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs

2019-09-12 Thread Sahil Takiar (Jira)
Sahil Takiar created HDFS-14846:
---

 Summary: libhdfs tests are failing on trunk due to jni usage bugs
 Key: HDFS-14846
 URL: https://issues.apache.org/jira/browse/HDFS-14846
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar


While working on HDFS-14564, I noticed that the libhdfs tests are failing on 
trunk (both on hadoop-yetus and locally). I did some digging and found that 
the {{-Xcheck:jni}} flag is causing a number of crashes. I haven't been able 
to pinpoint what caused this regression, but my best guess is that an upgrade 
of the JDK used on hadoop-yetus started triggering these failures. Looking 
back at some old JIRAs, the tests appear to work on Java 1.8.0_212, but yetus 
is running 1.8.0_222 (as is my local env). I couldn't confirm this theory 
because I'm having trouble installing 1.8.0_212 next to 1.8.0_222 on my Ubuntu 
machine. Even after rewinding the commit history back to a known good commit 
where the libhdfs tests passed, the tests still fail, so I don't think a code 
change caused the regressions.

The failures are a series of "FATAL ERROR in native method: Bad global or local 
ref passed to JNI" errors. After some debugging, it looks like {{-Xcheck:jni}} 
now errors out if any code passes the same local ref to {{DeleteLocalRef}} 
twice, whereas previously it did not complain. We have some checks to avoid 
this, but they don't appear to work as expected.

There are a few places in the libhdfs code where this pattern causes a crash, 
as well as one place in {{JniBasedUnixGroupsMapping}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13984) getFileInfo of libhdfs call NameNode#getFileStatus twice

2019-06-24 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871888#comment-16871888
 ] 

Sahil Takiar commented on HDFS-13984:
-

Yeah, this makes sense to me.

[~yangjiandan] do you have plans to continue working on this?

A few comments:
 * Use {{javaObjectIsOfClass}} in {{jni_helper.h}}
 * The {{FileNotFoundException}} needs to be freed before returning null

> getFileInfo of libhdfs call NameNode#getFileStatus twice
> 
>
> Key: HDFS-13984
> URL: https://issues.apache.org/jira/browse/HDFS-13984
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Attachments: HDFS-13984.001.patch, HDFS-13984.002.patch, 
> HDFS-13984.003.patch
>
>
> getFileInfo in hdfs.c calls *FileSystem#exists* first, then calls 
> *FileSystem#getFileStatus*.
> *FileSystem#exists* also calls *FileSystem#getFileStatus*, as follows:
> {code:java}
>   public boolean exists(Path f) throws IOException {
> try {
>   return getFileStatus(f) != null;
> } catch (FileNotFoundException e) {
>   return false;
> }
>   }
> {code}
> and so this ends up calling NameNodeRpcServer#getFileInfo twice.
> We could implement this with a single getFileStatus call.
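
A minimal sketch of the single-call approach, folding in the review comments 
above (raw JNI {{IsInstanceOf}} stands in for the {{javaObjectIsOfClass}} 
helper, and the method/class ids are assumed to be resolved elsewhere; this is 
not the committed patch):

{code}
#include <errno.h>
#include <jni.h>

/* Sketch: one getFileStatus RPC instead of exists() + getFileStatus().
 * Returns the FileStatus local ref, or NULL with errno set. */
static jobject getFileStatusOnce(JNIEnv *env, jobject jFS, jobject jPath,
                                 jmethodID midGetFileStatus, jclass classFNFE)
{
    jobject jStatus = (*env)->CallObjectMethod(env, jFS, midGetFileStatus,
                                               jPath);
    jthrowable jthr = (*env)->ExceptionOccurred(env);
    if (jthr) {
        (*env)->ExceptionClear(env);
        if ((*env)->IsInstanceOf(env, jthr, classFNFE)) {
            /* "File not found" is an expected outcome here: free the
             * exception ref before returning NULL, per the comment above. */
            (*env)->DeleteLocalRef(env, jthr);
            errno = ENOENT;
            return NULL;
        }
        (*env)->DeleteLocalRef(env, jthr);
        errno = EIO; /* real code would print/translate the exception */
        return NULL;
    }
    return jStatus; /* caller converts this to hdfsFileInfo and frees it */
}
{code}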



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable

2019-06-17 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865707#comment-16865707
 ] 

Sahil Takiar commented on HDFS-14564:
-

{{test_libhdfs_threaded_hdfspp_test_shim_static}} is an hdfs++ test, which 
should be unrelated to this feature. Will re-trigger Hadoop QA.

> Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
> -
>
> Key: HDFS-14564
> URL: https://issues.apache.org/jira/browse/HDFS-14564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Splitting this out from HDFS-14478
> The {{PositionedReadable#readFully}} APIs have existed for a while, but have 
> never been exposed via libhdfs.
> HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that 
> provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not 
> contain a {{readFully}} method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable

2019-06-14 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864460#comment-16864460
 ] 

Sahil Takiar commented on HDFS-14564:
-

[~smeng] addressed the checkstyle issues. Ran the failed unit tests locally and 
they pass.

> Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
> -
>
> Key: HDFS-14564
> URL: https://issues.apache.org/jira/browse/HDFS-14564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Splitting this out from HDFS-14478
> The {{PositionedReadable#readFully}} APIs have existed for a while, but have 
> never been exposed via libhdfs.
> HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that 
> provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not 
> contain a {{readFully}} method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable

2019-06-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14564:

Summary: Add libhdfs APIs for readFully; add readFully to 
ByteBufferPositionedReadable  (was: Add libhdfs APIs for readFully; add 
readFully to ByteByfferPositionedReadable)

> Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
> -
>
> Key: HDFS-14564
> URL: https://issues.apache.org/jira/browse/HDFS-14564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Splitting this out from HDFS-14478
> The {{PositionedReadable#readFully}} APIs have existed for a while, but have 
> never been exposed via libhdfs.
> HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that 
> provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not 
> contain a {{readFully}} method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14478) Add libhdfs APIs for openFile

2019-06-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14478:

Description: 
HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
specifying configuration values for opening files (similar to HADOOP-14365).

Support for {{openFile}} will be a little tricky as it is asynchronous and 
{{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.

At a high level, the API for {{openFile}} could look something like this:
{code:java}
hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
  int bufferSize, short replication, tSize blocksize);

hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
const char *path);

hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsOpenFileBuilder *builder,
const char *key, const char *value);

hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsOpenFileBuilder *builder,
const char *key, const char *value);

hdfsOpenFileFuture *hdfsOpenFileBuilderBuild(hdfsOpenFileBuilder *builder);

void hdfsOpenFileBuilderFree(hdfsOpenFileBuilder *builder);

hdfsFile hdfsOpenFileFutureGet(hdfsOpenFileFuture *future);

hdfsFile hdfsOpenFileFutureGetWithTimeout(hdfsOpenFileFuture *future,
int64_t timeout, javaConcurrentTimeUnit timeUnit);

int hdfsOpenFileFutureCancel(hdfsOpenFileFuture *future,
int mayInterruptIfRunning);

void hdfsOpenFileFutureFree(hdfsOpenFileFuture *future);

{code}
Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
would just expose the functionality of {{Future}}.
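
A usage sketch against the proposed API above (same proposed names; ownership 
and error handling are assumptions, and "some.conf.key" is a placeholder):

{code}
/* Open /tmp/foo asynchronously and block on the result. */
hdfsOpenFileBuilder *builder = hdfsOpenFileBuilderAlloc(fs, "/tmp/foo");
hdfsOpenFileBuilderOpt(builder, "some.conf.key", "some-value");
hdfsOpenFileFuture *future = hdfsOpenFileBuilderBuild(builder);
hdfsFile file = hdfsOpenFileFutureGet(future);
hdfsOpenFileFutureFree(future);
hdfsOpenFileBuilderFree(builder);
{code}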

  was:
HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
specifying configuration values for opening files (similar to HADOOP-14365).

 

Support for {{openFile}} will be a little tricky as it is asynchronous and 
{{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.

At a high level, the API for {{openFile}} could look something like this:
{code:java}
LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
const char *path, int flags);

LIBHDFS_EXTERNAL
void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureFree(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t
timeout, const char* timeUnit);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future,
bool mayInterruptIfRunning);
{code}
Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
would just expose the functionality of {{Future}}.


> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
>   int bufferSize, short replication, tSize blocksize);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsOpenFileBuilder *builder,
> const char *key, const char *value);
> hdfsOpenFileFuture *hdfsOpenFileBuilderBuild(hdfsOpenFileBuilder *builder);
> void hdfsOpenFileBuilderFree(hdfsOpenFileBuilder *builder);
> hdfsFile hdfsOpenFileFutureGet(hdfsOpenFileFuture *future);
> hdfsFile hdfsOpenFileFutureGetWithTimeout(hdfsOpenFileFuture *future,
> int64_t timeout, javaConcurrentTimeUnit timeUnit);
> int hdfsOpenFileFutureCancel(hdfsOpenFileFuture *future,
> int mayInterruptIfRunning);
> void 

[jira] [Updated] (HDFS-14478) Add libhdfs APIs for openFile

2019-06-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14478:

Description: 
HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
specifying configuration values for opening files (similar to HADOOP-14365).

 

Support for {{openFile}} will be a little tricky as it is asynchronous and 
{{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.

At a high level, the API for {{openFile}} could look something like this:
{code:java}
LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
const char *path, int flags);

LIBHDFS_EXTERNAL
void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureFree(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t
timeout, const char* timeUnit);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future,
bool mayInterruptIfRunning);
{code}
Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
would just expose the functionality of {{Future}}.

  was:
HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
specifying configuration values for opening files (similar to HADOOP-14365).

The {{PositionedReadable#readFully}} APIs have always existed.

Both of these APIs should be exposed by libhdfs.

Adding support for {{readFully}} should be straight-forward (a new libhdfs API 
called {{hdfsPreadFully}} whose implementation is similar to {{hdfsPread}}).

Support for {{openFile}} will be a little tricker as it is asynchronous and 
{{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.

At a high level, the API for {{openFile}} could look something like this:
{code:java}
LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
const char *path, int flags);

LIBHDFS_EXTERNAL
void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureFree(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t
timeout, const char* timeUnit);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future,
bool mayInterruptIfRunning);
{code}
Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
would just expose the functionality of {{Future}}.


> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
>  
> Support for {{openFile}} will be a little tricky as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> LIBHDFS_EXTERNAL
> struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path, int flags);
> LIBHDFS_EXTERNAL
> void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld);
> LIBHDFS_EXTERNAL
> struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs,
> const char *key, const char *value);
> LIBHDFS_EXTERNAL
> struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs,
> const char *key, const char *value);
> LIBHDFS_EXTERNAL
> hdfsFileFuture 

[jira] [Created] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteByfferPositionedReadable

2019-06-12 Thread Sahil Takiar (JIRA)
Sahil Takiar created HDFS-14564:
---

 Summary: Add libhdfs APIs for readFully; add readFully to 
ByteByfferPositionedReadable
 Key: HDFS-14564
 URL: https://issues.apache.org/jira/browse/HDFS-14564
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Splitting this out from HDFS-14478

The {{PositionedReadable#readFully}} APIs have existed for a while, but have 
never been exposed via libhdfs.

HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that 
provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not 
contain a {{readFully}} method.
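
To make the motivation concrete: without a readFully variant, every libhdfs 
caller has to wrap {{hdfsPread}} (which may return fewer bytes than requested) 
in a short-read loop like the sketch below; {{fs}}, {{file}}, {{position}}, 
{{buffer}}, and {{length}} are assumed, and error handling is elided:

{code}
/* Loop until `length` bytes are read or EOF/error is hit; this is the
 * pattern a readFully-style API would encapsulate. */
tOffset pos = position;
tSize remaining = length;
char *cur = (char *) buffer;
while (remaining > 0) {
    tSize n = hdfsPread(fs, file, pos, cur, remaining);
    if (n <= 0) {
        break; /* 0 = EOF before the range was filled, -1 = error */
    }
    pos += n;
    cur += n;
    remaining -= n;
}
{code}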



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14478) Add libhdfs APIs for openFile

2019-06-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14478:

Summary: Add libhdfs APIs for openFile  (was: Add libhdfs APIs for 
readFully and openFile)

> Add libhdfs APIs for openFile
> -
>
> Key: HDFS-14478
> URL: https://issues.apache.org/jira/browse/HDFS-14478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
> specifying configuration values for opening files (similar to HADOOP-14365).
> The {{PositionedReadable#readFully}} APIs have always existed.
> Both of these APIs should be exposed by libhdfs.
> Adding support for {{readFully}} should be straightforward (a new libhdfs 
> API called {{hdfsPreadFully}} whose implementation is similar to 
> {{hdfsPread}}).
> Support for {{openFile}} will be a little trickier as it is asynchronous and 
> {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this:
> {code:java}
> LIBHDFS_EXTERNAL
> struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
> const char *path, int flags);
> LIBHDFS_EXTERNAL
> void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld);
> LIBHDFS_EXTERNAL
> struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs,
> const char *key, const char *value);
> LIBHDFS_EXTERNAL
> struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs,
> const char *key, const char *value);
> LIBHDFS_EXTERNAL
> hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld);
> LIBHDFS_EXTERNAL
> void hdfsOpenFileFutureFree(struct hdfsFileFuture *future);
> LIBHDFS_EXTERNAL
> hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future);
> LIBHDFS_EXTERNAL
> hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t
> timeout, const char* timeUnit);
> LIBHDFS_EXTERNAL
> void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future,
> bool mayInterruptIfRunning);
> {code}
> Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
> would just expose the functionality of {{Future}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14482) Crash when using libhdfs with bad classpath

2019-05-13 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838831#comment-16838831
 ] 

Sahil Takiar commented on HDFS-14482:
-

[~tlipcon] thanks for catching this. Opened a PR: 
https://github.com/apache/hadoop/pull/816

> Crash when using libhdfs with bad classpath
> ---
>
> Key: HDFS-14482
> URL: https://issues.apache.org/jira/browse/HDFS-14482
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
>
> HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the 
> env but before checking whether it's null. In the case that getJNIEnv() fails 
> to create an env, it returns NULL, and then we crash when calling 
> initCachedClasses() on line 555
> {code}
> 551 state->env = getGlobalJNIEnv();
> 552 mutexUnlock();
> 553 
> 554 jthrowable jthr = NULL;
> 555 jthr = initCachedClasses(state->env);
> 556 if (jthr) {
> 557   printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL,
> 558 "initCachedClasses failed");
> 559   goto fail;
> {code}
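
The fix direction is simply to check that env creation succeeded before using 
the result; a sketch along those lines (see the PR above for the actual 
change; the surrounding variables and labels come from the snippet):

{code}
state->env = getGlobalJNIEnv();
mutexUnlock();

if (!state->env) {
    /* getGlobalJNIEnv() failed; bail out before dereferencing state->env. */
    goto fail;
}

jthrowable jthr = initCachedClasses(state->env);
if (jthr) {
    printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL,
                          "initCachedClasses failed");
    goto fail;
}
{code}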



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-05-08 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835622#comment-16835622
 ] 

Sahil Takiar commented on HDFS-3246:


No objections on my end.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14478) Add libhdfs APIs for readFully and openFile

2019-05-07 Thread Sahil Takiar (JIRA)
Sahil Takiar created HDFS-14478:
---

 Summary: Add libhdfs APIs for readFully and openFile
 Key: HDFS-14478
 URL: https://issues.apache.org/jira/browse/HDFS-14478
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar


HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows 
specifying configuration values for opening files (similar to HADOOP-14365).

The {{PositionedReadable#readFully}} APIs have always existed.

Both of these APIs should be exposed by libhdfs.

Adding support for {{readFully}} should be straightforward (a new libhdfs API 
called {{hdfsPreadFully}} whose implementation is similar to {{hdfsPread}}).

Support for {{openFile}} will be a little trickier as it is asynchronous and 
{{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.

At a high level, the API for {{openFile}} could look something like this:
{code:java}
LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs,
const char *path, int flags);

LIBHDFS_EXTERNAL
void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs,
const char *key, const char *value);

LIBHDFS_EXTERNAL
hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureFree(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t
timeout, const char* timeUnit);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future,
bool mayInterruptIfRunning);
{code}
Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs 
would just expose the functionality of {{Future}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-04-30 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14417) CryptoInputStream decrypt methods can avoid usage of intermediate buffers

2019-04-05 Thread Sahil Takiar (JIRA)
Sahil Takiar created HDFS-14417:
---

 Summary: CryptoInputStream decrypt methods can avoid usage of 
intermediate buffers
 Key: HDFS-14417
 URL: https://issues.apache.org/jira/browse/HDFS-14417
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Sahil Takiar


Filing this as a follow-up to 
[this|https://github.com/apache/hadoop/pull/597#discussion_r269399800] review 
comment on HDFS-3246. In {{CryptoInputStream}} the {{decrypt}} methods rely on 
temporary buffers to help decrypt the given encrypted data. The {{decrypt}} 
methods work by copying the input data chunk by chunk into an "input" buffer 
and then passing the "input" buffer and an "output" buffer to a decryption 
method that reads from the "input" buffer and writes to the "output" buffer. 
The contents of the "output" buffer are then copied back into the user buffer. 
Then the method moves onto the next chunk of the user buffer.

Instead of copying all the data between these buffers, we should be able to 
decrypt in place (i.e., the input and output buffers are the same). As the 
comment points out, OpenSSL supports this.

At the very least, we should be able to remove the usage of the "output" buffer 
and just pass the user buffer directly to the decryption classes.
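
As an illustration of the in-place idea at the OpenSSL level (a standalone 
sketch, not the {{CryptoInputStream}} code): CTR-mode EVP decryption accepts 
the same buffer for input and output, so no intermediate copy is needed.

{code}
#include <openssl/evp.h>

/* Decrypt len bytes of AES-256-CTR data in place in buf.
 * Key/IV setup and error checks are elided for brevity. */
static void decryptInPlace(unsigned char *buf, int len,
                           const unsigned char *key, const unsigned char *iv)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int outLen = 0;
    EVP_DecryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
    /* Output pointer == input pointer: no second buffer required. */
    EVP_DecryptUpdate(ctx, buf, &outLen, buf, len);
    EVP_CIPHER_CTX_free(ctx);
}
{code}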



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-02 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808184#comment-16808184
 ] 

Sahil Takiar commented on HDFS-14394:
-

Thanks for the input, Todd. One last thing I forgot to mention: Hadoop QA 
didn't run the libhdfs tests for whatever reason. I ran them manually against 
this patch and they all passed. For anyone else having trouble getting the 
tests to run reliably on Linux, I was only able to get them to work properly 
from inside the Hadoop Docker image (run {{./start-build-env.sh}}).

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-02 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808067#comment-16808067
 ] 

Sahil Takiar commented on HDFS-14394:
-

I can add the {{-fextended-identifiers}} flag without any issues. I added 
{{-pedantic-errors}} and there are a ton of warnings that have now become 
errors (including errors from unit tests and third-party libraries). As you 
pointed out, I don't see a good way of applying {{CMAKE_C_FLAGS}} to specific 
projects, but if someone has a smart way of doing so, I'm open to at least 
fixing the errors in libhdfs.

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807365#comment-16807365
 ] 

Sahil Takiar commented on HDFS-14394:
-

[~eyang] do those tests pass locally on trunk without the patch posted here? 
I've seen some errors like that before and it usually is environmental. Hadoop 
QA was able to build the patch, but I don't think it ran the libhdfs++ tests.

I'm not sure I understand that link, but I don't think adding {{-std=c99}} to 
{{CMAKE_C_FLAGS}} forces all libhdfs++ code to be compiled using C99. My 
understanding is that {{CMAKE_C_FLAGS}} is for compiling C files and 
{{CMAKE_CXX_FLAGS}} is for compiling C++ files. The libhdfs++ CMake file 
already includes {{-std=c++11}}. There are a few C files in libhdfs++, but 
most seem to be testing related.

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-04-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-04-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807272#comment-16807272
 ] 

Sahil Takiar commented on HDFS-14394:
-

[~eyang] what was the compilation error that you hit? I double checked and I'm 
able to compile the patch locally.

Is there a specific reason we don't want the {{-std=c99}} flag in libhdfs++? 
Would moving the C_FLAGS setting to {{src/main/native/libhdfs/CMakeLists.txt}} 
fix this?

I'm not sure what the policy is on using {{-std=c99}} or {{-std=gnu99}}, 
although we were already using {{gnu99}} for Solaris builds. I don't think 
there is a specific reason we need {{gnu99}} over {{c99}} though.

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807214#comment-16807214
 ] 

Sahil Takiar commented on HDFS-14394:
-

Thanks for the input [~jojochuang], [~tlipcon]. Posted a patch that adds 
{{set(CMAKE_C_FLAGS "-std=gnu99 ${CMAKE_C_FLAGS}")}} to {{HadoopCommon.cmake}}. 
It looks like we already do this for Solaris builds.

[~eyang] not entirely sure if I follow your reasoning for splitting 
{{hadoop-hdfs-native-client}} into several sub-projects, could you expand a bit 
more? Are you referring to 
[this|https://github.com/cmake-maven-project/cmake-maven-project] Maven plugin? 
It certainly looks interesting. However, both of these changes look like 
relatively large projects that are probably out of the scope of what this JIRA 
is trying to address.

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14394:

Attachment: HDFS-14394.001.patch

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14394:

Status: Patch Available  (was: Open)

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-03-27 Thread Sahil Takiar (JIRA)
Sahil Takiar created HDFS-14394:
---

 Summary: Add -std=c99 / -std=gnu99 to libhdfs compile flags
 Key: HDFS-14394
 URL: https://issues.apache.org/jira/browse/HDFS-14394
 Project: Hadoop HDFS
  Issue Type: Task
  Components: hdfs-client, libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar


libhdfs compilation currently does not enforce a minimum required C version. As 
of today, the libhdfs build on Hadoop QA works, but when built on a machine 
with an outdated gcc / cc version where C89 is the default, compilation fails 
due to errors such as:

{code}
/build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
 error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < numCachedClasses; i++) {
^
/build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
 note: use option -std=c99 or -std=gnu99 to compile your code
{code}

We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that we 
can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-26 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Open  (was: Patch Available)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}
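
One way to take this lock off the hot path is to resolve each {{jclass}} once
at initialization and store it as a JNI global reference, which can then be
shared across threads without a mutex. A minimal sketch, using illustrative
names rather than the actual libhdfs code:

{code}
#include <jni.h>

/* Cache a class as a JNI global reference at init time so that later
 * lookups are lock-free pointer reads. Names are illustrative only. */
static jclass gFileSystemClass; /* global ref: valid across threads/frames */

static int cacheClass(JNIEnv *env, const char *name, jclass *out)
{
    jclass local = (*env)->FindClass(env, name);
    if (local == NULL) {
        return -1; /* a ClassNotFoundError is now pending */
    }
    /* Local refs die when the native frame returns; promote the class to
     * a global ref so the cached handle stays valid. */
    *out = (jclass) (*env)->NewGlobalRef(env, local);
    (*env)->DeleteLocalRef(env, local);
    return (*out == NULL) ? -1 : 0;
}
{code}

Running something like {{cacheClass(env, "org/apache/hadoop/fs/FileSystem",
&gFileSystemClass)}} once at startup would let {{invokeMethod}} read the
cached handle directly instead of taking {{hdfsHashMutex}} on every call.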



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-26 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Patch Available  (was: Open)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-26 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Patch Available  (was: Open)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-26 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Open  (was: Patch Available)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-21 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798571#comment-16798571
 ] 

Sahil Takiar commented on HDFS-3246:


[~daryn] done.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.
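
For comparison, the non-direct libhdfs path already exposes pread semantics
through {{hdfsPread}}. The sketch below (connection and file setup elided)
shows the contract a {{ByteBuffer}} equivalent would mirror: a read at an
explicit offset that leaves the stream's current position untouched.

{code}
#include "hdfs.h"

/* Minimal sketch: positional read of up to 4096 bytes at offset 128,
 * without moving the stream's seek position. fs/file setup elided. */
char buf[4096];
tSize nread = hdfsPread(fs, file, 128 /* position */, buf, sizeof(buf));
if (nread == -1) {
    /* errno carries the translated Java exception code */
}
{code}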



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-21 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798384#comment-16798384
 ] 

Sahil Takiar commented on HDFS-3246:


Thanks for taking a look, [~openinx]; I addressed your comments and updated the PR.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Open  (was: Patch Available)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the guidance in the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred, it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well
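
To make the required pattern concrete, here is a minimal sketch of case [1]
above (the object and method ID names are illustrative, not the actual
libhdfs helpers):

{code}
/* After invoking a Java method, a NULL result alone is not a reliable
 * error signal; the pending-exception state must be checked as well. */
jobject name = (*env)->CallObjectMethod(env, instance, midGetName);
jthrowable exc = (*env)->ExceptionOccurred(env);
if (exc != NULL) {
    (*env)->ExceptionClear(env);      /* or translate into an errno value */
    (*env)->DeleteLocalRef(env, exc);
    return -1;
}
if (name == NULL) {
    return -1; /* the method legitimately returned null; nothing pending */
}
{code}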



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Patch Available  (was: Open)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the guidance in the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred, it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Patch Available  (was: Open)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Open  (was: Patch Available)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14386) Improve libhdfs test coverage for failure paths

2019-03-21 Thread Sahil Takiar (JIRA)
Sahil Takiar created HDFS-14386:
---

 Summary: Improve libhdfs test coverage for failure paths
 Key: HDFS-14386
 URL: https://issues.apache.org/jira/browse/HDFS-14386
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar


While working on HDFS-14304 and HDFS-14348, it seems that libhdfs does not have 
great test coverage for failure paths. We found a few places in libhdfs where 
we are not propagating / handling exceptions properly. The goal of this JIRA is 
to improve test coverage for the failure / exception handling code in libhdfs.

I don't have a clear picture of how to do this, but here are some ideas:

(1) Create a dummy {{FileSystem}} where all operations throw an {{Exception}} 
and call into that {{FileSystem}} using libhdfs.

(2) We already do things like trying to open a file that does not exist; we
can add tests that list a directory that does not exist, etc. (a sketch of
this idea follows the list below).

(3) It would be great if we could use some type of method stubbing (like
Mockito in Java) for JNI methods, so we could test that our usage of the JNI
API is correct - e.g., if {{NewByteArray}} returns {{NULL}}, do we actually
throw an exception?
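
As a starting point for idea (2), a minimal sketch of an expected-failure
test (cluster setup and teardown elided; the path is arbitrary):

{code}
#include "hdfs.h"
#include <stdio.h>

/* Inside a test function: drive a failure path and check that the error
 * surfaces via errno. */
int numEntries = 0;
hdfsFileInfo *info = hdfsListDirectory(fs, "/does/not/exist", &numEntries);
if (info != NULL) {
    fprintf(stderr, "expected hdfsListDirectory to fail\n");
    hdfsFreeFileInfo(info, numEntries);
}
/* The failure should appear as a translated exception code in errno,
 * e.g. ENOENT for a FileNotFoundException. */
{code}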



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-21 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798228#comment-16798228
 ] 

Sahil Takiar commented on HDFS-14348:
-

I think I found another issue, although this one seems minor. 
{{translateZCRException}} in {{hdfs.c}} does not always seem to free the 
exception; specifically, it leaks if this code path is hit:

{code}
if (!strcmp(className, "java.lang.UnsupportedOperationException")) {
ret = EPROTONOSUPPORT;
goto done;
}
{code}

The supplied {{jthrowable}} is never freed, because the {{done}} block only 
consists of:

{code}
done:
free(className);
return ret;
{code}
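
A minimal sketch of one possible fix, assuming the {{jthrowable}} parameter
is named {{exc}}, is not already released on the other paths, and that the
{{destroyLocalReference}} helper from {{jni_helper.c}} is available:

{code}
done:
    free(className);
    destroyLocalReference(env, exc); /* release the local ref on every path */
    return ret;
{code}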

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the guidance in the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred, it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Patch Available  (was: Open)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the guidance in the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred, it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-21 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Open  (was: Patch Available)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the guidance in the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred, it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Patch Available  (was: Open)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the guidance in the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred, it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Open  (was: Patch Available)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the guidance in the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred, it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work stopped] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-14304 stopped by Sahil Takiar.
---
> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Patch Available  (was: Open)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) 
> (the lock is acquired for both reads and writes). The hash table maps {{char 
> *className}} to {{jclass}} objects; it seems the goal of the hash table is to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstack output shows a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Open  (was: Patch Available)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred; it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well
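
To make the pattern required by the Oracle docs quoted above concrete, here is 
a minimal illustrative sketch (not the actual libhdfs helper) of checking for 
a pending exception after a method-invoking JNI call, even when the return 
value is non-NULL:

{code}
/* Illustrative sketch: the check required after any JNI call that
 * invokes a Java method, per case [1] in the docs quoted above. */
static jstring callGetName(JNIEnv *env, jobject obj, jmethodID midGetName)
{
    jstring name = (jstring) (*env)->CallObjectMethod(env, obj, midGetName);
    jthrowable exc = (*env)->ExceptionOccurred(env);
    if (exc) {
        /* A non-NULL result does NOT prove success: the Java method may
         * have thrown. ExceptionOccurred() must be checked as well. */
        (*env)->ExceptionClear(env);
        (*env)->DeleteLocalRef(env, exc);
        if (name) {
            (*env)->DeleteLocalRef(env, name);
        }
        return NULL;
    }
    return name; /* may still be NULL if getName() itself returned null */
}
{code}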



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Patch Available  (was: Open)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred; it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-14304 started by Sahil Takiar.
---
> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}); 
> the lock is acquired for both reads and writes. The hash table maps {{char 
> *className}} to {{jclass}} objects; the goal of the hash table seems to be to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstacks show a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-19 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Open  (was: Patch Available)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}); 
> the lock is acquired for both reads and writes. The hash table maps {{char 
> *className}} to {{jclass}} objects; the goal of the hash table seems to be to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstacks show a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-18 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795353#comment-16795353
 ] 

Sahil Takiar commented on HDFS-3246:


[~openinx], [~jojochuang] any additional comments?

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-18 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Patch Available  (was: Open)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}); 
> the lock is acquired for both reads and writes. The hash table maps {{char 
> *className}} to {{jclass}} objects; the goal of the hash table seems to be to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstacks show a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-18 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Open  (was: Patch Available)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}); 
> the lock is acquired for both reads and writes. The hash table maps {{char 
> *className}} to {{jclass}} objects; the goal of the hash table seems to be to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstacks show a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Open  (was: Patch Available)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred; it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Patch Available  (was: Open)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred; it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-13 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14348:

Status: Patch Available  (was: Open)

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred; it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-13 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791757#comment-16791757
 ] 

Sahil Takiar commented on HDFS-14348:
-

PR: https://github.com/apache/hadoop/pull/600

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred; it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-12 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790898#comment-16790898
 ] 

Sahil Takiar commented on HDFS-14348:
-

Found some more issues:

{code}
static jthrowable hadoopRzOptionsGetEnumSet(JNIEnv *env,
        struct hadoopRzOptions *opts, jobject *enumSet)
{
    ...
    jclass clazz = (*env)->FindClass(env, READ_OPTION);
    if (!clazz) {
        jthr = newRuntimeError(env, "failed "
                "to find class for %s", READ_OPTION);
        goto done;
    }
    ...
{code}

{code}
jthrowable newRuntimeError(JNIEnv *env, const char *fmt, ...)
{
    char buf[512];
    jobject out, exc;
    jstring jstr;
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    jstr = (*env)->NewStringUTF(env, buf);
    if (!jstr) {
        // We got an out of memory exception rather than a RuntimeException.
        // Too bad...
        return getPendingExceptionAndClear(env);
    }
    ...
{code}

The issue is that {{FindClass}} can throw an error, but the call to 
{{newRuntimeError}} calls {{NewStringUTF}} without clearing the pending 
exception possibly thrown by {{FindClass}}. According to 
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exception_handling
 this is illegal; you cannot call {{NewStringUTF}} while there is an exception 
pending.

I think I missed this in HDFS-14321: {{hadoopRzOptionsSetByteBufferPool}} calls 
{{opts->byteBufferPool = (*env)->NewGlobalRef(env, byteBufferPool)}} but does 
not check for exceptions afterwards.
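
As a minimal sketch of one possible fix for the first snippet above, the 
pending exception thrown by {{FindClass}} could simply be cleared before 
{{newRuntimeError}} makes further JNI calls. This is a modified excerpt of 
that snippet for illustration only, not the committed patch, which may handle 
the pending throwable differently:

{code}
jclass clazz = (*env)->FindClass(env, READ_OPTION);
if (!clazz) {
    if ((*env)->ExceptionCheck(env)) {
        /* Calling NewStringUTF with an exception pending is illegal,
         * so the pending exception must be cleared (or captured and
         * returned) before newRuntimeError runs. */
        (*env)->ExceptionClear(env);
    }
    jthr = newRuntimeError(env, "failed "
            "to find class for %s", READ_OPTION);
    goto done;
}
{code}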

> Fix JNI exception handling issues in libhdfs
> 
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> During some manual digging through the libhdfs code, we found several places 
> where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from 
> the JNI Oracle docs 
> (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error 
> conditions. In most cases, JNI functions report error conditions by returning 
> an error code and throwing a Java exception. The error code is usually a 
> special return value (such as NULL) that is outside of the range of normal 
> return values. Therefore, the programmer can quickly check the return value 
> of the last JNI call to determine if an error has occurred, and call a 
> function, ExceptionOccurred(), to obtain the exception object that contains a 
> more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions 
> without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java 
> method. The programmer must call ExceptionOccurred() to check for possible 
> exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but 
> may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions 
> have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but 
> does not check if an exception has occurred; it only checks if the result of 
> the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both 
> {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) 
> seems to have several issues; lots of JNI methods are called without checking 
> for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} 
> in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
> operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
> pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-12 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790786#comment-16790786
 ] 

Sahil Takiar commented on HDFS-3246:


[~openinx] created a PR: https://github.com/apache/hadoop/pull/597

The latest version fixes a few issues reported by Hadoop QA, and adds some more 
Javadocs to {{CryptoInputStream}} to make the changes easier to understand.

Let me see if I can find someone more comfortable with libhdfs to review the 
libhdfs code.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-12 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790661#comment-16790661
 ] 

Sahil Takiar commented on HDFS-14304:
-

Opened a PR since the patch to fix this is quite large: 
https://github.com/apache/hadoop/pull/595 - the PR description provides some 
more details about the approach taken.

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}); 
> the lock is acquired for both reads and writes. The hash table maps {{char 
> *className}} to {{jclass}} objects; the goal of the hash table seems to be to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstacks show a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs

2019-03-12 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14304:

Status: Patch Available  (was: Open)

> High lock contention on hdfsHashMutex in libhdfs
> 
>
> Key: HDFS-14304
> URL: https://issues.apache.org/jira/browse/HDFS-14304
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While doing some performance profiling of an application using libhdfs, we 
> noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in 
> {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.
> The issue is that every JNI method invocation done by {{hdfs.c}} goes through 
> a helper method called {{invokeMethod}}. {{invokeMethod}} calls 
> {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a 
> lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}); 
> the lock is acquired for both reads and writes. The hash table maps {{char 
> *className}} to {{jclass}} objects; the goal of the hash table seems to be to 
> avoid repeatedly creating {{jclass}} objects for each JNI call.
> For multi-threaded applications, this lock severely limits the rate at which 
> Java methods can be invoked. pstacks show a lot of time being spent on 
> {{hdfsHashMutex}}:
> {code:java}
> #0  0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
> #2  0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x027d8386 in mutexLock ()
> #4  0x027d0e7b in globalClassReference ()
> #5  0x027d1160 in invokeMethod ()
> #6  0x027d4176 in readDirect ()
> #7  0x027d4325 in hdfsRead ()
> {code}
> Same with {{perf report}}
> {code:java}
> +   63.36% 0.01%  [k] system_call_fastpath
> +   61.60% 0.12%  [k] sys_futex 
> +   61.45% 0.13%  [k] do_futex 
> +   57.54% 0.49%  [k] _raw_qspin_lock
> +   57.07% 0.01%  [k] queued_spin_lock_slowpath
> +   55.47%55.47%  [k] native_queued_spin_lock_slowpath
> -   35.68% 0.00%  [k] 0x6f6f6461682f6568
>- 0x6f6f6461682f6568 
>   - 30.55% __lll_lock_wait   
>  - 29.40% system_call_fastpath  
> - 29.39% sys_futex  
>- 29.35% do_futex   
>   - 29.27% futex_wait 
>  - 28.17% futex_wait_setup
> - 27.05% _raw_qspin_lock 
>- 27.05% queued_spin_lock_slowpath
> 26.30% native_queued_spin_lock_slowpath 
>   + 0.67% ret_from_intr 
>  + 0.71% futex_wait_queue_me
>   - 2.00% methodIdFromClass
>  - 1.94% jni_GetMethodID  
> - 1.71% get_method_id   
>  0.96% SymbolTable::lookup_only 
>   - 1.61% invokeMethod
>  - 0.62% jni_CallLongMethodV 
>   0.52% jni_invoke_nonstatic 
> 0.75% pthread_mutex_lock
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14348) Fix JNI exception handling issues in libhdfs

2019-03-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created HDFS-14348:
---

 Summary: Fix JNI exception handling issues in libhdfs
 Key: HDFS-14348
 URL: https://issues.apache.org/jira/browse/HDFS-14348
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar


During some manual digging through the libhdfs code, we found several places 
where we are not handling exceptions properly.

Specifically, there seem to be some violations of the following snippet from the 
JNI Oracle docs 
(https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):

{quote}
*Exceptions and Error Codes*

Certain JNI functions use the Java exception mechanism to report error 
conditions. In most cases, JNI functions report error conditions by returning 
an error code and throwing a Java exception. The error code is usually a 
special return value (such as NULL) that is outside of the range of normal 
return values. Therefore, the programmer can quickly check the return value of 
the last JNI call to determine if an error has occurred, and call a function, 
ExceptionOccurred(), to obtain the exception object that contains a more 
detailed description of the error condition.

There are two cases where the programmer needs to check for exceptions without 
being able to first check an error code:

[1] The JNI functions that invoke a Java method return the result of the Java 
method. The programmer must call ExceptionOccurred() to check for possible 
exceptions that occurred during the execution of the Java method.

[2] Some of the JNI array access functions do not return an error code, but may 
throw an ArrayIndexOutOfBoundsException or ArrayStoreException.

In all other cases, a non-error return value guarantees that no exceptions have 
been thrown.
{quote}

Here is a running list of issues:

* {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but does 
not check if an exception has occurred; it only checks if the result of the 
method (in this case {{Class#getName(String)}}) returns {{NULL}}
* Exception handling in {{get_current_thread_id}} (both 
{{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) seems 
to have several issues; lots of JNI methods are called without checking for 
exceptions
* Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} in 
{{hdfs.c}} do not check for exceptions properly
** e.g. for {{GetObjectArrayElement}} they only check if the result of the 
operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for 
pending exceptions as well



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-08 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788246#comment-16788246
 ] 

Sahil Takiar commented on HDFS-3246:


Re-based the patch, which required some updates now that HDFS-14111 has been 
merged. Re-basing on top of HDFS-14111 required adding {{StreamCapabilities}} 
support, which, based on the discussion in HBASE-22005, seems like something we 
wanted to add anyway.

Fixed a few checkstyle issues and added a few more code comments.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-08 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-08 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Attachment: HDFS-3246.007.patch

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-08 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch, HDFS-3246.007.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2019-03-07 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786918#comment-16786918
 ] 

Sahil Takiar commented on HDFS-14111:
-

Thank you [~jojochuang]!

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-07 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786917#comment-16786917
 ] 

Sahil Takiar commented on HDFS-3246:


[~openinx], I agree with Steve. Would it be possible for HBase to run its 
tests against a mini-HDFS cluster? That way the tests can use 
{{DFSInputStream}}, which does support the {{ByteBuffer}} interfaces.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-05 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784761#comment-16784761
 ] 

Sahil Takiar commented on HDFS-3246:


[~jojochuang] yes, the implementation of {{decrypt}} was the trickiest part of 
this patch for me, mainly because I'm not familiar with how 
{{CryptoInputStream}} works. If there are other HDFS committers more familiar 
with this code, feedback is welcome. The new {{decrypt}} method is meant to be 
a merge of {{decrypt(long position, byte[] buffer, int offset, int length)}} 
and {{decrypt(ByteBuffer buf, int n, int start)}}. I added plenty of unit 
tests to verify that the changes in {{CryptoInputStream}} work properly.

As for your specific comments:
{quote}buf.position(start + len); --> not needed?
{quote}
The call to {{inBuffer.put(buf)}} on line 4 increments the position of 
{{buf}}, which is why it is necessary to reset the position.
{quote}buf.limit(limit); --> not needed?
{quote}
The limit of the buffer is changed from its original value on line 3; this 
call just resets the limit of {{buf}} back to its original value, which is 
necessary because the method is not supposed to change the limit of the 
buffer.
{quote}len += outBuffer.remaining(); --> len += Math.min(n - len, 
inBuffer.remaining())?
{quote}
The next line calls {{buf.put(outBuffer)}}, which adds all remaining bytes in 
{{outBuffer}} to {{buf}} (up until the limit of {{buf}} is reached), so adding 
{{outBuffer.remaining()}} should be fine. Technically, it's not an accurate 
count of how much data has been decrypted, because the {{buf}} limit can be 
reached before all {{outBuffer}} bytes are transferred, but given how the 
method is structured it works out. The logic here is similar to what 
{{decrypt(ByteBuffer buf, int n, int start)}} does.

Essentially, this method decrypts {{buf}} chunk by chunk. It loads a chunk of 
{{buf}} into a {{localInBuffer}} (borrowed from a buffer pool), writes the 
decrypted data into a {{localOutBuffer}} (also borrowed from a buffer pool), 
and then overwrites the chunk of {{buf}} that was originally loaded into the 
{{localInBuffer}}.

Overall, I agree the way this method is structured is a bit confusing, but 
it's in line with how the other decrypt methods work, which is why I wrote it 
this way.

I can add some more code comments if that will help as well.
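
For anyone following along, here is a toy sketch (in C, on plain byte arrays) 
of the chunk-by-chunk, in-place pattern described above. It deliberately skips 
the {{ByteBuffer}} position/limit bookkeeping and the buffer pool, and 
{{transformChunk}} is just a stand-in for the real cipher:

{code}
#include <string.h>

#define CHUNK 8192

/* Toy stand-in for the cipher: XOR with a constant. */
static void transformChunk(const unsigned char *in, unsigned char *out,
                           size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] ^ 0x5a;
    }
}

/* Chunk-by-chunk, in-place pattern: copy a slice of the caller's buffer
 * into a scratch "in" buffer, transform it into a scratch "out" buffer,
 * then overwrite the original slice. */
static void decryptInPlace(unsigned char *buf, size_t start, size_t len) {
    unsigned char inBuf[CHUNK];
    unsigned char outBuf[CHUNK];
    size_t done = 0;
    while (done < len) {
        size_t n = len - done < CHUNK ? len - done : CHUNK;
        memcpy(inBuf, buf + start + done, n);   /* load chunk */
        transformChunk(inBuf, outBuf, n);       /* decrypt chunk */
        memcpy(buf + start + done, outBuf, n);  /* overwrite original */
        done += n;
    }
}
{code}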

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2019-03-04 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783841#comment-16783841
 ] 

Sahil Takiar commented on HDFS-14111:
-

I think this patch should be good to merge.

Talked to a few other folks about {{errno}} handling. Given that the C docs 
say {{The value in errno is significant only when the return value of the 
call indicated an error (i.e., -1 from most system calls; -1 or NULL from 
most library functions); a function that succeeds is allowed to change 
errno.}}, I think the test changes made as part of this patch should be fine.
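
To make that contract concrete, a minimal caller-side sketch in C (the path 
is hypothetical and a connected {{hdfsFS}} handle is assumed):

{code}
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include "hdfs.h"

/* errno is only meaningful after a failed call; a call that succeeds is
 * allowed to clobber it, so asserting errno == 0 after success is not
 * reliable. */
static int openAndCheck(hdfsFS fs) {
    hdfsFile file = hdfsOpenFile(fs, "/tmp/example", O_RDONLY, 0, 0, 0);
    if (file == NULL) {
        /* Only here does errno describe the failure. */
        fprintf(stderr, "hdfsOpenFile failed: errno=%d\n", errno);
        return -1;
    }
    /* Do NOT assert errno == 0 here: the successful open may still have
     * changed errno internally. */
    hdfsCloseFile(fs, file);
    return 0;
}
{code}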

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Attachment: HDFS-3246.006.patch

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, 
> HDFS-3246.006.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled

2019-03-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782214#comment-16782214
 ] 

Sahil Takiar commented on HDFS-14321:
-

Digging into this some more, I think {{hadoopRzOptions::byteBufferPool}} 
should be a global ref. If {{hadoopRzOptionsSetByteBufferPool}} simply stored 
a local ref in {{hadoopRzOptions::byteBufferPool}}, the reference would be 
lost once execution returned to Java. By using a global ref, we ensure that 
{{byteBufferPool}} does not get garbage collected by the JVM. Since the 
{{byteBufferPool}} is expected to live across calls to {{hadoopReadZero}}, 
using a local ref does not make sense.

This is based on my understanding of the JNI and the difference between local 
and global references: 
http://journals.ecs.soton.ac.uk/java/tutorial/native1.1/implementing/refs.html 
I'm not a JNI expert, so my understanding might be off, but this patch fixes 
the {{FATAL ERROR}}.
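
As a minimal sketch of the local-vs-global pattern (simplified, not the 
actual libhdfs source; {{rzOptions}} and {{setByteBufferPool}} are 
illustrative stand-ins):

{code}
#include <jni.h>

/* Simplified stand-in for the real options struct; the cached jobject
 * must survive across multiple native calls. */
struct rzOptions {
    jobject byteBufferPool;
};

/* A jobject argument is a local ref, valid only until this native frame
 * returns to Java. To cache it in a C struct across calls, promote it to
 * a global ref; the free path must call DeleteGlobalRef to match. */
static int setByteBufferPool(JNIEnv *env, struct rzOptions *opts,
                             jobject pool) {
    if (opts->byteBufferPool) {
        (*env)->DeleteGlobalRef(env, opts->byteBufferPool);
        opts->byteBufferPool = NULL;
    }
    if (pool) {
        opts->byteBufferPool = (*env)->NewGlobalRef(env, pool);
        if (!opts->byteBufferPool) {
            return -1;  /* JVM could not create the global ref */
        }
    }
    return 0;
}
{code}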

The second part of this patch is to add {{-Xcheck:jni}} to {{LIBHDFS_OPTS}} 
when running all the libhdfs ctests. The drawback is that this pollutes the 
logs with a bunch of warnings about exception handling (see above). The 
benefit is that it ensures we don't make any changes to libhdfs that would 
result in more fatal errors. IMO we can live with the extraneous logging, but 
I'm open to changing this if others feel differently.

> Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
> -
>
> Key: HDFS-14321
> URL: https://issues.apache.org/jira/browse/HDFS-14321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14321.001.patch
>
>
> The JVM exposes an option called {{-Xcheck:jni}} which runs various checks 
> against JNI usage by applications. Further explanation of this JVM option can 
> be found in: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts002.html]
>  and 
> [https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/jni_debug.html].
>  When run with this option, the JVM will print out any warnings or errors it 
> encounters with the JNI.
> We should run the libhdfs tests with {{-Xcheck:jni}} (can be added to 
> {{LIBHDFS_OPTS}}) and fix any warnings / errors. We should add this option to 
> our ctest runs as well to ensure no regressions are introduced to libhdfs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14321:

Attachment: HDFS-14321.001.patch

> Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
> -
>
> Key: HDFS-14321
> URL: https://issues.apache.org/jira/browse/HDFS-14321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14321.001.patch
>
>
> The JVM exposes an option called {{-Xcheck:jni}} which runs various checks 
> against JNI usage by applications. Further explanation of this JVM option can 
> be found in: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts002.html]
>  and 
> [https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/jni_debug.html].
>  When run with this option, the JVM will print out any warnings or errors it 
> encounters with the JNI.
> We should run the libhdfs tests with {{-Xcheck:jni}} (can be added to 
> {{LIBHDFS_OPTS}}) and fix any warnings / errors. We should add this option to 
> our ctest runs as well to ensure no regressions are introduced to libhdfs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14321:

Status: Patch Available  (was: Open)

> Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
> -
>
> Key: HDFS-14321
> URL: https://issues.apache.org/jira/browse/HDFS-14321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14321.001.patch
>
>
> The JVM exposes an option called {{-Xcheck:jni}} which runs various checks 
> against JNI usage by applications. Further explanation of this JVM option can 
> be found in: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts002.html]
>  and 
> [https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/jni_debug.html].
>  When run with this option, the JVM will print out any warnings or errors it 
> encounters with the JNI.
> We should run the libhdfs tests with {{-Xcheck:jni}} (can be added to 
> {{LIBHDFS_OPTS}}) and fix any warnings / errors. We should add this option to 
> our ctest runs as well to ensure no regressions are introduced to libhdfs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled

2019-03-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778763#comment-16778763
 ] 

Sahil Takiar edited comment on HDFS-14321 at 3/1/19 10:04 PM:
--

When running the existing tests with {{\-Xcheck:jni}} I only see one error: 
{{FATAL ERROR in native method: Invalid global JNI handle passed to 
DeleteGlobalRef}}, which seems to be caused by {{hadoopRzOptionsFree}} calling 
{{DeleteGlobalRef}} on {{opts->byteBufferPool}} which is not a global ref. It's 
not clear to me how big an issue this is, since {{opts->byteBufferPool}} should 
be a local ref that is automatically deleted when the native method exits.

There are a bunch of warnings of the form {{WARNING in native method: JNI call 
made without checking exceptions when required to from ...}} - after debugging 
these warnings, most of them seem to be caused by the JVM itself (e.g. internal 
JDK code), so they would have to be fixed within the JDK itself.


was (Author: stakiar):
When running the existing tests with {{-Xcheck:jni}} I only see one error: 
{{FATAL ERROR in native method: Invalid global JNI handle passed to 
DeleteGlobalRef}}, which seems to be caused by {{hadoopRzOptionsFree}} calling 
{{DeleteGlobalRef}} on {{opts->byteBufferPool}} which is not a global ref. It's 
not clear to me how big an issue this is, since {{opts->byteBufferPool}} should 
be a local ref that is automatically deleted when the native method exits.

There are a bunch of warnings of the form {{WARNING in native method: JNI call 
made without checking exceptions when required to from ...}} - after debugging 
these warnings, most of them seem to be caused by the JVM itself (e.g. internal 
JDK code), so they would have to be fixed within the JDK itself.

> Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
> -
>
> Key: HDFS-14321
> URL: https://issues.apache.org/jira/browse/HDFS-14321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The JVM exposes an option called {{-Xcheck:jni}} which runs various checks 
> against JNI usage by applications. Further explanation of this JVM option can 
> be found in: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts002.html]
>  and 
> [https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/jni_debug.html].
>  When run with this option, the JVM will print out any warnings or errors it 
> encounters with the JNI.
> We should run the libhdfs tests with {{-Xcheck:jni}} (can be added to 
> {{LIBHDFS_OPTS}}) and fix any warnings / errors. We should add this option to 
> our ctest runs as well to ensure no regressions are introduced to libhdfs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Open  (was: Patch Available)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Attachment: HDFS-3246.005.patch

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-3246:
---
Status: Patch Available  (was: Open)

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782101#comment-16782101
 ] 

Sahil Takiar commented on HDFS-3246:


[~anoop.hbase] {quote} When calling this API with buf remaining size of n and 
the file is having data size > n after given position, is it guaranteed to read 
the whole n bytes into BB in one go? Just wanted to confirm. Thanks. {quote}

Unfortunately, the existing APIs aren't clear on this behavior. The 
{{ByteBufferPositionedReadable}} interface is meant to follow the same 
semantics as {{PositionedReadable}} and {{ByteBufferReadable}}. 
{{PositionedReadable}} says it "Read[s] up to the specified number of bytes" 
and {{ByteBufferReadable}} says it "Reads up to buf.remaining() bytes". In 
practice, pread in {{DFSInputStream}} follows the behavior you described: it 
reads either until {{ByteBuffer#hasRemaining()}} returns false or until there 
are no more bytes in the file. {{ByteBufferPositionedReadable}} should follow 
the same behavior for {{DFSInputStream}}.
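
Because the interfaces only promise "up to n bytes", a caller that needs 
exactly n bytes has to loop. A hypothetical helper in C over {{hdfsPread}} 
(for illustration only; this is not an existing libhdfs API) would look like:

{code}
#include "hdfs.h"

/* Loop until exactly `length` bytes are read, since a single positioned
 * read is only guaranteed to return "up to" the requested length. */
static int preadFully(hdfsFS fs, hdfsFile file, tOffset position,
                      void *buffer, tSize length) {
    tSize total = 0;
    while (total < length) {
        tSize n = hdfsPread(fs, file, position + total,
                            (char *)buffer + total, length - total);
        if (n < 0) {
            return -1;  /* read error; errno describes it */
        }
        if (n == 0) {
            return -1;  /* hit EOF before `length` bytes were read */
        }
        total += n;
    }
    return 0;
}
{code}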

[~jojochuang] thanks for the review comments. I've got this implemented for 
{{CryptoInputStream}} as well and will post a patch soon.

As far as testing goes, I've tested the libhdfs path via Impala on a real 
cluster and everything seemed to work as expected (I have not tested against 
an encrypted HDFS cluster).

Will post an updated patch shortly.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2019-02-28 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780727#comment-16780727
 ] 

Sahil Takiar commented on HDFS-14111:
-

Posted an updated patch that addresses the comments provided by Todd.

The {{test_hdfs_ext_hdfspp_test_shim_static}} failure seems to be a genuine 
issue caused by this patch; it appears that libhdfs++ and libhdfs have 
inconsistent handling of {{errno}}. For now, I disabled the assertion that was 
failing and filed a follow-up JIRA, HDFS-14325, which has more details. Once 
it is fixed, we can add the assertion back in.

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2019-02-28 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14111:

Status: Patch Available  (was: Open)

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2019-02-28 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14111:

Status: Open  (was: Patch Available)

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2019-02-28 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14111:

Attachment: HDFS-14111.003.patch

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14325) Revise usage of errno in libhdfs

2019-02-28 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780677#comment-16780677
 ] 

Sahil Takiar edited comment on HDFS-14325 at 2/28/19 4:05 PM:
--

[~bobhansen], [~anatoli.shein], [~James Clampffer] wondering if you have any 
comments on this since it involves libhdfs++.

The TL;DR is that libhdfs and libhdfs++ have inconsistent handling of setting 
{{errno}} on success, which causes issues for some of the shim-based libhdfs++ 
tests.


was (Author: stakiar):
[~bobhansen], [~anatoli.shein], [~James Clampffer] wondering if you have any 
comments on this since it involves libhdfs++. The TL;DR is that libhdfs and 
libhdfs++ have inconsistent handling of setting {{errno}} on success, which 
causes issues for some of the shim-based libhdfs++ tests.

> Revise usage of errno in libhdfs
> 
>
> Key: HDFS-14325
> URL: https://issues.apache.org/jira/browse/HDFS-14325
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The usage of [errno|http://man7.org/linux/man-pages/man3/errno.3.html] in 
> libhdfs has gone through several changes in the past: HDFS-3675, HDFS-4997, 
> HDFS-3579, HDFS-8407, etc.
> As a result of these changes, some libhdfs functions set {{errno}} to 0 on 
> success ({{hadoopReadZero}}, {{hdfsListDirectory}}), while several set 
> {{errno}} to a meaningful value only on error.
> libhdfs++ on the other hand sets {{errno}} to 0 for all (successful) 
> libhdfs++ operations. See 
> [this|https://issues.apache.org/jira/browse/HDFS-10511?focusedCommentId=15322696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15322696]
>  comment in HDFS-10511 for why that was done.
> The inconsistent behavior between libhdfs and libhdfs++ causes issues for 
> tests such as {{test_hdfs_ext_hdfspp_test_shim_static}} which uses a shim 
> layer ({{tests/hdfs_shim.c}}) that delegates to both {{hdfs.c}} and 
> {{hdfs.cc}} for various operations (e.g. opening / closing files uses both 
> APIs, {{hdfsWrite}} delegates to libhdfs since libhdfs++ does not support 
> writes yet). The tests expect {{errno}} to be set to 0 after successful 
> operations against the shim layer. Since libhdfs is not guaranteed to set 
> {{errno}} to 0 on success, tests can start failing.
> One example of the inconsistency causing issues is HDFS-14111, the patch for 
> HDFS-14111 happens to change the {{errno}} from 0 to 2 for {{hdfsCloseFile}}. 
> However, from libhdfs's perspective this seems to be by design. Quoting from 
> the {{errno}} C docs:
> {quote}The value in errno is significant only when the return value of the 
> call indicated an error (i.e., -1 from most system calls; -1 or NULL from 
> most library functions); a function that succeeds is allowed to change errno.
> {quote}
> I was not able to pin down why the patch for HDFS-14111 changed the {{errno}} 
> value, but I isolated the change to the {{FileSystem#close}} call. Most 
> likely, some C function invoked as a result of calling {{#close}} failed and 
> changed the {{errno}} value, but the {{#close}} was still able to succeed 
> (this is most likely expected behavior, see 
> [this|https://issues.apache.org/jira/browse/HDFS-8407?focusedCommentId=14551225=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14551225]
>  comment in HDFS-8407 for further validation).
> Going forward we could (1) set {{errno}} to 0 for all successful libhdfs 
> functions, which would make the libhdfs behavior consistent with the 
> libhdfs++ behavior, or (2) we could live with the discrepancy, which would 
> require modifying the libhdfs++ shim tests (which assert that {{errno}} 
> is 0 after certain operations) and just document the difference in behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14325) Revise usage of errno in libhdfs

2019-02-28 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780677#comment-16780677
 ] 

Sahil Takiar commented on HDFS-14325:
-

[~bobhansen], [~anatoli.shein], [~James Clampffer] wondering if you have any 
comments on this since it involves libhdfs++. The TL;DR is that libhdfs and 
libhdfs++ have inconsistent handling of setting {{errno}} on success, which 
causes issues for some of the shim-based libhdfs++ tests.

> Revise usage of errno in libhdfs
> 
>
> Key: HDFS-14325
> URL: https://issues.apache.org/jira/browse/HDFS-14325
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The usage of [errno|http://man7.org/linux/man-pages/man3/errno.3.html] in 
> libhdfs has gone through several changes in the past: HDFS-3675, HDFS-4997, 
> HDFS-3579, HDFS-8407, etc.
> As a result of these changes, some libhdfs functions set {{errno}} to 0 on 
> success ({{hadoopReadZero}}, {{hdfsListDirectory}}), while several set 
> {{errno}} to a meaningful value only on error.
> libhdfs++ on the other hand sets {{errno}} to 0 for all (successful) 
> libhdfs++ operations. See 
> [this|https://issues.apache.org/jira/browse/HDFS-10511?focusedCommentId=15322696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15322696]
>  comment in HDFS-10511 for why that was done.
> The inconsistent behavior between libhdfs and libhdfs++ causes issues for 
> tests such as {{test_hdfs_ext_hdfspp_test_shim_static}} which uses a shim 
> layer ({{tests/hdfs_shim.c}}) that delegates to both {{hdfs.c}} and 
> {{hdfs.cc}} for various operations (e.g. opening / closing files uses both 
> APIs, {{hdfsWrite}} delegates to libhdfs since libhdfs++ does not support 
> writes yet). The tests expect {{errno}} to be set to 0 after successful 
> operations against the shim layer. Since libhdfs is not guaranteed to set 
> {{errno}} to 0 on success, tests can start failing.
> One example of the inconsistency causing issues is HDFS-14111, the patch for 
> HDFS-14111 happens to change the {{errno}} from 0 to 2 for {{hdfsCloseFile}}. 
> However, from libhdfs's perspective this seems to be by design. Quoting from 
> the {{errno}} C docs:
> {quote}The value in errno is significant only when the return value of the 
> call indicated an error (i.e., -1 from most system calls; -1 or NULL from 
> most library functions); a function that succeeds is allowed to change errno.
> {quote}
> I was not able to pin down why the patch for HDFS-14111 changed the {{errno}} 
> value, but I isolated the change to the {{FileSystem#close}} call. Most 
> likely, some C function invoked as a result of calling {{#close}} failed and 
> changed the {{errno}} value, but the {{#close}} was still able to succeed 
> (this is most likely expected behavior, see 
> [this|https://issues.apache.org/jira/browse/HDFS-8407?focusedCommentId=14551225=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14551225]
>  comment in HDFS-8407 for further validation).
> Going forward we could (1) set {{errno}} to 0 for all successful libhdfs 
> functions, which would make the libhdfs behavior consistent with the 
> libhdfs++ behavior, or (2) we could live with the discrepancy, which would 
> require modifying the libhdfs++ shim tests (which assert that {{errno}} 
> is 0 after certain operations) and just document the difference in behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14325) Revise usage of errno in libhdfs

2019-02-28 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HDFS-14325:

Description: 
The usage of [errno|http://man7.org/linux/man-pages/man3/errno.3.html] in 
libhdfs has gone through several changes in the past: HDFS-3675, HDFS-4997, 
HDFS-3579, HDFS-8407, etc.

As a result of these changes, some libhdfs functions set {{errno}} to 0 on 
success ({{hadoopReadZero}}, {{hdfsListDirectory}}), while several set 
{{errno}} to a meaningful value only on error.

libhdfs++ on the other hand sets {{errno}} to 0 for all (successful) libhdfs++ 
operations. See 
[this|https://issues.apache.org/jira/browse/HDFS-10511?focusedCommentId=15322696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15322696]
 comment in HDFS-10511 for why that was done.

The inconsistent behavior between libhdfs and libhdfs++ causes issues for tests 
such as {{test_hdfs_ext_hdfspp_test_shim_static}} which uses a shim layer 
({{tests/hdfs_shim.c}}) that delegates to both {{hdfs.c}} and {{hdfs.cc}} for 
various operations (e.g. opening / closing files uses both APIs, {{hdfsWrite}} 
delegates to libhdfs since libhdfs++ does not support writes yet). The tests 
expect {{errno}} to be set to 0 after successful operations against the shim 
layer. Since libhdfs is not guaranteed to set {{errno}} to 0 on success, tests 
can start failing.

One example of the inconsistency causing issues is HDFS-14111, the patch for 
HDFS-14111 happens to change the {{errno}} from 0 to 2 for {{hdfsCloseFile}}. 
However, from libhdfs's perspective this seems to be by design. Quoting from 
the {{errno}} C docs:
{quote}The value in errno is significant only when the return value of the call 
indicated an error (i.e., -1 from most system calls; -1 or NULL from most 
library functions); a function that succeeds is allowed to change errno.
{quote}
I was not able to pin down why the patch for HDFS-14111 changed the {{errno}} 
value, but I isolated the change to the {{FileSystem#close}} call. Most likely, 
some C function invoked as a result of calling {{#close}} failed and changed 
the {{errno}} value, but the {{#close}} was still able to succeed (this is most 
likely expected behavior, see 
[this|https://issues.apache.org/jira/browse/HDFS-8407?focusedCommentId=14551225=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14551225]
 comment in HDFS-8407 for further validation).

Going forward we could (1) set {{errno}} to 0 for all successful libhdfs 
functions, which would make the libhdfs behavior consistent with the libhdfs++ 
behavior, or (2) we could live with the discrepancy, which would require 
modifying the libhdfs++ shim tests (which assert that {{errno}} is 0 after 
certain operations) and just document the difference in behavior.

  was:
The usage of [errno|http://man7.org/linux/man-pages/man3/errno.3.html] in 
libhdfs has gone through several changes in the past: HDFS-3675, HDFS-4997, 
HDFS-3579, HDFS-8407, etc.

As a result of these changes, some libhdfs functions set {{errno}} to 0 on 
success ({{hadoopReadZero}}, {{hdfsListDirectory}}), while several set 
{{errno}} to a meaningful value only on error.

libhdfs++ on the other hand sets {{errno}} to 0 for all (successful) libhdfs++ 
operations. See this comment in HDFS-10511 for why that was done.

The inconsistent behavior between libhdfs and libhdfs++ causes issues for tests 
such as {{test_hdfs_ext_hdfspp_test_shim_static}} which uses a shim layer 
({{tests/hdfs_shim.c}}) that delegates to both {{hdfs.c}} and {{hdfs.cc}} for 
various operations (e.g. opening / closing files uses both APIs, {{hdfsWrite}} 
delegates to libhdfs since libhdfs++ does not support writes yet). The tests 
expect {{errno}} to be set to 0 after successful operations against the shim 
layer. Since libhdfs is not guaranteed to set {{errno}} to 0 on success, tests 
can start failing.

One example of the inconsistency causing issues is HDFS-14111, the patch for 
HDFS-14111 happens to change the {{errno}} from 0 to 2 for {{hdfsCloseFile}}. 
However, from libhdfs's perspective this seems to be by design. Quoting from 
the {{errno}} C docs:
{quote}The value in errno is significant only when the return value of the call 
indicated an error (i.e., -1 from most system calls; -1 or NULL from most 
library functions); a function that succeeds is allowed to change errno.
{quote}
I was not able to pin down why the patch for HDFS-14111 changed the {{errno}} 
value, but I isolated the change to the {{FileSystem#close}} call. Most likely, 
some C function invoked as a result of calling {{#close}} failed and changed 
the {{errno}} value, but the {{#close}} was still able to succeed (this is most 
likely expected behavior, see this comment in HDFS-8407 for further validation).

Going forward we could (1) set {{errno}} to 0 for all successful libhdfs 
functions, which would make the libhdfs behavior consistent with the 
libhdfs++ behavior, or (2) we could live with the discrepancy, which would 
require modifying the libhdfs++ shim tests (which assert that {{errno}} is 0 
after certain operations) and just document the difference in behavior.

[jira] [Created] (HDFS-14325) Revise usage of errno in libhdfs

2019-02-28 Thread Sahil Takiar (JIRA)
Sahil Takiar created HDFS-14325:
---

 Summary: Revise usage of errno in libhdfs
 Key: HDFS-14325
 URL: https://issues.apache.org/jira/browse/HDFS-14325
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The usage of [errno|http://man7.org/linux/man-pages/man3/errno.3.html] in 
libhdfs has gone through several changes in the past: HDFS-3675, HDFS-4997, 
HDFS-3579, HDFS-8407, etc.

As a result of these changes, some libhdfs functions set {{errno}} to 0 on 
success ({{hadoopReadZero}}, {{hdfsListDirectory}}), while several set 
{{errno}} to a meaningful value only on error.

libhdfs++ on the other hand sets {{errno}} to 0 for all (successful) libhdfs++ 
operations. See this comment in HDFS-10511 for why that was done.

The inconsistent behavior between libhdfs and libhdfs++ causes issues for tests 
such as {{test_hdfs_ext_hdfspp_test_shim_static}} which uses a shim layer 
({{tests/hdfs_shim.c}}) that delegates to both {{hdfs.c}} and {{hdfs.cc}} for 
various operations (e.g. opening / closing files uses both APIs, {{hdfsWrite}} 
delegates to libhdfs since libhdfs++ does not support writes yet). The tests 
expect {{errno}} to be set to 0 after successful operations against the shim 
layer. Since libhdfs is not guaranteed to set {{errno}} to 0 on success, tests 
can start failing.

One example of the inconsistency causing issues is HDFS-14111, the patch for 
HDFS-14111 happens to change the {{errno}} from 0 to 2 for {{hdfsCloseFile}}. 
However, from libhdfs's perspective this seems to be by design. Quoting from 
the {{errno}} C docs:
{quote}The value in errno is significant only when the return value of the call 
indicated an error (i.e., -1 from most system calls; -1 or NULL from most 
library functions); a function that succeeds is allowed to change errno.
{quote}
I was not able to pin down why the patch for HDFS-14111 changed the {{errno}} 
value, but I isolated the change to the {{FileSystem#close}} call. Most likely, 
some C function invoked as a result of calling {{#close}} failed and changed 
the {{errno}} value, but the {{#close}} was still able to succeed (this is most 
likely expected behavior, see this comment in HDFS-8407 for further validation).

Going forward we could (1) set {{errno}} to 0 for all successful libhdfs 
functions, which would make the libhdfs behavior consistent with the libhdfs++ 
behavior, or (2) we could live with the discrepancy, which would require 
modifying the libhdfs++ shim tests (which assert that {{errno}} is 0 after 
certain operations) and just document the difference in behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14083) libhdfs logs errors when opened FS doesn't support ByteBufferReadable

2019-02-27 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779549#comment-16779549
 ] 

Sahil Takiar commented on HDFS-14083:
-

[~yzhangal] HDFS-14111 fixes this. It removes the log line 
{{UnsupportedOperationException: Byte-buffer read unsupported by input}} 
completely. Given the approach in HDFS-14111, I don't think the log line is 
ever necessary. The {{StreamCapabilities}} interface in Hadoop captures 
whether or not a stream supports the {{readDirect}} code path.

I suggest we close this JIRA in favor of HDFS-14111. The solution in this JIRA 
is really just masking the problem rather than fixing it. As far as I 
understand, it decreases the frequency at which the logging occurs, whereas 
HDFS-14111 removes the logging completely. This has multiple benefits: (1) it 
avoids collecting a stack trace for each file open, and (2) even less frequent 
logging still causes confusion for users, because the log indicates that some 
error occurred, whereas in reality the lack of support for {{readDirect}} is 
an expected limitation.
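
On that point: rather than inferring support from a logged exception, a 
libhdfs client can ask directly. A small sketch in C, assuming the 
{{hdfsFileUsesDirectRead}} helper exposed by {{hdfs.h}} (treat the exact call 
as illustrative):

{code}
#include <fcntl.h>
#include <stdio.h>
#include "hdfs.h"

/* Report whether the direct (ByteBuffer) read path was negotiated for a
 * file, instead of inferring it from logged exceptions. */
static void reportReadPath(hdfsFS fs, const char *path) {
    hdfsFile file = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
    if (file == NULL) {
        fprintf(stderr, "open failed for %s\n", path);
        return;
    }
    printf("%s: direct read %s\n", path,
           hdfsFileUsesDirectRead(file) ? "enabled" : "disabled");
    hdfsCloseFile(fs, file);
}
{code}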

> libhdfs logs errors when opened FS doesn't support ByteBufferReadable
> -
>
> Key: HDFS-14083
> URL: https://issues.apache.org/jira/browse/HDFS-14083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs, native
>Affects Versions: 3.0.3
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Minor
> Attachments: HADOOP-15928.001.patch, HADOOP-15928.002.patch, 
> HDFS-14083.003.patch, HDFS-14083.004.patch, HDFS-14083.005.patch, 
> HDFS-14083.006.patch, HDFS-14083.007.patch, HDFS-14083.008.patch, 
> HDFS-14083.009.patch
>
>
> Problem:
> 
> There is excessive error logging when a file is opened by libhdfs 
> (DFSClient/HDFS) in S3 environment, this issue is caused because buffered 
> read is not supported in S3 environment, HADOOP-14603 "S3A input stream to 
> support ByteBufferReadable"  
> The following message is printed repeatedly in the error log/ to STDERR:
> {code}
> --
> UnsupportedOperationException: Byte-buffer read unsupported by input 
> streamjava.lang.UnsupportedOperationException: Byte-buffer read unsupported 
> by input stream
> at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
> {code}
> h3. Root cause
> After investigating the issue, it appears that the above exception is printed 
> because
> when a file is opened via {{hdfsOpenFileImpl()}} calls {{readDirect()}} which 
> is hitting this
> exception.
> h3. Fix:
> Since the hdfs client is not initiating the byte buffered read but is 
> happening in a implicit manner, we should not be generating the error log 
> during open of a file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2019-02-27 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779551#comment-16779551
 ] 

Sahil Takiar commented on HDFS-14111:
-

Thanks for the feedback, Todd; I will address the issue with {{newJavaStr}}. 
Yes, this fixes HDFS-14083 as well. I think we can close HDFS-14083 in favor 
of the approach taken here (I just left a comment on HDFS-14083; if no one 
objects, I will close that JIRA).

{{test_hdfs_ext_hdfspp_test_shim_static}} seems to be consistently failing in 
Hadoop QA; still trying to figure out why.

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled

2019-02-26 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778763#comment-16778763
 ] 

Sahil Takiar commented on HDFS-14321:
-

When running the existing tests with {{-Xcheck:jni}} I only see one error: 
{{FATAL ERROR in native method: Invalid global JNI handle passed to 
DeleteGlobalRef}}, which seems to be caused by {{hadoopRzOptionsFree}} calling 
{{DeleteGlobalRef}} on {{opts->byteBufferPool}} which is not a global ref. It's 
not clear to me how big an issue this is, since {{opts->byteBufferPool}} should 
be a local ref that is automatically deleted when the native method exits.

There are a bunch of warnings of the form {{WARNING in native method: JNI call 
made without checking exceptions when required to from ...}} - after debugging 
these warnings, most of them seem to be caused by the JVM itself (e.g. internal 
JDK code), so they would have to be fixed within the JDK itself.

> Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
> -
>
> Key: HDFS-14321
> URL: https://issues.apache.org/jira/browse/HDFS-14321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> The JVM exposes an option called {{-Xcheck:jni}} which runs various checks 
> against JNI usage by applications. Further explanation of this JVM option can 
> be found in: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts002.html]
>  and 
> [https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/jni_debug.html].
>  When run with this option, the JVM will print out any warnings or errors it 
> encounters with the JNI.
> We should run the libhdfs tests with {{-Xcheck:jni}} (can be added to 
> {{LIBHDFS_OPTS}}) and fix any warnings / errors. We should add this option to 
> our ctest runs as well to ensure no regressions are introduced to libhdfs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


