Hi,

This is the first message I am sending to this list, so hello everyone.

I have run into an issue with MapReduce jobs that import data from HDFS into Phoenix.

We have some long jobs. The ones running for more than 24 hours seem to keep running fine well past that point, but eventually the connection to HBase gets closed when we create a new connection to do a bookkeeping task at the end of the job.
This does not seem to happen for tasks that run for less than 24 hours.

I have checked the code a bit and found DEFAULT_CLIENT_CONNECTION_CACHE_MAX_DURATION = 86400000, i.e. 24 hours.

But the client can run much longer than that, and it only fails when creating a new connection (when the job commits, to do the bookkeeping work).

So what I think is happening is that client connections are cached for up to 24 hours (DEFAULT_CLIENT_CONNECTION_CACHE_MAX_DURATION = 86400000), but they are only evicted when the cache is accessed (Google Guava cache implementation). A cached connection can therefore keep working in the background past its expiry until a new connection is created; at that point the previous one is evicted from the cache, closing the connection and possibly losing the uncommitted changes in the last batch.
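To illustrate what I mean, here is a minimal standalone Guava sketch (not Phoenix code): the expired entry is not closed until some later cache operation triggers the eviction.

    import java.util.concurrent.TimeUnit;
    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.RemovalListener;

    public class LazyEvictionSketch {
        public static void main(String[] args) throws InterruptedException {
            // Toy stand-in for the client connection cache: entries expire
            // 1 second after write, and the removal listener is where a real
            // cache would close the underlying connection.
            Cache<String, String> cache = CacheBuilder.newBuilder()
                    .expireAfterWrite(1, TimeUnit.SECONDS)
                    .removalListener((RemovalListener<String, String>) n ->
                            System.out.println("evicted (connection closed): " + n.getKey()))
                    .build();

            cache.put("conn-1", "open");
            Thread.sleep(2000);
            // The entry is expired by now, but it has not been evicted yet:
            // Guava only performs this cleanup during reads/writes on the cache.
            System.out.println("slept past expiry, no eviction yet");

            // Creating a "new connection" touches the cache, which triggers the
            // maintenance that evicts conn-1 and fires the removal listener.
            cache.put("conn-2", "open");
        }
    }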

My question is: is this known behaviour, with the cache being useful in some situations, or is it a bug? So far I have set the cache duration to the maximum value as a workaround, but I am not sure whether that might affect other situations.
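For reference, the workaround looks roughly like this in our job setup. I am assuming the property key that corresponds to DEFAULT_CLIENT_CONNECTION_CACHE_MAX_DURATION is "phoenix.client.connection.max.duration"; please correct me if that is not the right one.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class JobConfSketch {
        public static Configuration buildConf() {
            Configuration conf = HBaseConfiguration.create();
            // Assumed property name for the client connection cache duration;
            // pushing it to Long.MAX_VALUE effectively disables the 24h expiry.
            conf.setLong("phoenix.client.connection.max.duration", Long.MAX_VALUE);
            return conf;
        }
    }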

Thank you,
Álvaro
