Hi,
this is my first message to this list, so hello everyone.
I have run into an issue with MapReduce jobs that import data from HDFS
into Phoenix.
We have some long jobs, and the ones running longer than 24 hours seem to
keep running fine well past that point, but eventually the connection to
HBase is closed when we create a new connection to do a bookkeeping task
at the end of the job.
This does not seem to happen for jobs running less than 24 hours.
I have looked at the code a bit and found that
DEFAULT_CLIENT_CONNECTION_CACHE_MAX_DURATION = 86400000, i.e. 24 hours.
But the client can run much longer than that, and it only fails when
creating a new connection (when the job commits, to do the bookkeeping).
So what I think is happening is that client connections are cached for up
to 24 hours (DEFAULT_CLIENT_CONNECTION_CACHE_MAX_DURATION = 86400000),
but expired entries are only evicted when the cache is next accessed
(Google Guava Cache implementation). The job can therefore keep running
correctly in the background well past the 24-hour mark, until a new
connection is created; at that point the previous connection is evicted
from the cache, which closes it and possibly loses the uncommitted
changes in the last batch.
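To illustrate the eviction behaviour I mean, here is a minimal standalone
Guava sketch (not the actual Phoenix code; the removal listener is just a
stand-in for where the cached connection would be closed):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import java.util.concurrent.TimeUnit;

public class LazyEvictionDemo {
    public static void main(String[] args) throws InterruptedException {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterWrite(1, TimeUnit.SECONDS)
                // Stand-in for closing the cached connection on eviction.
                .removalListener((RemovalListener<String, String>) n ->
                        System.out.println("evicted " + n.getKey() + ", cause=" + n.getCause()))
                .build();

        cache.put("conn-1", "open");
        Thread.sleep(2_000);
        // "conn-1" has expired by now, but the removal listener has not fired:
        // Guava only performs cleanup as a side effect of other cache operations.

        // Touching the cache again (like creating a new connection) triggers the
        // maintenance work, and only then is the expired entry actually evicted.
        cache.put("conn-2", "open");
        System.out.println("conn-1 still present? " + (cache.getIfPresent("conn-1") != null));
    }
}

Running this, nothing is printed during the sleep; the "evicted conn-1"
line only shows up when the second put happens, which matches the failure
only appearing when the job creates a new connection at commit time.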
My question is: is this known behaviour, where the cache is useful in
some situations, or is this a bug?
So far I have set the cache duration to its maximum value as a
workaround, but I am not sure whether that might affect other situations.
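For reference, the workaround currently looks roughly like this in the
job setup (the property key below is only a placeholder; the real key is
the QueryServices constant that pairs with
DEFAULT_CLIENT_CONNECTION_CACHE_MAX_DURATION in QueryServicesOptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class JobSetup {
    public static Configuration buildConf() {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder key: take the actual one from QueryServices.
        conf.setLong("phoenix.client.connection.cache.max.duration", Long.MAX_VALUE);
        return conf;
    }
}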
Thank you,
Álvaro