Option #2 is fine. Connections are cheap in Phoenix.
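A minimal runnable sketch of that per-thread pattern: each worker thread gets its own connection via a ThreadLocal. A plain Object stands in for java.sql.Connection here so the sketch runs without a cluster; in the real mapper the initializer would be a DriverManager.getConnection call against your Phoenix JDBC URL (the URL in the comment is hypothetical).

```java
import java.util.concurrent.CompletableFuture;

// Sketch only: Object stands in for java.sql.Connection. In the mapper,
// the initializer would be something like
//   () -> DriverManager.getConnection("jdbc:phoenix:<zk-quorum>")  // hypothetical URL
// and each thread would close its own connection when it finishes.
public class PerThreadConnection {
    public static final ThreadLocal<Object> CONN =
            ThreadLocal.withInitial(Object::new); // fresh "connection" per thread

    public static void main(String[] args) {
        Object mine = CONN.get(); // main thread's instance
        // supplyAsync runs on a different thread, which gets its own instance
        Object theirs = CompletableFuture.supplyAsync(CONN::get).join();
        System.out.println(mine == theirs); // false: nothing is shared
    }
}
```

This is what makes "connections are cheap" matter: a Phoenix Connection is a lightweight object over shared cluster state, so opening one per thread (or even per task) is the usual recommendation rather than pooling or sharing.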

On Sunday, October 2, 2016, anil gupta <anilgupt...@gmail.com> wrote:

> Hi James,
>
> There is a high possibility that we might be sharing a connection among
> multiple threads. This MR job is fairly complicated because we spin up a
> separate thread within the Mapper (to kill the download if it doesn't
> complete within a prespecified time) to perform image downloads from the
> internet.
> #1. One solution would be to make the method doing the upsert synchronize
> on the Connection (or the PreparedStatement object). We don't write to the
> Phoenix table at a very high throughput because it only logs errors. What
> do you think of that?
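A runnable sketch of option #1: every thread funnels its upserts through one synchronized block keyed on the shared connection. A plain counter stands in for the JDBC call so no cluster is needed, and the class and method names are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Option #1 sketch: serialize all access to the shared connection.
public class SynchronizedUpserts {
    private final Object conn = new Object(); // stand-in for the shared Connection
    private long rowsUpserted = 0;

    // In the real mapper this body would be the prepareStatement/executeUpdate call.
    void upsert() {
        synchronized (conn) { // one thread at a time touches the connection
            rowsUpserted++;
        }
    }

    long run(int threads, int upsertsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < upsertsPerThread; i++) upsert();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (conn) { // read under the same lock for visibility
            return rowsUpserted;
        }
    }

    public static void main(String[] args) {
        // With the lock, no increments are lost across threads.
        System.out.println(new SynchronizedUpserts().run(8, 10_000)); // 80000
    }
}
```

The trade-off versus option #2 is that the lock makes every upsert a serialization point, which is tolerable here only because this table just logs errors at low volume.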
>
> #2. I will need to create a connection in every thread. But, AFAIK,
> creating a connection every time and then the PreparedStatement is
> expensive. Right?
>
> Please let me know if there is a better approach that I am missing.
>
> On Sun, Oct 2, 2016 at 3:50 PM, James Taylor <jamestay...@apache.org> wrote:
>
>> Hi Anil,
>> Make sure you're not sharing the same Connection between multiple threads
>> as it's not thread safe.
>> Thanks,
>> James
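The fail-fast behavior behind the ConcurrentModificationException in the stack trace below can be reproduced with a bare HashMap, no Phoenix involved: the trace shows MutationState keeping pending mutations in an ordinary HashMap, and a structural change made while that map is being iterated trips the same check. A single-threaded reproduction (for determinism) of the mechanism:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Minimal reproduction of HashMap's fail-fast check: a structural
// modification during iteration invalidates the live iterator.
public class SharedMapCme {
    public static void main(String[] args) {
        Map<String, String> pending = new HashMap<>();
        pending.put("row1", "a");
        pending.put("row2", "b");
        try {
            Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator();
            it.next();
            pending.put("row3", "c"); // structural change mid-iteration...
            it.next();                // ...trips the fail-fast check
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```

With two threads on one connection, one thread's upsert plays the role of the mid-iteration put while another thread's commit is iterating, which is why the failure is intermittent.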
>>
>>
>> On Sunday, October 2, 2016, anil gupta <anilgupt...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We are running HDP 2.3.4 (HBase 1.1 and Phoenix 4.4). I have a MapReduce
>>> job that's writing data to a very simple Phoenix table. We intermittently
>>> get the following exception, and our job fails because of it:
>>> java.util.ConcurrentModificationException
>>>     at java.util.HashMap$HashIterator.remove(HashMap.java:944)
>>>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:472)
>>>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:461)
>>>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:458)
>>>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:458)
>>>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:308)
>>>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
>>>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>     at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
>>>     at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeUpdate(PhoenixPreparedStatement.java:200)
>>>
>>> We are running these upserts in Mapper code that executes as part of a
>>> ChainReducer. One problem I noticed is that we were instantiating a
>>> PreparedStatement every time (conn is the Connection object) we did an
>>> upsert:
>>>
>>> conn.prepareStatement(TcErrorWritable.buildUpsertNewRowStatement(TC_DOWNLOAD_ERRORS_TABLE));
>>>
>>> This is the only line in that code that seems awkward to me. We have
>>> other projects writing to Phoenix at a much higher throughput and volume
>>> of data, but we never ran into this problem. Can anyone provide more
>>> details on why we are getting a ConcurrentModificationException while
>>> doing upserts?
>>>
>>> --
>>> Thanks & Regards,
>>> Anil Gupta
>>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>