Option #2 is fine. Connections are cheap in Phoenix.

On Sunday, October 2, 2016, anil gupta <anilgupt...@gmail.com> wrote:
> Hi James,
>
> There is a high possibility that we might be sharing a connection among
> multiple threads. This MR job is fairly complicated because we spin up a
> separate thread (to kill the download if it doesn't complete in a
> prespecified time) within the Mapper to perform image downloads from the
> internet.
>
> #1. One solution would be to make the method that does the upsert
> synchronize on the Connection (or PreparedStatement) object. We don't
> write to the Phoenix table at a very high throughput because it only logs
> errors. What do you think about that?
>
> #2. I would need to create a connection in every thread. But, AFAIK,
> creating a connection every time, and then the PreparedStatement, is
> expensive. Right?
>
> Please let me know if there is any other better approach that I am missing.
>
> On Sun, Oct 2, 2016 at 3:50 PM, James Taylor <jamestay...@apache.org> wrote:
>
>> Hi Anil,
>> Make sure you're not sharing the same Connection between multiple threads,
>> as it's not thread safe.
>> Thanks,
>> James
>>
>>
>> On Sunday, October 2, 2016, anil gupta <anilgupt...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We are running HDP 2.3.4 (HBase 1.1 and Phoenix 4.4). I have a MapReduce
>>> job that writes data to a very simple Phoenix table. We intermittently
>>> get the following exception, and our job fails because of it:
>>>
>>> java.util.ConcurrentModificationException
>>>     at java.util.HashMap$HashIterator.remove(HashMap.java:944)
>>>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:472)
>>>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:461)
>>>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:458)
>>>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:458)
>>>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:308)
>>>     at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
>>>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>     at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
>>>     at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeUpdate(PhoenixPreparedStatement.java:200)
>>>
>>> We are running these upserts as part of code in a Mapper that executes as
>>> part of a ChainReducer. One problem I noticed is that we were
>>> instantiating a PreparedStatement every time (conn == Connection object)
>>> we did an upsert:
>>>
>>> conn.prepareStatement(TcErrorWritable.buildUpsertNewRowStatement(TC_DOWNLOAD_ERRORS_TABLE));
>>>
>>> This is the only line that seems awkward to me in that code. We have
>>> other projects writing to Phoenix at a much higher throughput and volume
>>> of data, but we never ran into this problem. Can anyone provide more
>>> details on why we are getting a ConcurrentModificationException while
>>> doing upserts?
>>>
>>> --
>>> Thanks & Regards,
>>> Anil Gupta
>>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
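The "option #2" James endorses above (one Connection per thread, so nothing shared needs synchronizing) can be sketched with a ThreadLocal holder. This is a minimal illustration, not Phoenix-specific API: the `PerThread` class, the factory wiring, and the `jdbc:phoenix:zk-host` URL in the comment are all assumptions for the sketch.

```java
import java.util.function.Supplier;

// Generic per-thread cache: the first time a thread calls get(), its own
// instance is created via the factory; later calls from that thread reuse it.
// For Phoenix, the factory would be something along the lines of
//   () -> DriverManager.getConnection("jdbc:phoenix:zk-host")  // URL illustrative
// so each download thread ends up with its own Connection.
class PerThread<T> {
    private final ThreadLocal<T> cache;

    PerThread(Supplier<T> factory) {
        // ThreadLocal.withInitial invokes the factory lazily, once per thread
        this.cache = ThreadLocal.withInitial(factory);
    }

    T get() {
        return cache.get();
    }
}
```

Each download thread in the Mapper would then fetch its own connection, prepare its own PreparedStatement from the upsert SQL, and commit independently; no synchronization is needed because nothing is shared. Each thread is still responsible for closing its connection when it finishes.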