Hi Anoop, Thanks for the hint! Even if it's not fixing my issue, at least my tests are going to be faster.
I will take a look at the documentation to understand what deleteColumn was doing.

JM

2012/12/19, Anoop Sam John <[email protected]>:
> Jean: just one thought after seeing the description and the code. Not
> related to the missing lines as such.
>
> You want to delete the row fully, right?
>> My table is only one CF with one C with one version
> And your code is like
>> Delete delete_entry_proposed = new Delete(key);
>> delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(),
>> KVs.get(0).getQualifier());
>
> deleteColumn() is useful when you want to delete a specific version of a
> specific column in a row. In your case this may not really be needed. Just
> Delete delete_entry_proposed = new Delete(key); may be enough, so that the
> delete type is a ROW delete.
>
> You can see the javadoc of the deleteColumn() API, in which it clearly says
> it is an expensive op. At the server side there will be a need to do a Get
> call.
> In your case this is really unwanted overhead, I think.
>
> -Anoop-
> ________________________________________
> From: Jean-Marc Spaggiari [[email protected]]
> Sent: Tuesday, December 18, 2012 7:07 PM
> To: [email protected]
> Subject: Re: MR missing lines
>
> I faced the issue again today...
>
> RowCounter gave me 104313 lines
> Here is the output of the job counters:
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_ADDED=81594
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_SIMILAR=434
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_NO_CHANGES=14250
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_DUPLICATE=428
> 12/12/17 22:32:52 INFO mapred.JobClient: NON_DELETED_ROWS=0
> 12/12/17 22:32:52 INFO mapred.JobClient: ENTRY_EXISTING=7605
> 12/12/17 22:32:52 INFO mapred.JobClient: ROWS_PARSED=104311
>
> There is a 2-line difference between ROWS_PARSED and the RowCounter result.
> ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
> ENTRY_EXISTING are the 5 states an entry can have.
> Total of all those counters is equal to the ROWS_PARSED value, so it's
> aligned. The code is handling all the possibilities.
>
> The ROWS_PARSED counter is incremented right at the beginning, like
> this (I removed the comments and javadoc for readability):
> /**
>  * The comments ...
>  */
> @Override
> public void map(ImmutableBytesWritable row__, Result values,
>         Context context) throws IOException
> {
>     context.getCounter(Counters.ROWS_PARSED).increment(1);
>     List<KeyValue> KVs = values.list();
>     try
>     {
>         // Get the current row.
>         byte[] key = values.getRow();
>
>         // First thing we do, we mark this line to be deleted.
>         Delete delete_entry_proposed = new Delete(key);
>         delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(),
>                 KVs.get(0).getQualifier());
>         deletes_entry_proposed.add(delete_entry_proposed);
>
> The deletes_entry_proposed is a list of rows to delete. After each
> call to the delete method, the number of lines remaining in this
> list is added to NON_DELETED_ROWS, which is 0 at the end, so all lines
> should be deleted correctly.
>
> I re-ran the rowcounter after the job, and I still have ROWS=5971
> lines in the table. I checked all my "feeding processes" and they are
> all closed.
>
> My table is only one CF with one C with one version.
>
> I can guess that the remaining 5971 lines in the table are an error
> on my side, but I'm not able to find where, since all the counters are
> matching. I will add one counter which will count all the entries in the
> delete list before calling the delete method. This should match the
> number of rows.
>
> Again, I will re-feed the table today with fresh data and re-run the job...
>
> JM
>
> 2012/12/17, Jean-Marc Spaggiari <[email protected]>:
>> The job ran this morning, and of course, this time, all the rows got
>> processed ;)
>>
>> So I will give it a few more tries and will keep you posted if I'm able
>> to reproduce that again.
>>
>> Thanks,
>>
>> JM
>>
>> 2012/12/16, Jean-Marc Spaggiari <[email protected]>:
>>> Thanks for the suggestions.
>>>
>>> I already have logs to display all the exceptions and there is
>>> nothing. I can't display the work done, there is too much :(
>>>
>>> I have counters "counting" the rows processed and they match what is
>>> done, minus what is not processed. I have just added a few other
>>> counters. One right at the beginning, and one to count the
>>> records remaining on the delete list, as suggested.
>>>
>>> I will run the job again tomorrow, see the result and keep you posted.
>>>
>>> JM
>>>
>>>
>>> 2012/12/16, Asaf Mesika <[email protected]>:
>>>> Did you check the returned array of the delete method to make sure all
>>>> records sent for delete have been deleted?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 16 Dec 2012, at 14:52, Jean-Marc Spaggiari
>>>> <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a table where I'm running MR each time it exceeds 100 000
>>>>> rows.
>>>>>
>>>>> When the target is reached, all the feeding processes are stopped.
>>>>>
>>>>> Yesterday it reached 123608 rows. So I stopped the feeding processes,
>>>>> and ran the MR.
>>>>>
>>>>> For each line, the MR is creating a delete. The delete is placed on a
>>>>> list, and when the list reaches 10 elements, it's sent to the table.
>>>>> In the clean method, the list is sent to the table if there is any
>>>>> element in it.
>>>>>
>>>>> So at the end of the MR, I should have an empty table.
>>>>>
>>>>> The table is split over 128 regions. And I have 8 region servers.
>>>>>
>>>>> What is disturbing me is that after the MR, I had 38 lines remaining
>>>>> in the table. The MR took 348 minutes to run. So I ran the MR again,
>>>>> which this time took 2 minutes, and now I have 1 row remaining in the
>>>>> table.
>>>>>
>>>>> I looked at the logs (for the 38 lines run) and there is nothing in
>>>>> them.
>>>>> There are some scanner timeout exceptions for the run of the 100K
>>>>> rows.
>>>>>
>>>>> I'm running HBase 0.94.3.
>>>>>
>>>>> I will have another 100K rows today, so I will re-run the job. I will
>>>>> increase the timeout to make sure I get no exception, but even when I
>>>>> ran the 38 lines with no exception, one was remaining...
>>>>>
>>>>> Any idea why, and where I can search? It's not really an issue for me
>>>>> since I can just re-run the job, but this might be an issue for some
>>>>> others.
>>>>>
>>>>> JM
>>>>
>>>
>>
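
Editorial sketch of the batching pattern described in the thread: deletes are collected in a list, the list is sent once it holds 10 elements, a final flush happens in the clean-up step, and whatever is still in the list after each send is added to a NON_DELETED_ROWS-style counter. This is not the job's actual code. DeleteSink, BufferedDeleter and the String row keys are hypothetical stand-ins so that the example runs without HBase. The assumed contract, taken from how the thread describes the delete call, is that entries applied successfully are removed from the list and anything left behind was not deleted.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the table's batch delete call.
 * Assumed contract: applied entries are removed from the list;
 * entries still present afterwards were NOT deleted.
 */
interface DeleteSink {
    void delete(List<String> rowKeys);
}

/** Buffers row keys, flushes them in batches and counts leftovers. */
class BufferedDeleter {
    private final DeleteSink sink;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private long submitted = 0;  // every key ever handed to add()
    private long nonDeleted = 0; // keys left in the list after a flush

    BufferedDeleter(DeleteSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Queue one row for deletion; send the batch when it is full. */
    void add(String rowKey) {
        pending.add(rowKey);
        submitted++;
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Send whatever is queued. Must also be called once at the very end. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.delete(pending);
        // Anything the sink left in the list was not applied.
        nonDeleted += pending.size();
        // This sketch drops the leftovers; a real job might retry them instead.
        pending.clear();
    }

    long submitted() { return submitted; }

    long nonDeleted() { return nonDeleted; }
}

class BufferedDeleterDemo {
    public static void main(String[] args) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            table.add("row" + i);
        }
        // In-memory sink that always succeeds: remove the rows, then empty the list.
        DeleteSink sink = keys -> { table.removeAll(keys); keys.clear(); };

        BufferedDeleter deleter = new BufferedDeleter(sink, 10);
        for (int i = 0; i < 25; i++) {
            deleter.add("row" + i);
        }
        deleter.flush(); // final partial batch, as in the mapper's clean-up step

        System.out.println("submitted=" + deleter.submitted()
                + " nonDeleted=" + deleter.nonDeleted()
                + " remaining=" + table.size());
    }
}
```

Comparing submitted() with the number of rows parsed is the extra check proposed in the thread: if the two differ, rows were lost before they reached the delete list; if they match and rows still remain in the table, the loss is somewhere after the delete call.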

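
Editorial sketch of the counter arithmetic from the December 18 message: the five entry-state counters should add up to ROWS_PARSED, and ROWS_PARSED can then be compared with the RowCounter result. The numbers are copied from the job output quoted in the thread. CounterCheck and its method names are hypothetical and only serve this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CounterCheck {
    /** The five entry-state counters as reported in the quoted job output. */
    static Map<String, Long> entryStates() {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("ENTRY_ADDED", 81594L);
        counters.put("ENTRY_SIMILAR", 434L);
        counters.put("ENTRY_NO_CHANGES", 14250L);
        counters.put("ENTRY_DUPLICATE", 428L);
        counters.put("ENTRY_EXISTING", 7605L);
        return counters;
    }

    /** Adds up all counter values. */
    static long sum(Map<String, Long> counters) {
        long total = 0;
        for (long value : counters.values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        long rowsParsed = 104311L; // ROWS_PARSED from the job output
        long rowCounter = 104313L; // RowCounter result before the job
        long states = sum(entryStates());

        System.out.println("sum of entry states = " + states);
        System.out.println("matches ROWS_PARSED = " + (states == rowsParsed));
        System.out.println("rows never parsed   = " + (rowCounter - rowsParsed));
    }
}
```

The sum matching ROWS_PARSED only shows that every row the mapper saw ended up in one of the five states. It says nothing about rows the mapper never received, which is where the difference from the RowCounter result comes from.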