Re: Why LineRecordWriter.write(..) is synchronized

2013-08-11 Thread Sathwik B P
Hi Harsh, Does it make any sense to keep the method in LRW still synchronized. Isn't it creating unnecessary overhead for non multi threaded implementations. regards, sathwik On Fri, Aug 9, 2013 at 7:16 AM, Harsh J ha...@cloudera.com wrote: I suppose I should have been clearer. There's no

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-11 Thread Harsh J
Yes, I feel we could discuss this over a JIRA to remove it if it hurts perf. too much, but it would have to be a marked incompatible change, and we have to add a note about the lack of thread safety in the javadoc of base Mapper/Reducer classes. On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-11 Thread Niels Basjes
I expect the impact on the IO speed to be almost 0 because waiting for a single disk seek is longer than many thousands of calls to a synchronized method. Niels On Aug 11, 2013 3:00 PM, Harsh J ha...@cloudera.com wrote: Yes, I feel we could discuss this over a JIRA to remove it if it hurts

Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations define the write as synchronized. Any specific reason for this. regards, sathwik

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
because we may use multi-threads to write a single file. On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote: Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations define the write as synchronized. Any specific reason for this.

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi, Thanks for your reply. May I know where does hadoop fork multiple threads to use a single RecordWriter. regards, sathwik On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu azury...@gmail.com wrote: because we may use multi-threads to write a single file. On Aug 8, 2013 2:54 PM, Sathwik B P

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
its not hadoop forked threads, we may create a line record writer, then call this writer concurrently. On Aug 8, 2013 4:00 PM, Sathwik B P sathwik...@gmail.com wrote: Hi, Thanks for your reply. May I know where does hadoop fork multiple threads to use a single RecordWriter. regards,

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Harsh J
While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking is it necessary, then perhaps the answer is no. On Aug 8, 2013 3:43 PM, Azuryy Yu azury...@gmail.com wrote: its not hadoop forked threads, we may

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
I may be nitpicking here but if perhaps the answer is no then I conclude: Perhaps the other implementations of RecordWriter are a race condition/file corruption ready to occur. On Thu, Aug 8, 2013 at 12:50 PM, Harsh J ha...@cloudera.com wrote: While we don't fork by default, we do provide a

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Jay Vyas
Then is this a bug? Synchronization in absence of any race condition is normally considered bad. In any case id like to know why this writer is synchronized whereas the other one are not.. That is, I think, then point at issue: either other writers should be synchronized or else this one

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi Harsh, Do you want me to raise a Jira on this. regards, sathwik On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas jayunit...@gmail.com wrote: Then is this a bug? Synchronization in absence of any race condition is normally considered bad. In any case id like to know why this writer is

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
I would say yes make this a Jira. The actual change can fall (as proposed by Jay) in two directions: Put in synchronization in all implementations OR take it out of all implementations. I think the first thing to determine is why the synchronization was put into the LineRecordWriter in the first

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
sequence writer is also synchronized, I dont think this is bad. if you call HDFS api to write concurrently, then its necessary. On Aug 8, 2013 7:53 PM, Jay Vyas jayunit...@gmail.com wrote: Then is this a bug? Synchronization in absence of any race condition is normally considered bad. In

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Harsh J
I suppose I should have been clearer. There's no problem out of box if people stick to the libraries we offer :) Yes the LRW was marked synchronized at some point over 8 years ago [1] in support for multi-threaded maps, but the framework has changed much since then. The MultithreadedMapper/etc.