[jira] [Resolved] (HADOOP-1222) Record IO C++ binding: buffer type not handled correctly

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1222.
-

Resolution: Won't Fix

Resolving as Won't Fix, since the whole recordio component is now deprecated in 
favor of Avro (and technically ought to be removed in 0.22/0.23).

Please see https://issues.apache.org/jira/browse/HADOOP-6155

 Record IO C++ binding: buffer type not handled correctly
 

 Key: HADOOP-1222
 URL: https://issues.apache.org/jira/browse/HADOOP-1222
 Project: Hadoop Common
  Issue Type: Bug
  Components: record
Reporter: David Bowen
 Attachments: test.cc


  
 I added this code to the test, which currently only tests 
 serialization/deserialization of an empty buffer.
std::string b = r1.getBufferVal();
 static char buffer[] = {0, 1, 2, 3, 4, 5};
 for (int i = 0; i < 6; i++) {
   b.push_back(buffer[i]);
 }
 The csv test fails.  The generated file looks like this.
 T,102,4567,99344109427290,3.145000,1.523400,',# 0 1 2 3 4 5 0 1 2 3 4 
 5,v{},m{}
 The xml test passes, but the data in the xml file is wrong:
 <value><string>000102030405000102030405000102030405</string></value>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1223) Record IO C++ binding: non-empty vector of strings does not work

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1223.
-

Resolution: Won't Fix

Resolving as Won't Fix, since the whole recordio component is now deprecated in 
favor of Avro (and technically ought to be removed in 0.22/0.23).

Please see https://issues.apache.org/jira/browse/HADOOP-6155

 Record IO C++ binding: non-empty vector of strings does not work
 

 Key: HADOOP-1223
 URL: https://issues.apache.org/jira/browse/HADOOP-1223
 Project: Hadoop Common
  Issue Type: Bug
  Components: record
Reporter: David Bowen
 Attachments: test.cc


 It works in the binary case, but not in CSV or XML.
 Here is the code to put some strings in the vector.
 std::vector<std::string> v = r1.getVectorVal();
 v.push_back("hello");
 v.push_back("world");
 In the CSV file, the strings appear twice, for some reason.  In the XML file 
 they appear three times.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1277) The class generated by Hadoop Record rcc should provide a static method to return the DDL string

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1277.
-

Resolution: Won't Fix

Resolving as Won't Fix, since the whole recordio component is now deprecated in 
favor of Avro (and technically ought to be removed in 0.22/0.23).

Please see https://issues.apache.org/jira/browse/HADOOP-6155

 The class generated by Hadoop Record rcc should provide a static method to 
 return the DDL string
 

 Key: HADOOP-1277
 URL: https://issues.apache.org/jira/browse/HADOOP-1277
 Project: Hadoop Common
  Issue Type: New Feature
  Components: record
Reporter: Runping Qi

 The method will look like:
 public static String getDDL();
 With this method, when a map/reduce job writes out sequence files with such a
 generated class as its value class, the job can also save the DDL of the class
 into a file. With such a file around, we can implement a record reader that can
 generate the required class on demand, and thus read a sequence file of Hadoop
 Records without having the class a priori.
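 A rough sketch (hypothetical; rcc does not emit this today) of what such a
 generated class could expose — the class name and embedded DDL text here are
 illustrative only:
 {code}
 // Hypothetical sketch: rcc would embed whatever DDL it was actually invoked with.
 public class MyRecord /* extends the rcc-generated record base class */ {
   private static final String DDL =
       "class MyRecord { int a; ustring name; }";

   // Proposed static accessor from this issue.
   public static String getDDL() {
     return DDL;
   }
 }
 {code}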

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1225) Record IO class should provide a toString(String charset) method

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1225.
-

Resolution: Won't Fix

Resolving as Won't Fix, since the whole recordio component is now deprecated in 
favor of Avro (and technically ought to be removed in 0.22/0.23).

Please see https://issues.apache.org/jira/browse/HADOOP-6155

 Record IO class should provide a toString(String charset) method
 

 Key: HADOOP-1225
 URL: https://issues.apache.org/jira/browse/HADOOP-1225
 Project: Hadoop Common
  Issue Type: Improvement
  Components: record
Reporter: Runping Qi
Assignee: Sameer Paranjpye

 Currently, the toString() function returns the csv-format serialized form of
 the record object.
 Unfortunately, all the fields of Buffer type are serialized into hex strings.
 Although this is a lossless conversion, it is not the most convenient form when
 people use Buffer to store international text. With a new function
 toString(String charset), the user can pass a charset to indicate the desired
 way to convert a Buffer to a String.
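 A minimal sketch of the proposed API, assuming a record with a single Buffer
 field; the field name (bufferVal) and the surrounding class are illustrative:
 {code}
 import java.io.UnsupportedEncodingException;
 import org.apache.hadoop.record.Buffer;

 public class RecordToStringSketch {
   private final Buffer bufferVal = new Buffer();

   // Proposed: render Buffer fields as text in the given charset instead of hex.
   public String toString(String charset) throws UnsupportedEncodingException {
     return new String(bufferVal.get(), 0, bufferVal.getCount(), charset);
   }
 }
 {code}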

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1095) Provide ByteStreams in C++ version of record I/O

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1095.
-

Resolution: Won't Fix

Resolving as Won't Fix, since the whole recordio component is now deprecated in 
favor of Avro (and technically ought to be removed in 0.22/0.23).

Please see https://issues.apache.org/jira/browse/HADOOP-6155

 Provide ByteStreams in C++ version of record I/O
 

 Key: HADOOP-1095
 URL: https://issues.apache.org/jira/browse/HADOOP-1095
 Project: Hadoop Common
  Issue Type: Improvement
  Components: record
Affects Versions: 0.12.0
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Vivek Ratan

  Implement ByteInStream and ByteOutStream for the C++ runtime, as they will be
 needed for using Hadoop Record I/O with the forthcoming C++ MapReduce framework
 (currently, only FileStreams are provided).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1227) Record IO C++ binding: cannot write more than one record to an XML stream and read them back

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1227.
-

Resolution: Won't Fix

Resolving as Won't Fix, since the whole recordio component is now deprecated in 
favor of Avro (and technically ought to be removed in 0.22/0.23).

Please see https://issues.apache.org/jira/browse/HADOOP-6155

 Record IO C++ binding: cannot write more than one record to an XML stream and 
 read them back
 

 Key: HADOOP-1227
 URL: https://issues.apache.org/jira/browse/HADOOP-1227
 Project: Hadoop Common
  Issue Type: Bug
  Components: record
Reporter: David Bowen

 I tried just writing the same record twice and then reading it back twice, 
 and got a segmentation fault.
 This works fine in the binary and csv cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-6712) librecordio support for xerces 3, eliminate compiler warnings and the (optional) ability to compile in the source directory

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-6712.
-

Resolution: Won't Fix

Resolving as Won't Fix, since the whole recordio component is now deprecated in 
favor of Avro (and technically ought to be removed in 0.22/0.23).

Please see https://issues.apache.org/jira/browse/HADOOP-6155

Do reopen if you wish to maintain recordio over 0.20.x branches. Although Avro 
works with 0.20 as well.

 librecordio support for xerces 3, eliminate compiler warnings and the 
 (optional) ability to compile in the source directory
 ---

 Key: HADOOP-6712
 URL: https://issues.apache.org/jira/browse/HADOOP-6712
 Project: Hadoop Common
  Issue Type: Bug
  Components: record
 Environment: 64-bit linux w/gcc 4.4.3 w/xerces 3
Reporter: John Plevyak
 Attachments: librecordio-jp-v1.patch


  I don't know if this code is currently supported, but since it is in the tree
  here are some fixes:
  1. Support for xerces 3.X as well as 2.X.
  The patch checks XERCES_VERSION_MAJOR and I have tested on 3.X, but before
  committing, someone should retest on 2.X.
  2. gcc 4.4.3 on 64-bit complains about using %lld with int64_t. Casting
  to 'long long int' solves the issue.
  3. Since there is currently no ant target, check if LIBRECORDIO_BUILD_DIR
  is undefined and if so assume '.' to support compiling in the source
  directory.
  This should not affect normal compilation if/when an ant target is
  created.
  Patch attached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1916) FSShell put or CopyFromLocal incorrectly treats "."

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1916.
-

Resolution: Fixed

Fixed by a redesign on trunk.

Please see https://issues.apache.org/jira/browse/HADOOP-7176 for the umbrella of 
changes leading to this.

 FSShell put or CopyFromLocal incorrectly treats "."
 ---

 Key: HADOOP-1916
 URL: https://issues.apache.org/jira/browse/HADOOP-1916
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.14.1
Reporter: Konstantin Shvachko
Assignee: Chris Douglas
 Attachments: 1916.patch


 The following dfs shell command
 {code}
 bin/hadoop dfs -put README.txt .
 {code}
  results in creating a file /user/<user name> with the contents of README.txt.
  A correct behavior would be creating a directory and a file in it:
  /user/<user name>/README.txt
  The put command works correctly if /user/<user name> already exists.
  So the following sequence of commands leads to the desired result:
 {code}
 bin/hadoop dfs -mkdir .
 bin/hadoop dfs -put README.txt .
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1919) Add option to allow Binding Jetty to localhost

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1919.
-

Resolution: Not A Problem

Not a problem.

This can be done as described in the previous comment, with the logical repercussions.

 Add option to allow Binding Jetty to localhost
 --

 Key: HADOOP-1919
 URL: https://issues.apache.org/jira/browse/HADOOP-1919
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 0.14.0
Reporter: Thurman Turner
Priority: Minor

 We would like a configurable option to have Jetty bound to the loopback 
 address of the machine so that the dfs-browser is not accessible from outside 
 the host.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-1994) Variable names generated by Record I/O should not clash with user fields

2011-07-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-1994.
-

Resolution: Won't Fix

Record IO has been deprecated with the advent of Avro.

Please see HADOOP-6155

Resolving as Won't Fix.

 Variable names generated by Record I/O should not clash with user fields
 

 Key: HADOOP-1994
 URL: https://issues.apache.org/jira/browse/HADOOP-1994
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Vivek Ratan
Assignee: Vivek Ratan

 The code (Java and C++) spit out by the Record I/O compiler contains 
 variables. We need to make sure these variable names don't clash with names 
 used by users in the DDL, otherwise the generated code will not compile. 
 Variable names such as 'a', 'peer', etc, are used. We need better names. For 
 example, if I have a DDL of the form
 {code}
 class s1 {
   int a;
   boolean peer;
   int a_;
 }
 {code}
 Both the Java and C++ code will not compile. 
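 An illustrative Java sketch (not the actual rcc output; the method shape and
 the _rio_ prefix are assumptions) of how a generated temporary can clash with
 user fields such as 'a' or 'peer', and how a reserved prefix avoids it:
 {code}
 public class S1 {
   private int a;
   private boolean peer;
   private int a_;

   public void deserialize(java.io.DataInput in) throws java.io.IOException {
     // A generated local also called 'a' would hide/conflict with the field 'a':
     //   int a = in.readInt();
     // A reserved prefix keeps generated locals out of the user's namespace:
     int _rio_a = in.readInt();
     this.a = _rio_a;
     this.peer = in.readBoolean();
     this.a_ = in.readInt();
   }
 }
 {code}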

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: java.lang.Throwable: Child Error And Task process exit with nonzero status of 1.

2011-07-12 Thread Harsh J
The job may have succeeded due to the task having run successfully on
another tasktracker after a retry attempt was scheduled. This probably
means one of your TT has something bad on it, and should be easily
identifiable from the UI.

If all TTs are bad, your job would fail -- so yes, better to fix than
worry about expecting failures.

On Mon, Jul 11, 2011 at 11:53 PM, C.V.Krishnakumar Iyer
f2004...@gmail.com wrote:
 Hi,

 I get this error too. But the Job completes properly. Is this error any cause 
 for concern? As in, would any computation be hampered because of this?

 Thanks !

 Regards,
 Krishnakumar
 On Jul 11, 2011, at 10:53 AM, Bharath Mundlapudi wrote:

  That number is around 40K (I think). I am not sure if you have certain
  configurations to clean up user task logs periodically. We have solved this
  problem in MAPREDUCE-2415, which is part of 0.20.204.


  But if you clean up the task logs periodically, you will not run into this
  problem.

 -Bharath





-- 
Harsh J


Re: XXXWritable

2011-07-04 Thread Harsh J
Do have a look at Apache Avro's use with MapReduce. It helps solve
some of the serialization issues you are talking
about: http://avro.apache.org
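
For reference, a minimal sketch of the Writable contract that types like
LongWritable implement (the class here is illustrative); this is what gives the
wrapper types their compact binary serialization, which a plain java.lang.Long
does not provide by itself:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MyLongWritable implements Writable {
  private long value;

  public void set(long value) { this.value = value; }
  public long get() { return value; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(value);   // compact, deterministic binary form
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readLong();
  }
}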

On Sat, Jul 2, 2011 at 7:59 AM, Raja Nagendra Kumar
nagendra.r...@tejasoft.com wrote:

 Hi,

 I read in the Definitive Guide that LongWritable and the other Writables are
 more optimized versions of normal Java Long etc. for network serialization.

 If that is so, would it not be easier for developers to use normal Java long
 and String, so the hadoop framework can internally convert the developer-written
 code to use LongWritable etc., or do so through some intermediate code
 conversion? This approach could greatly reduce the number of APIs and help with
 faster learning.

 Not sure why developers had to write and use XXXWritable etc. in the context
 of optimization for network serialization.


 Regards,
 Raja Nagendra Kumar,
 C.T.O
 www.tejasoft.com
 -Hadoop Adoption Consulting
 --
 View this message in context: 
 http://old.nabble.com/XXXWritable-tp31977841p31977841.html
 Sent from the Hadoop core-dev mailing list archive at Nabble.com.





-- 
Harsh J


[jira] [Resolved] (HADOOP-7396) The information returned by the wrong usage of the command hadoop job -events <job-id> <from-event-#> <#-of-events> is not appropriate

2011-06-18 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-7396.
-

Resolution: Duplicate

 The information returned by the wrong usage of the command hadoop job 
 -events <job-id> <from-event-#> <#-of-events> is not appropriate
 

 Key: HADOOP-7396
 URL: https://issues.apache.org/jira/browse/HADOOP-7396
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Yan Jinshuang
Priority: Minor
 Fix For: 0.23.0


  With wrong values of <from-event-#> and <#-of-events>, for example a range
  from 1000 to 1 where the start comes after the end, the command always returns
  0. It is expected to show detailed information instead, such as that the start
  number should be less than the end number of the event range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: HADOOP-7328. Improve the SerializationFactory functions.

2011-06-16 Thread Harsh J

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/884/
---

(Updated 2011-06-16 12:13:34.081758)


Review request for hadoop-common and Todd Lipcon.


Changes
---

Throw exceptions (getting rid of nulls). Add appropriate javadocs and fix one 
checkstyle nit.


Summary
---

Since getSerialization() can possibly return a null, it is only right that 
getSerializer() and getDeserializer() usage functions do the same, instead of 
throwing up NPEs.
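
A sketch of the shape of change being discussed (not the committed patch; the
helper class and message text are illustrative):

import org.apache.hadoop.io.serializer.Serialization;
import org.apache.hadoop.io.serializer.SerializationFactory;
import org.apache.hadoop.io.serializer.Serializer;

public final class SerializationLookup {
  // Fail with a descriptive message when no Serialization accepts the class,
  // instead of letting a null result surface later as an NPE.
  public static <T> Serializer<T> getSerializer(SerializationFactory factory,
                                                Class<T> c) {
    Serialization<T> serialization = factory.getSerialization(c);
    if (serialization == null) {
      throw new IllegalArgumentException("No serialization found for "
          + c.getName() + "; check the io.serializations setting");
    }
    return serialization.getSerializer(c);
  }
}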

Related issue to which this improvement is required: 
https://issues.apache.org/jira/browse/MAPREDUCE-2584


This addresses bug HADOOP-7328.
http://issues.apache.org/jira/browse/HADOOP-7328


Diffs (updated)
-

  src/java/org/apache/hadoop/io/serializer/SerializationFactory.java dee314a 

Diff: https://reviews.apache.org/r/884/diff


Testing
---

Existing SequenceFile serialization factory tests pass. The change is merely to 
make the functions return null instead of throwing an NPE within.


Thanks,

Harsh



[jira] [Resolved] (HADOOP-3436) Useless synchronized in JobTracker

2011-06-15 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-3436.
-

Resolution: Not A Problem

Does not appear to be a problem w.r.t. trunk. There is no such variable held (a
collection is used instead, which requires holding the JT lock and is
synchronized, per the comments).

Resolving as Not a problem (anymore). Stale issue.

 Useless synchronized in JobTracker
 --

 Key: HADOOP-3436
 URL: https://issues.apache.org/jira/browse/HADOOP-3436
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Brice Arnould
Assignee: Brice Arnould
Priority: Trivial

  In the original code, numTaskTrackers is fetched in a synchronized way, which
  is useless because it might change anyway while the algorithm is running.
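 An illustrative sketch (not JobTracker code) of the point above: synchronizing
 a single read of a counter does not buy correctness if the value may change
 right after the read, so the algorithm works on a stale snapshot either way.
 {code}
 public class SnapshotExample {
   private final Object lock = new Object();
   private volatile int numTaskTrackers;

   public void schedule() {
     int snapshot;
     synchronized (lock) {
       snapshot = numTaskTrackers;  // protected read...
     }
     // ...but numTaskTrackers may already differ here, so the synchronized
     // block above adds locking cost without changing the outcome.
     runAlgorithm(snapshot);
   }

   private void runAlgorithm(int trackers) { /* uses the snapshot */ }
 }
 {code}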

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Problems about the job counters

2011-06-14 Thread Harsh J
Hello,

When you have a Reduce phase, the mapper needs to (sort and)
materialize KVs to local files to let reducers fetch them. This is where
the FILE_BYTES_* counters come from. Similarly, the Reducer fetches the
map outputs, stores them on local disk and merge-sorts them again, so the
counters appear for the reduce phase as well.

In a map-only job, you should not generally see any FILE_BYTES_* counters.

On Wed, Jun 15, 2011 at 9:32 AM, hailong.yang1115
hailong.yang1...@gmail.com wrote:

 Dear all,

  I am trying the built-in wordcount example with nearly 15GB of input. When
  the Hadoop job finished, I got the following counters.


  Counter                                   Map              Reduce           Total
  Job Counters        Launched reduce tasks 0                0                1
                      Rack-local map tasks  0                0                35
                      Launched map tasks    0                0                2,318
                      Data-local map tasks  0                0                2,283
  FileSystemCounters  FILE_BYTES_READ       22,863,580,656   17,654,943,341   40,518,523,997
                      HDFS_BYTES_READ       154,400,997,459  0                154,400,997,459
                      FILE_BYTES_WRITTEN    33,490,829,403   17,654,943,341   51,145,772,744
                      HDFS_BYTES_WRITTEN    0                2,747,356,704    2,747,356,704


 My question is what does the FILE_BYTES_READ counter mean? And what is the 
 difference between FILE_BYTES_READ and HDFS_BYTES_READ? In my opinion, all 
 the input is located in HDFS, so where does FILE_BYTES_READ come from during 
 the map phase?


 Any help will be appreciated!

 Hailong

 2011-06-15



 ***
 * Hailong Yang, PhD. Candidate
 * Sino-German Joint Software Institute,
 * School of Computer ScienceEngineering, Beihang University
 * Phone: (86-010)82315908
 * Email: hailong.yang1...@gmail.com
 * Address: G413, New Main Building in Beihang University,
 *              No.37 XueYuan Road,HaiDian District,
 *              Beijing,P.R.China,100191
 ***




-- 
Harsh J


[jira] [Reopened] (HADOOP-6219) Add dumpConfiguration option in hadoop help message

2011-06-12 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-6219:
-


Sorry, pretty strange that both link to the same issue? Ideally it should be under
the mapred project now.

 Add dumpConfiguration option in hadoop help message
 ---

 Key: HADOOP-6219
 URL: https://issues.apache.org/jira/browse/HADOOP-6219
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Ramya R
Assignee: V.V.Chaitanya Krishna
Priority: Trivial
 Fix For: 0.23.0

 Attachments: HADOOP-6184-ydist.patch, HADOOP-6219-ydist.patch, 
 MAPREDUCE-919.patch, MAPREDUCE-919.patch


 Execution of bin/hadoop should show the -dumpConfiguration option introduced 
 in MAPREDUCE-768

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-5624) @Override cleanup for Eclipse

2011-06-12 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-5624.
-

Resolution: Not A Problem

Eclipse does not seem to complain about any of the patch's changes.

I do not see any @Override issues on trunk of all three projects right now.
Please do not hesitate to re-open in case it is still an issue.

 @Override cleanup for Eclipse
 -

 Key: HADOOP-5624
 URL: https://issues.apache.org/jira/browse/HADOOP-5624
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Carlos Valiente
Priority: Trivial
 Attachments: HADOOP-5624.patch


 Eclipse complains about several methods which are marked as {{@Override}}, 
 but which are not defined in any superclass.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HADOOP-6936) broken links in http://wiki.apache.org/hadoop/FAQ#A12//s

2011-06-12 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-6936.
-

Resolution: Fixed

Just for a history lesson ref. here: There used to be hadoop-*.xml files once
upon a time. It's now split over to core-*, hdfs-*, mapred-* files (* - {site,
default}).

Closing, as the HowToConfigure link has also been updated by me, although it
needs more love in general (we should switch to Confluence… it's more
encouraging).

 broken links in http://wiki.apache.org/hadoop/FAQ#A12//s
 

 Key: HADOOP-6936
 URL: https://issues.apache.org/jira/browse/HADOOP-6936
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Eugene Koontz
Priority: Trivial

 http://wiki.apache.org/hadoop/FAQ#A12//s 
 has links to :
 http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.replication.min
 http://hadoop.apache.org/common/docs/current/hadoop-default.html#dfs.safemode.threshold.pct
 both of which are 404 as of time of filing this issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Review Request: HADOOP-7328. Improve the SerializationFactory functions.

2011-06-11 Thread Harsh J


 On 2011-06-12 02:24:55, Todd Lipcon wrote:
  Looks good to me. Can you upload this rev of the patch to the JIRA so the 
  QA Bot runs on it?

Submitted on JIRA. Thanks for the review Todd!


- Harsh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/884/#review805
---


On 2011-06-11 22:10:17, Harsh J wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/884/
 ---
 
 (Updated 2011-06-11 22:10:17)
 
 
 Review request for hadoop-common and Todd Lipcon.
 
 
 Summary
 ---
 
 Since getSerialization() can possibly return a null, it is only right that 
 getSerializer() and getDeserializer() usage functions do the same, instead of 
 throwing up NPEs.
 
 Related issue to which this improvement is required: 
 https://issues.apache.org/jira/browse/MAPREDUCE-2584
 
 
 This addresses bug HADOOP-7328.
 http://issues.apache.org/jira/browse/HADOOP-7328
 
 
 Diffs
 -
 
   src/java/org/apache/hadoop/io/serializer/SerializationFactory.java dee314a 
 
 Diff: https://reviews.apache.org/r/884/diff
 
 
 Testing
 ---
 
 Existing SequenceFile serialization factory tests pass. The change is merely 
 to make the functions return null instead of throwing an NPE within.
 
 
 Thanks,
 
 Harsh
 




Re: 404 on Learn about link

2011-06-05 Thread Harsh J
Bruno,

While someone would eventually get to fix this live link error, the
right page for the current release is at:
http://hadoop.apache.org/common/docs/current/ instead of stable (just
in case one does not know).

On Sun, Jun 5, 2011 at 8:39 AM, Bruno P. Kinoshita
brunodepau...@yahoo.com.br wrote:
 Hi there,

 I am receiving a 404 when I click on the "Learn about" link on the Hadoop Common page [2].

 Could somebody with karma check to see if it is a problem or if it is just 
 down for maintenance or something similar, please?

 TYIA,
 Bruno

 [1] http://hadoop.apache.org/common/docs/stable/
 [2] http://hadoop.apache.org/common/




-- 
Harsh J


Re: Question regarding network data transfer

2011-05-28 Thread Harsh J
Aishwarya,

On Sun, May 29, 2011 at 6:49 AM, Aishwarya Venkataraman
avenk...@cs.ucsd.edu wrote:
 So how does reducer obtain the mapper's output ? Does it make a network call
 and read data from mappers local storage or does the mapper send the data ?

The mappers store the files at a location that is accessible by the
TaskTracker's HTTP servlet. The reducer fetches all successful map
attempt outputs from the TaskTrackers when they initialize.

-- 
Harsh J


[jira] [Created] (HADOOP-7328) Give more information about a missing Serializer class

2011-05-24 Thread Harsh J Chouraria (JIRA)
Give more information about a missing Serializer class
--

 Key: HADOOP-7328
 URL: https://issues.apache.org/jira/browse/HADOOP-7328
 Project: Hadoop Common
  Issue Type: Improvement
  Components: io
Affects Versions: 0.20.2
Reporter: Harsh J Chouraria
Assignee: Harsh J Chouraria
 Fix For: 0.23.0


When you have a key/value class that's not Writable and you forget to attach 
io.serializers for the same, an NPE is thrown by the tasks with no information 
on why, what's missing, or what led to it. I think a better exception can be 
thrown by SerializationFactory instead of an NPE when a class is not accepted 
by any of the loaded serializations.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HADOOP-7297) Error in the documentation regarding Checkpoint/Backup Node

2011-05-19 Thread Harsh J Chouraria (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J Chouraria reopened HADOOP-7297:
---


Reopening since the issue of docs is valid. There are CN and BN node docs on 
the tagged svn rev: 
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.203.0/src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml

 Error in the documentation regarding Checkpoint/Backup Node
 ---

 Key: HADOOP-7297
 URL: https://issues.apache.org/jira/browse/HADOOP-7297
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.20.203.0
Reporter: arnaud p
Priority: Trivial

 On 
 http://hadoop.apache.org/common/docs/r0.20.203.0/hdfs_user_guide.html#Checkpoint+Node:
  the command bin/hdfs namenode -checkpoint required to launch the 
 backup/checkpoint node does not exist.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: MapReduce compilation error

2011-05-18 Thread Harsh J
It's a streaming test thing. Have a look at:
https://issues.apache.org/jira/browse/MAPREDUCE-1686

On Thu, May 19, 2011 at 1:30 AM, Niels Basjes ni...@basjes.nl wrote:
 Today I ran into the same error and I was puzzled by the content of this file.
 What is the purpose of a test file that appears to have a deliberate
 error and no code whatsoever?


 2011/3/19 Harsh J qwertyman...@gmail.com:
 This shouldn't really interfere with your development. You may try to
 exclude it from Eclipse's build, perhaps.

 On Sat, Mar 19, 2011 at 1:39 AM, bikash sharma sharmabiks...@gmail.com 
 wrote:
 Hi,
 When I am compiling MapReduce source code after checking-in Eclipse, I am
 getting the following error:

  The declared package "" does not match the expected package "testjar"
 ClassWithNoPackage.java Hadoop-MR/src/test/mapred/testjar

 Any thoughts?

 Thanks,
 Bikash




 --
 Harsh J
 http://harshj.com




 --
 Met vriendelijke groeten,

 Niels Basjes




-- 
Harsh J


Re: How HDFS decides where to put the block

2011-04-18 Thread Harsh J
Hello,

On Mon, Apr 18, 2011 at 7:16 PM, Nan Zhu zhunans...@gmail.com wrote:
 Hi, all

  I'm confused by the question of how HDFS decides where to put the
  data blocks.

  I mean that the user invokes some command like ./hadoop put ***; we
  assume that this file consists of 3 blocks, but how does HDFS decide where
  these 3 blocks are to be put?

  Most of the materials don't cover this issue, but just introduce data
  replicas when talking about blocks in HDFS.


I'm guessing you're looking for the BlockPlacementPolicy
implementations [1] and how they are applied in HDFS.

Basically, the NameNode chooses the set of DNs for every new-block
request (from a client) using this policy, and the DFSClient gets a
list of all the nodes. It goes on to pick the first one among them to
write the data to. The replication happens async, later.

[1] - BlockPlacementPolicyDefault is the default implementation in
use. Its source is available in the o.a.h.hdfs.server.namenode package.

-- 
Harsh J


Re: Getting error in Eclipse setup (SVN issue)

2011-04-16 Thread Harsh J
Hello Shyam,

On Sat, Apr 16, 2011 at 4:05 AM, Shyam Sarkar shyam.s.sar...@gmail.com wrote:
 Hello,

 When I try to download from main trunk of Hadoop SVN I get following error :

 SVN connector cannot be loaded.

This doesn't appear to be a Hadoop issue really. You need to verify
your Eclipse's SVN plugin installation, etc. (sounds like it does not
have a proper connector installed).

You can alternatively get the svn copy using the local 'svn' program
and do an 'ant eclipse' to get Eclipse project files.

-- 
Harsh J


Re: MapReduce compilation error

2011-03-18 Thread Harsh J
This shouldn't really interfere with your development. You may try to
exclude it from Eclipse's build, perhaps.

On Sat, Mar 19, 2011 at 1:39 AM, bikash sharma sharmabiks...@gmail.com wrote:
 Hi,
 When I am compiling MapReduce source code after checking-in Eclipse, I am
 getting the following error:

  The declared package "" does not match the expected package "testjar"
 ClassWithNoPackage.java Hadoop-MR/src/test/mapred/testjar

 Any thoughts?

 Thanks,
 Bikash




-- 
Harsh J
http://harshj.com


Re: pointers to Hadoop eclipse

2011-03-17 Thread Harsh J
http://wiki.apache.org/hadoop/EclipseEnvironment

On Thu, Mar 17, 2011 at 8:17 PM, bikash sharma sharmabiks...@gmail.com wrote:
 Hi,
  Can someone please point to any good reference that tells clearly how to
  check out the Hadoop code base in Eclipse, make any changes and re-compile?
  Actually, I wanted to change some part of Hadoop, so I want to see the above
  effect, preferably in Eclipse.

 Thanks,
 Bikash




-- 
Harsh J
http://harshj.com


[jira] Created: (HADOOP-7192) fs -stat docs aren't updated to reflect the format features

2011-03-15 Thread Harsh J Chouraria (JIRA)
fs -stat docs aren't updated to reflect the format features
---

 Key: HADOOP-7192
 URL: https://issues.apache.org/jira/browse/HADOOP-7192
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.21.0
 Environment: Linux / 0.21
Reporter: Harsh J Chouraria
Assignee: Harsh J Chouraria
Priority: Trivial
 Fix For: 0.23.0


The html docs of the 'fs -stat' command (found listed in the File
System Shell Guide) do not seem to have the formatting abilities of -stat
explained (along with the options).

Like 'fs -help', the docs must also reflect the latest available features.

I shall attach a doc-fix patch shortly.

If anyone has other discrepancies to point out in the web version of the guide, 
please do so :)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: File access pattern on HDFS?

2011-03-07 Thread Harsh J
There is no such information (history of atime changes, although atime
is held for every file in the NN) held by the NameNode right now. I
think HDFS-782 is slightly relevant to maintaining a 'hot-zone' info,
although at a block level and among datanodes. I couldn't find a jira
that talks about keeping a list of atime modifications on the
NameNode.

On Mon, Mar 7, 2011 at 4:00 AM, Gautam Singaraju
gautam.singar...@gmail.com wrote:
 Hi,

 Is there a mechanism to get the list of files accessed on HDFS at the
 NameNode?
 Thanks!
 ---
 Gautam




-- 
Harsh J
www.harshj.com


Re: measure the resource usage of each map/reduce task

2011-03-01 Thread Harsh J
Hello,

On Tue, Mar 1, 2011 at 7:29 PM, bikash sharma sharmabiks...@gmail.com wrote:
 Hi,
 As a follow-up question, do map/reduce tasks run as threads or processes?

Every launched Task runs as an independent process, communicating over
a network interface (lo) with the TaskTracker for reporting/etc.
purposes.

-- 
Harsh J
www.harshj.com


Re: source versioning question

2011-01-10 Thread Harsh J
0.22 had been branched from the trunk quite a while ago (I think that
signifies a feature freeze). So the trunk is now heading for the 0.23
development.

On Tue, Jan 11, 2011 at 11:22 AM, Noah Watkins jayh...@cs.ucsc.edu wrote:
 What is the relation between the current trunk and branch-0.22? Is trunk the 
 current dev for 0.23 or 0.22?

 Thanks,
 Noah



-- 
Harsh J
www.harshj.com


Re: Developing Hadoop in Eclipse

2010-12-13 Thread Harsh J
You can launch them (The daemons) from Eclipse itself -- there must be
a launch target provided in 0.21 if am right, OR you can build a fresh
tar using `ant tar` target.

Schedulers are also pluggable in Hadoop, so you can develop one
without needing to edit Hadoop's sources. Check contrib/ for the
capacity/fair schedulers, for example.

-- 
Harsh J
www.harshj.com


Re: Process ID and Hadoop job ID

2010-12-08 Thread Harsh J
Hi,

On Wed, Dec 8, 2010 at 3:18 PM, radheshyam nanduri
radheshyam.nand...@gmail.com wrote:
 Hi,

 I want to know if there is any way to find out the process id (PID) of a
 task running on a TaskTracker corresponding to a particular Hadoop job ID.
 All the Hadoop tasks are launched as java processes. So, is there any way to
 differentiate among them to get the PID of a particular task of a particular
 Hadoop job.


Not sure if there's a way to get the launched PIDs, but there are
TaskIDs available for every TaskInProgress decided for a job (and
every execution attempt thereof).

-- 
Harsh J
www.harshj.com


Re: FileInputFormat.setInputPaths Problem

2010-12-04 Thread Harsh J
Hi,

2010/12/4 Rawan AlSaad rawan.als...@hotmail.com:
 I need to know how to pass the input folder path to the java class through
 the function FileInputFormat.setInputPaths(conf, new Path("input")).

Try FileInputFormat.addInputPath(...) for a single path entry at a
time perhaps? I'm not sure what's going wrong here though.
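
A small sketch of that suggestion using the old (mapred) API that
FileInputFormat.setInputPaths(conf, ...) implies; the paths here are examples only:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputPathExample {
  public static void configure(JobConf conf) {
    // Add one input folder at a time; may be called repeatedly for several inputs.
    FileInputFormat.addInputPath(conf, new Path("input"));
    // Or set all input paths in a single call:
    FileInputFormat.setInputPaths(conf, new Path("input"), new Path("more-input"));
  }
}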

-- 
Harsh J
www.harshj.com


Re: Configure hadoop in eclipse

2010-11-05 Thread Harsh J
Hi,

On Fri, Nov 5, 2010 at 8:12 PM, Rafael Braga rafaeltelemat...@gmail.com wrote:
 anybody?

 Saw in the link:
 http://osdir.com/ml/dev-harmony-apache/2010-10/msg00017.html
  that it's necessary to include the jar sun-javadoc.jar, is that correct?

I don't seem to have that JAR on my build path here in Eclipse. All
default Sun JRE jars + Hadoop lib jars + ANT_HOME env property seem
enough.


  And when I said connection, I meant the attachment, sorry.

Don't think the ML allows attachments.



 On Thu, Nov 4, 2010 at 1:06 PM, Rafael Braga 
 rafaeltelemat...@gmail.comwrote:

 Sorry, it was a problem on my connection.

 thanks,


 On Thu, Nov 4, 2010 at 1:00 PM, Nan Zhu zhunans...@gmail.com wrote:

 attachment missed?

 Nan

 On Thu, Nov 4, 2010 at 11:35 PM, Rafael Braga rafaeltelemat...@gmail.com
 wrote:

  Hi everybody,
 
        I follow the tutorial:
  http://wiki.apache.org/hadoop/EclipseEnvironment and
   saw the screencast: http://vimeo.com/4193623. The build.xml ran
  without
   problems,
   but after I turn on Project...Build Automatically errors happen in the
  class:
   ExcludePrivateAnnotationsJDiffDoclet (see attachment). And in the
   Problems view of Eclipse
   the same errors are shown too (see attachment).
 
  what might be wrong?
 
  thanks,
 
  --
  Rafael Braga
 
 




 --
 Rafael Braga http://www.linkedin.com/myprofile?trk=hb_tab_pro




 --
 Rafael Braga http://www.linkedin.com/myprofile?trk=hb_tab_pro




-- 
Harsh J
www.harshj.com


Re: i want to contribute

2010-10-25 Thread Harsh J
On Mon, Oct 25, 2010 at 10:51 AM, goutham patnaik
goutham.patn...@gmail.com wrote:
 I've been looking into contributing to the code base and figured writing test
 cases for the common mapreduce examples was a good place to start. I got
 this idea, of course, from the main project suggestions page. I was wondering
 about the status of the following jira ticket:

 https://issues.apache.org/jira/browse/MAPREDUCE-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Looks like a person was already working on it. Perhaps it'd be best to
contact him/her before progressing onto this issue? You can find that
person's contact details here:
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=kanjilal


 says its still open ? is anybody working on this actively right now ? if
 not, id like to have a go at it


 Goutham




-- 
Harsh J
www.harshj.com


Re: what happens inside hadoop !!

2010-10-22 Thread Harsh J
The source is your friend. And perhaps a good Java IDE too. I use Eclipse + F3.
But since you ask, you may begin at the wiki:
http://wiki.apache.org/hadoop/FrontPage
There's stuff there not many see, and those pretty much cover enough
to get you started at the right places :)

About a document, I guess O'Malley's Hadoop MR Arch one would help,
but nothing beats reading sources the way it's supposed to be done:
http://docs.huihoo.com/apache/hadoop/HadoopMapReduceArch.pdf

On Fri, Oct 22, 2010 at 8:01 PM, Ahmad Shahzad ashahz...@gmail.com wrote:
 Hi ALL,
            Is there any documentation, guide or presentation about
  what happens inside hadoop? I mean, there is various documentation about
  map-reduce and hdfs that tells what they do, but what is happening inside
  is not mentioned in those articles. Any idea!!

 Ahmad




-- 
Harsh J
www.harshj.com


Re: HELP !!!! configuring hadoop on ECLIPSE

2010-08-11 Thread Harsh J
Hi,

If all you want to do is to write programs that use your stable hadoop
libraries, have a look at the Hadoop Eclipse plugin that comes along
(inside contrib folders).

If you want your stable hadoop as a project inside your eclipse
itself, run `ant eclipse` in the hadoop's extracted directory (or was
it eclipse-files?) and then import the folder using the 'Existing
Projects into Workspace' import option.

Alternatively, for the former requirement, you may use KarmaSphere's
Hadoop eclipse plugin (the free Community Edition). It's a great tool
to use as well.

Both the Apache-supplied plugin and the KarmaSphere plugin allow you
to run your MR code instantly against a supplied cluster. And then some.

P.s. You might need to patch the existing apache-supplied hadoop
eclipse plugin a bit to make it usable on the latest versions of
Eclipse. A shameless self-blog-reference follows:
http://www.harshj.com/2010/07/18/making-the-eclipse-plugin-work-for-hadoop/

On Wed, Aug 11, 2010 at 9:33 PM, Ahmad Shahzad ashahz...@gmail.com wrote:
 Hi Saikat,
                 Can you please provide more detail on how to do it? I tried
  creating a new java project, but I don't know how to associate the hadoop source
  folders.

  Secondly, I tried creating an eclipse project from an existing Ant buildfile
  and gave it the hadoop build file that is in the hadoop directory, but it asks me
  to select the javac declaration to use to define the project and gives me a set
  of options such as:
 javac task found in target compile-rcc-compiler 
 javac task found in target compile-core-classes 
 javac task found in target compile-mapred-classes 
 javac task found in target compile-hdfs-classes 
 javac task found in target compile-tools 
 javac task found in target compile-examples 
 javac task found in target compile-core-test 
 javac task found in target compile-ant-tasks 

  I tried it with compile-rcc-compiler and compile-ant-tasks but it gives me
 the following error:

 problem setting classpath of the project from the javac classpath: Reference
 ivu-common.classpath not found.

  I will appreciate your reply.

 Ahmad




-- 
Harsh J
www.harshj.com


Re: How to Build Hadoop code in eclipse

2010-08-11 Thread Harsh J
Running the `ant eclipse-files` target will give you nearly usable
.project and .classpath files. Import the Hadoop project into Eclipse
using these.

Or you could always checkout a stable branch/tag via SVN and go ahead
with the original wiki instructions :)

On Wed, Aug 11, 2010 at 6:02 PM, Ahmad Shahzad ashahz...@gmail.com wrote:
 Hi All,
          I wanted to ask a related question to this one. How would you set
 up hadoop on eclipse if you dont want to download it from svn, rather you
 just want to configure a stable release e.g 0.20.2 on eclipse. So, i want to
 configure a stable release on eclipse and add/change the code i want and run
 it through ant.

 Ahmad

 On Sun, Aug 8, 2010 at 4:36 AM, Saikat Kanjilal sxk1...@hotmail.com wrote:

 I've been able to build the code successfully in Eclipse by using the svn
 plugin and importing the code and using ant.  I actually followed the wiki
 instructions and did an svn checkout inside Eclipse and was able to run all
 of the ant targets successfully.

 Sent from my iPhone

 On Aug 7, 2010, at 6:31 PM, thinke365 thinke...@gmail.com wrote:

 
  Maybe the official way to build hadoop is using hudson, the developers
 just
  using vim to make their work done, without IDE such as Eclipse.
  In my opinion, hadoop did badly to cooperate with IDE.
 
  ashish pareek wrote:
 
  Hello Friends,
                    If you know solution to this problem please reply
 back.
 
 
  On Mon, Jul 27, 2009 at 2:58 PM, ashish pareek pareek...@gmail.com
  wrote:
 
  Hi Everybody,
 
   Is there any easy and elaborate page where it is
   explained
   how to build hadoop code? I followed the
   http://wiki.apache.org/hadoop/EclipseEnvironment
   instructions and even the video but I am getting the error:
 
  BUILD FAILED : java.net.UnknownHostException : repo2.maven.org
 
   But when accessed through the browser this site is working.

   I browse through a proxy and I have set up the user name and password
   correctly.

   Can anyone suggest a possible solution?
 
  Thanks in advance.
 
  Regards,
  Ashish
 
 
 
 
 
  --
  View this message in context:
 http://old.nabble.com/How-to-Build-Hadoop-code-in-eclipse-tp24676996p29377931.html
  Sent from the Hadoop core-dev mailing list archive at Nabble.com.
 
 





-- 
Harsh J
www.harshj.com


Re: proxy settings for ivy

2010-07-12 Thread Harsh J
Ensure you've set your ANT_OPTS for this, before issuing the ant command.

For example: set ANT_OPTS=-Dhttp.proxyHost=kaboom -Dhttp.proxyPort=2888

There are similar options available for authenticated proxy also :)

On Mon, Jul 12, 2010 at 9:41 PM, Ahmad Shahzad ashahz...@gmail.com wrote:
 Hi ALL,
           Can anyone tell me where I set the proxy settings for ivy? I am
  unable to build hadoop using ant. It says BUILD FAILED
  java.net.ConnectException: Connection refused. The reason is that I am
  connected to the internet through a proxy. So, where should I tell hadoop to use
  the proxy?

 Regards,
 Ahmad Shahzad




-- 
Harsh J
www.harshj.com


Re: HOW to COMPILE HADOOP

2010-07-02 Thread Harsh J
Use the ant build.xml (and the provided targets) bundled along?

On Fri, Jul 2, 2010 at 8:51 PM, Ahmad Shahzad ashahz...@gmail.com wrote:
 Hi ALL,
            Can anyone tell me how I would compile the whole hadoop
  directory if I add some files to the hadoop core directory or change some code
  in some of the files?

 Regards,
 Ahmad Shahzad




-- 
Harsh J
www.harshj.com

