MapReduce job is not picking up appended data.

2015-01-26 Thread Uthayan Suthakar
I have a Flume agent that streams data into an HDFS sink (appending to the same file), and I can see the data with hdfs dfs -cat. However, when I run a MapReduce job on the folder that contains the appended data, it only picks up the first batch that was flushed (batchSize = 100) into HDFS. The rest are not
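
A toy model of one plausible cause, not confirmed in this thread: a direct read such as `hdfs dfs -cat` returns every byte written so far, while a job that sizes its input from a file length recorded earlier (for example, right after the first flush) stops at that length. The `ToyHdfsFile` class and its fields are hypothetical and are not HDFS APIs; when the reported length is refreshed is an assumption of the model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHdfsFile:
    """Hypothetical model of a file that is still open for appends."""
    data: bytearray = field(default_factory=bytearray)
    reported_length: int = 0  # length a job would see in the file metadata

    def append_batch(self, records, update_reported_length):
        for r in records:
            self.data += r.encode() + b"\n"
        if update_reported_length:
            # Assumption of the model: the metadata length is refreshed only on some flushes.
            self.reported_length = len(self.data)

    def cat(self):
        # A direct read returns everything written so far.
        return bytes(self.data)

    def job_input(self):
        # A job that trusts the reported length stops reading there.
        return bytes(self.data[: self.reported_length])


f = ToyHdfsFile()
f.append_batch([f"event-{i}" for i in range(100)], update_reported_length=True)
f.append_batch([f"event-{i}" for i in range(100, 200)], update_reported_length=False)

print(len(f.cat().splitlines()))        # 200
print(len(f.job_input().splitlines()))  # 100
```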

Which [open-source] SQL engine atop Hadoop?

2015-01-26 Thread Samuel Marks
Since Hadoop came out, there have been various commercial and/or open-source attempts to expose some compatibility with SQL (https://hive.apache.org, http://drill.apache.org, https://spark.apache.org). I am seeking one that is good for low-latency querying and supports the most common CRUD

Re: Time until a datanode is marked as dead

2015-01-26 Thread Nicolas Liochon
Note that there is a difference between being dead and being stale: stale means "avoid as much as possible", while dead means "avoid absolutely AND initiate a recovery", i.e. copy all the data (typically 1 TB or more). There is some info on this blog entry:
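
A minimal sketch of the stale/dead distinction, classifying a datanode by the age of its last heartbeat. The thresholds are assumed to be the stock Hadoop 2.x defaults (30 s for `dfs.namenode.stale.datanode.interval`, and 10 min 30 s for the dead-node expiry derived from `dfs.heartbeat.interval` and `dfs.namenode.heartbeat.recheck-interval`); check your own configuration. `NodeState` and `classify` are hypothetical names, and the boundary handling is simplified.

```python
from enum import Enum

STALE_AFTER_S = 30           # assumed default of dfs.namenode.stale.datanode.interval (30000 ms)
DEAD_AFTER_S = 2 * 300 + 10 * 3  # assumed default expiry: 2 * 300 s recheck + 10 * 3 s heartbeat


class NodeState(Enum):
    LIVE = "live"
    STALE = "stale"  # avoid as much as possible
    DEAD = "dead"    # avoid absolutely and re-replicate its data


def classify(seconds_since_heartbeat: float) -> NodeState:
    """Hypothetical helper: map heartbeat age to a datanode state (simplified boundaries)."""
    if seconds_since_heartbeat > DEAD_AFTER_S:
        return NodeState.DEAD
    if seconds_since_heartbeat > STALE_AFTER_S:
        return NodeState.STALE
    return NodeState.LIVE


for age in (5, 45, 700):
    print(age, classify(age).value)
```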

Re: Reliability of timestamps in logs

2015-01-26 Thread Ravi Prakash
Are you running NTP? On Friday, January 23, 2015 12:42 AM, Fabio anyte...@gmail.com wrote: Hi guys, while analyzing SLS logs I noticed some unexpected behaviors, such as resource requests sent before the AM container gets to a RUNNING state. For this reason I started wondering how

RE: Hadoop Security Community

2015-01-26 Thread johny casanova
I also have complicated clients that need help with this. I would like to help also. Date: Mon, 26 Jan 2015 18:49:42 + Subject: Re: Hadoop Security Community From: ranadi...@gmail.com To: user@hadoop.apache.org Hi Adam, I am interested in collaborating on this. I am working for a

Re: yarn jobhistory server not displaying all jobs

2015-01-26 Thread Ravi Prakash
Hi Matt! Take a look at the mapreduce.jobhistory.* configuration parameters here for the delay in moving finished jobs to the HistoryServer: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml I've seen this error hadoop is not allowed
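
To see which `mapreduce.jobhistory.*` parameters a cluster actually overrides, a small script can pull them out of `mapred-site.xml`, which uses the standard `<configuration><property><name>/<value>` layout. A sketch only: the function name is hypothetical, and the sample value for `mapreduce.jobhistory.move.interval-ms` (the interval at which the history server scans for finished jobs to move) is illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <property>
    <name>mapreduce.jobhistory.move.interval-ms</name>
    <value>180000</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""


def jobhistory_settings(xml_text: str) -> dict:
    """Return every mapreduce.jobhistory.* property set in a Hadoop config file."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name.startswith("mapreduce.jobhistory."):
            settings[name] = prop.findtext("value", default="").strip()
    return settings


print(jobhistory_settings(SAMPLE))
```

Properties not listed in the site file fall back to the values in mapred-default.xml.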

Re: NN config questions

2015-01-26 Thread Ravi Prakash
Hi Dave! Here is the class that is used to store all the edits: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java#L575 HTH, Ravi On Monday, January 26, 2015 10:32 AM, dlmar...@comcast.net

NN questions

2015-01-26 Thread dlmarion
If multiple directories are specified for dfs.namenode.name.dir and dfs.namenode.edits.dir, are the writes to the different directories done in parallel or serially? Does dfs.namenode.shared.edits.dir support multiple directories like the properties above? Thanks, Dave
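
To make the two alternatives in the question concrete, here is a generic sketch that writes the same record to several directories either one after another or concurrently. It does not claim which strategy the NameNode uses (the reply points to FSEditLog for that); the `edits.log` file name and the text records are hypothetical stand-ins for real edit-log segments.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_edit(directory: str, record: bytes) -> str:
    """Append one record to a hypothetical edits file in `directory` and fsync it."""
    path = os.path.join(directory, "edits.log")
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
    return path


def write_serial(dirs, record):
    # One directory after another: total latency is the sum of the writes.
    return [write_edit(d, record) for d in dirs]


def write_parallel(dirs, record):
    # All directories at once: total latency is roughly the slowest write.
    with ThreadPoolExecutor(max_workers=max(1, len(dirs))) as pool:
        return list(pool.map(lambda d: write_edit(d, record), dirs))


dirs = [tempfile.mkdtemp() for _ in range(3)]
write_serial(dirs, b"OP_MKDIR /a\n")
write_parallel(dirs, b"OP_MKDIR /b\n")
for d in dirs:
    with open(os.path.join(d, "edits.log"), "rb") as f:
        print(d, f.read())
```

Either way every directory ends up with the same contents; the difference is only in latency and in how a failure in one directory is handled.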

Hadoop Security Community

2015-01-26 Thread Adam Montville
All: The Center for Internet Security (CIS) has established a Community focused on defining a configuration benchmark for Hadoop. We are in the early stages of benchmark development, and hope that you will consider joining the effort. Over the course of the next several days a draft

AW: Hadoop Security Community

2015-01-26 Thread mirko.kaempf
Dear Adam, I am interested in collaborating on this. I work with Cloudera and teach Hadoop courses, such as the Administrator course. I am learning about security implementations and think a common benchmark would be great for the community. What are the requirements for contributions? I volunteer

NN config questions

2015-01-26 Thread dlmarion
If multiple directories are specified for dfs.namenode.name.dir and dfs.namenode.edits.dir, are the writes to the different directories done in parallel or serially? Does dfs.namenode.shared.edits.dir support multiple directories like the properties above? Thanks, Dave

Re: Hadoop Security Community

2015-01-26 Thread Ranadip Chatterjee
Hi Adam, I am interested in collaborating on this. I am currently working for a large financial institution, and security is a bit of a pain in the neck at the moment, so it is a major focus area for me. Regards, Ranadip On 26 January 2015 at 18:32, mirko.kaempf

Re: Reliability of timestamps in logs

2015-01-26 Thread Fabio
Yes, I am. Does it make a difference? SLS runs on a single machine, wrapping the RM and simulating the nodes, so it should use just the system time. Or do you mean there is a chance it's updating the clock while the job is running? Regards Fabio On 01/26/2015 08:00 PM, Ravi Prakash wrote:
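
One generic way to check whether the wall clock was adjusted mid-run is to record both a wall-clock and a monotonic timestamp per event. Python documents `time.monotonic()` as a clock that cannot go backwards and is not affected by system clock updates, so if the two deltas between consecutive events disagree, the wall clock moved. This is a sketch outside SLS; `Event`, `record` and `clock_jumps` are hypothetical names, and the 0.5 s tolerance is an arbitrary assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    wall: float  # time.time(): may jump if NTP steps the clock
    mono: float  # time.monotonic(): unaffected by system clock updates


def record(name: str) -> Event:
    return Event(name, time.time(), time.monotonic())


def clock_jumps(events, tolerance_s=0.5):
    """Return consecutive event pairs whose wall-clock delta disagrees with the monotonic delta."""
    jumps = []
    for a, b in zip(events, events[1:]):
        drift = (b.wall - a.wall) - (b.mono - a.mono)
        if abs(drift) > tolerance_s:
            jumps.append((a.name, b.name, drift))
    return jumps


events = [record("request_sent"), record("am_running")]
print(clock_jumps(events))  # expected to be empty unless the wall clock was adjusted in between
```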

Re: Time until a datanode is marked as dead

2015-01-26 Thread Azuryy Yu
Hi Frank, can you file an issue to add this configuration to hdfs-default.xml? On Mon, Jan 26, 2015 at 5:39 PM, Frank Lanitz frank.lan...@sql-ag.de wrote: Hi, On 23.01.2015 at 19:23, Chris Nauroth wrote: The time period for determining if a datanode is dead is calculated as a

Multiple separate Hadoop clusters on same physical machines

2015-01-26 Thread Harun Reşit Zafer
Hi everyone, We have set up and been playing with Hadoop 1.2.x and its friends (HBase, Pig, Hive etc.) on 7 physical servers. We want to test Hadoop (maybe different versions) and its ecosystem on physical machines (virtualization is not an option) from different perspectives. As a bunch of
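
Running more than one cluster on the same hosts requires that each instance use its own ports and directories. A minimal sketch that derives per-cluster overrides by shifting a set of base ports by a fixed offset. It assumes Hadoop 2.x property names and default HDFS ports (verify against your version's hdfs-default.xml); only a few HDFS settings are shown, and `cluster_overrides`, the `/data/hadoop` root and the offset of 100 are hypothetical choices.

```python
# Assumed Hadoop 2.x defaults; other versions use different names or ports.
BASE_PORTS = {
    "dfs.namenode.http-address": 50070,
    "dfs.datanode.address": 50010,
    "dfs.datanode.http.address": 50075,
    "dfs.datanode.ipc.address": 50020,
}


def cluster_overrides(instance: int, offset_step: int = 100,
                      data_root: str = "/data/hadoop") -> dict:
    """Hypothetical helper: per-cluster port and directory overrides so instances don't collide.

    offset_step must be larger than the spread of BASE_PORTS (65 here).
    """
    offset = instance * offset_step
    props = {name: f"0.0.0.0:{port + offset}" for name, port in BASE_PORTS.items()}
    props["hadoop.tmp.dir"] = f"{data_root}/cluster{instance}/tmp"
    props["dfs.namenode.name.dir"] = f"{data_root}/cluster{instance}/nn"
    props["dfs.datanode.data.dir"] = f"{data_root}/cluster{instance}/dn"
    return props


def ports(props: dict) -> set:
    return {int(v.rsplit(":", 1)[1]) for k, v in props.items() if k in BASE_PORTS}


a, b = cluster_overrides(0), cluster_overrides(1)
print(sorted(ports(a)), sorted(ports(b)))
print("overlap:", ports(a) & ports(b))
```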

Re: Multiple separate Hadoop clusters on same physical machines

2015-01-26 Thread Azuryy Yu
Hi, I think the best way is to deploy HDFS federation with Hadoop 2.x. On Mon, Jan 26, 2015 at 5:18 PM, Harun Reşit Zafer harun.za...@tubitak.gov.tr wrote: Hi everyone, We have set up and been playing with Hadoop 1.2.x and its friends (HBase, Pig, Hive etc.) on 7 physical servers. We want to
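
With federation, several NameNodes, each serving its own namespace, share one pool of DataNodes; each namespace is declared in `dfs.nameservices` and given per-nameservice addresses. A sketch that builds those hdfs-site.xml entries, assuming the Hadoop 2.x keys `dfs.namenode.rpc-address.<nsid>` and `dfs.namenode.http-address.<nsid>`; the host names and the `federation_props` helper are hypothetical.

```python
def federation_props(nameservices: dict) -> dict:
    """Build hdfs-site.xml entries for HDFS federation.

    `nameservices` maps a nameservice ID to (namenode host, rpc port, http port).
    """
    props = {"dfs.nameservices": ",".join(nameservices)}
    for ns_id, (host, rpc_port, http_port) in nameservices.items():
        props[f"dfs.namenode.rpc-address.{ns_id}"] = f"{host}:{rpc_port}"
        props[f"dfs.namenode.http-address.{ns_id}"] = f"{host}:{http_port}"
    return props


props = federation_props({
    "ns1": ("nn1.example.com", 8020, 50070),
    "ns2": ("nn2.example.com", 8020, 50070),
})
for name, value in props.items():
    print(f"{name} = {value}")
```

Note that federated namespaces still share DataNodes, so this isolates namespaces rather than providing fully separate clusters.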

Re: Time until a datanode is marked as dead

2015-01-26 Thread Frank Lanitz
Hi, On 23.01.2015 at 19:23, Chris Nauroth wrote: The time period for determining if a datanode is dead is calculated as a function of a few different configuration properties. The current implementation in DatanodeManager.java does it like this: final long heartbeatIntervalSeconds =
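
The quoted snippet is cut off before the formula. As far as we know, Hadoop 2.x computes the dead-node expiry in milliseconds as `2 * dfs.namenode.heartbeat.recheck-interval + 10 * 1000 * dfs.heartbeat.interval`; the sketch below reproduces that arithmetic with the assumed stock defaults of 300000 ms and 3 s. The function name is hypothetical; verify the formula against the DatanodeManager source for your version.

```python
def heartbeat_expire_interval_ms(heartbeat_interval_s: int = 3,
                                 recheck_interval_ms: int = 300_000) -> int:
    """Time after the last heartbeat before a datanode is considered dead (assumed formula)."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


print(heartbeat_expire_interval_ms() / 1000)  # 630.0 seconds, i.e. 10.5 minutes
```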