[OFFTOPIC] Big Data Application Meetup

2015-06-02 Thread Alex Baranau
at Hadoop Summit and Spark Summit in the following weeks. Thank you, Alex Baranau

Re: Bug in LocalJobRunner?

2013-03-22 Thread Alex Baranau
Hi Harsh J, Thanks for taking a look. I created https://issues.apache.org/jira/browse/MAPREDUCE-5097 and attached a patch. I also provided an (ugly, sorry) example of how to get the error. Alex Baranau On Thu, Mar 21, 2013 at 5:58 AM, Harsh J ha...@cloudera.com wrote: Hi Alex, This seems to make

Bug in LocalJobRunner?

2013-03-20 Thread Alex Baranau
); this.job.setClassLoader(classLoader); } I.e. we need to set the classloader for the job configuration so that it can load classes from the jar. If the above makes sense I will file a JIRA with a patch; otherwise, what am I missing? Thank you, Alex Baranau

Re: Number of concurrent writer to HDFS

2012-08-06 Thread Alex Baranau
you in advance, Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Mon, Aug 6, 2012 at 2:14 AM, Yanbo Liang yanboha...@gmail.com wrote: You can use Scribe or Flume to collect log data and integrate it with Hadoop. 2012/8/4 Nguyen Manh Tien

Bulk Import Data Locality

2012-07-18 Thread Alex Baranau
tasks, this would help us. I believe this is not possible with MR1; please correct me if I'm wrong. Perhaps this is possible with MR2? I assume there's no way to give the NameNode a hint about where to place the blocks of a new file either, right? Thank you, -- Alex Baranau -- Sematext :: http

Fwd: Bulk Import Data Locality

2012-07-18 Thread Alex Baranau
to preserve data locality if an RS goes down (or when anything else causes the region to be reassigned). But since the region size is usually much bigger (usually at least 10-20 times bigger), this fact doesn't buy you much. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase

hadoop fs -du hbase table size

2011-03-14 Thread Alex Baranau
you, Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

Making input in Map iterable

2010-12-08 Thread Alex Baranau
(unit-tests work well at least) state. Thank you in advance! Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

Re: program running faster on single node than cluster

2010-11-17 Thread Alex Baranau
How many nodes do you use for your fully distributed cluster? Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Wed, Nov 17, 2010 at 5:44 AM, Cornelio Iñigo cornelio.ini...@gmail.com wrote: Hi I have a question for you: I developed a program using

Re: repeat a job for different files

2010-11-17 Thread Alex Baranau
In case you need to process the files separately, use one MR job for each file. You can add a single file as input. I believe you'll need to iterate over all files in the input dir and start a job instance for each file. You can do this in Java code or in a script or... depending on your case. Alex
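
The "one job per file" approach described above can be sketched as a small driver script. This is only an illustration under assumptions: the jar name `my-job.jar`, the main class `com.example.MyJob`, and the convention that the job takes an input path and an output path as its two arguments are all hypothetical and not from the original thread. The list of input files is passed in explicitly, since how you list them (local directory, `hadoop fs -ls`, and so on) depends on your setup.

```python
import posixpath
import subprocess
from typing import Iterable, List


def build_job_commands(
    input_files: Iterable[str],
    output_root: str,
    jar: str = "my-job.jar",               # hypothetical jar name
    main_class: str = "com.example.MyJob", # hypothetical driver class
) -> List[List[str]]:
    """Build one `hadoop jar` command per input file.

    Each job gets its own output directory named after the input file,
    because a MapReduce job normally refuses to write into an existing
    output directory.
    """
    commands = []
    for input_file in input_files:
        # "/data/in/a.txt" -> "a"
        name = posixpath.splitext(posixpath.basename(input_file))[0]
        output_dir = posixpath.join(output_root, name)
        commands.append(["hadoop", "jar", jar, main_class, input_file, output_dir])
    return commands


def run_jobs(commands: Iterable[List[str]]) -> None:
    """Run the jobs one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for cmd in build_job_commands(["/data/in/a.txt", "/data/in/b.txt"], "/data/out"):
        print(" ".join(cmd))
```

Running the jobs sequentially keeps the sketch simple; whether to launch them in parallel instead depends on how much of the cluster a single job can use.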

Re: program running faster on single node than cluster

2010-11-17 Thread Alex Baranau
many map and reduce tasks started for your job and how many nodes are used to process the job. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Thu, Nov 18, 2010 at 8:19 AM, Cornelio Iñigo cornelio.ini...@gmail.com wrote: Hi the cluster has 12 nodes

Re: wrong value class error

2010-11-16 Thread Alex Baranau
The message refers to the value not being an IntWritable, which is the *input* value type of your reducer (and the output value type of your mapper). Looks like you have a problem with the mapper, not the reducer. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

Re: JobConf

2010-11-14 Thread Alex Baranau
You might find this search tool valuable: http://search-hadoop.com. You can search the sources and the javadocs separately. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - HBase On Sun, Nov 14, 2010 at 10:28 PM, maha m...@umail.ucsb.edu wrote: Never mind Jeff

HBase MR: run more map tasks than regions

2010-09-14 Thread Alex Baranau
) of map tasks in this situation? Is the only way for me to enhance TableInputFormat? Thank you, Alex Baranau --- http://sematext.com

Re: Client access

2010-09-07 Thread Alex Baranau
for aggregating log data streamed in real time from a large number of servers). (see http://blog.sematext.com/2010/08/02/hadoop-digest-july-2010/ with better formatting and links ;)) Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase Hadoop ecosystem search

Re: Research projects with Hadoop

2010-09-07 Thread Alex Baranau
Hi Luan, That's not a new question on these mailing lists, so I'd suggest you start by digging into the links at http://search-hadoop.com/?q=research+project+ideaspage. Hadoop-related projects are relatively young and full of ideas, good luck with finding your spot! Alex Baranau Sematext :: http

Re: Research projects with Hadoop

2010-09-07 Thread Alex Baranau
Sorry, looks like the link I provided got corrupted; the original was: http://search-hadoop.com/?q=research+project+ideas Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase Hadoop ecosystem search :: http://search-hadoop.com/ On Tue, Sep 7, 2010 at 10

Re: Classpath

2010-09-01 Thread Alex Baranau
From the FAQ section of http://blog.sematext.com/2010/05/31/hadoop-digest-may-2010/: How can I attach external libraries (jars) which my jobs depend on? You can put them in a “lib” subdirectory of your jar root directory. Alternatively, you can use the DistributedCache API. Alex Baranau Sematext
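
To make the “lib” subdirectory layout concrete, here is a minimal sketch that packages compiled classes and dependency jars into a single job jar. It relies only on the fact that a jar is a zip archive; the function name `build_job_jar` and the example paths are hypothetical, and the sketch writes no `META-INF/MANIFEST.MF`, which the regular `jar` tool would normally add. In practice a build tool would usually produce this layout for you.

```python
import zipfile
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def build_job_jar(
    classes_dir: PathLike,
    dependency_jars: Iterable[PathLike],
    out_jar: PathLike,
) -> None:
    """Write a job jar whose root holds the compiled classes and whose
    lib/ subdirectory holds the jars the job depends on.
    """
    classes_root = Path(classes_dir)
    with zipfile.ZipFile(out_jar, "w", zipfile.ZIP_DEFLATED) as jar:
        # Compiled classes keep their package path relative to the jar root,
        # e.g. com/example/MyJob.class.
        for path in sorted(classes_root.rglob("*")):
            if path.is_file():
                jar.write(path, path.relative_to(classes_root).as_posix())
        # Each dependency goes under lib/ with its original file name.
        for dep in dependency_jars:
            dep_path = Path(dep)
            jar.write(dep_path, f"lib/{dep_path.name}")


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    build_job_jar("build/classes", ["deps/some-library.jar"], "my-job.jar")
```

Bundling dependencies this way makes the job jar larger but self-contained; the DistributedCache route mentioned above avoids re-shipping the same libraries with every job.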

Re: missing part folder - how to debug?

2010-09-01 Thread Alex Baranau
Hi, Adding the Solr user list. We used a similar approach to the one in this patch, but with Hadoop Streaming. Did you determine that indices are really missing? I mean, did you find missing documents in the output indices? Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Searching more ZooKeeper content

2010-08-25 Thread Alex Baranau
by default? We are looking into adding this search service for all Hadoop's sub-projects. Assuming people are for this, any suggestions for how the search should function by default or any specific instructions for how the search box should be modified would be great! Thank you, Alex Baranau. P.S. HBase

Searching more MapReduce content

2010-08-25 Thread Alex Baranau
by default? We are looking into adding this search service for all Hadoop's sub-projects. Assuming people are for this, any suggestions for how the search should function by default or any specific instructions for how the search box should be modified would be great! Thank you, Alex Baranau. P.S. HBase

Searching more Hadoop-Common content

2010-08-25 Thread Alex Baranau
project by default? We are looking into adding this search service for all Hadoop's sub-projects. Assuming people are for this, any suggestions for how the search should function by default or any specific instructions for how the search box should be modified would be great! Thank you, Alex Baranau. P.S

Searching more HDFS content

2010-08-25 Thread Alex Baranau
? We are looking into adding this search service for all Hadoop's sub-projects. Assuming people are for this, any suggestions for how the search should function by default or any specific instructions for how the search box should be modified would be great! Thank you, Alex Baranau. P.S. HBase community

Re: FAQ for New to Hadoop

2010-07-11 Thread Alex Baranau
posts on http://blog.sematext.com as well. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase Hadoop ecosystem search :: http://search-hadoop.com/ On Fri, Jul 9, 2010 at 1:35 AM, Mark Kerzner markkerz...@gmail.com wrote: Cool, Ken, thank you, I think