Re: Efficient algorithm for many-to-many reduce-side join?

2009-05-28 Thread Chris K Wensel
one dataset, which of course will be problematic (and won't scale) when dealing with large datasets with large numbers of records with the same keys. Does an efficient algorithm exist for a many-to-many reduce-side join? -- Chris K Wensel ch...@concurrentinc.com http://www.concurrentinc.com
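The snippet above is truncated, but it appears to describe the usual reduce-side join that buffers one dataset's records for a key in memory and streams the other side past them. A minimal sketch of that idea in plain Python follows; it simulates the shuffle with a sort and is not Hadoop or Cascading code. The function name, the tag convention (0 = left, 1 = right), and the record layout are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def reduce_side_join(tagged_records):
    """Join records of the form (key, tag, value); tag 0 = left, tag 1 = right.

    Hypothetical helper: simulates the shuffle/sort and the reducer in one place.
    """
    # Simulated shuffle: sort by key, then by tag, so that for each key
    # every left-side record arrives before any right-side record.
    ordered = sorted(tagged_records, key=itemgetter(0, 1))

    for key, group in groupby(ordered, key=itemgetter(0)):
        left_buffer = []  # grows with the number of left records for this key
        for _, tag, value in group:
            if tag == 0:
                left_buffer.append(value)
            else:
                # Stream the right side; emit the cross product with the buffer.
                for left_value in left_buffer:
                    yield key, left_value, value


records = [
    ("a", 0, "L1"), ("a", 1, "R1"),
    ("a", 0, "L2"), ("a", 1, "R2"),
    ("b", 0, "L3"), ("c", 1, "R3"),
]
joined = list(reduce_side_join(records))
```

The `left_buffer` list is the part the message warns about: when many records share a key, the buffer for that key has to fit in memory, which is why this approach stops scaling for heavily skewed many-to-many joins.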

Re: Amazon Elastic MapReduce

2009-04-02 Thread Chris K Wensel
...@inf.ed.ac.uk wrote: ... and only in the US Miles 2009/4/2 zhang jianfeng zjf...@gmail.com: Does it support Pig? On Thu, Apr 2, 2009 at 3:47 PM, Chris K Wensel ch...@wensel.net wrote: FYI Amazon's new Hadoop offering: http://aws.amazon.com/elasticmapreduce/ And Cascading 1.0

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-13 Thread Chris K Wensel
, it could be job-setup/job-cleanup task that is running on a reduce slot. See HADOOP-3150 and HADOOP-4261. -Amareshwari Chris K Wensel wrote: May have found the answer, waiting on confirmation from users. Turns out 0.19.0 and .1 instantiate the reducer class when the task is actually intended

Reducers spawned when mapred.reduce.tasks=0

2009-03-12 Thread Chris K Wensel
the issue seems to manifest with or without spec exec. ckw -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/

Cascading support of HBase

2009-02-03 Thread Chris K Wensel
natural to develop and think in than MapReduce. http://www.cascading.org/ enjoy, chris p.s. If you have any code you want to contribute back, just stick it on GitHub and send me a link. -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/

Re: Control over max map/reduce tasks per job

2009-02-03 Thread Chris K Wensel
the list before filing an issue because it seems like someone may have thought about this in the past. Thanks. Jonathan Gray -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/

Scale Unlimited Professionals Program

2009-02-02 Thread Chris K Wensel
that result from a referral, if any. To be added to our referral list or if you have a project that might benefit from Hadoop or related technologies, please email me directly. This course will also be announced for open public enrollment in the coming days. cheers, chris -- Chris K Wensel ch

Re: Windows Support

2009-01-19 Thread Chris K Wensel
this there were before I put together a patch. Seems bad Java practices to depend on shell utilities :-). Not very platform agnostic... Dan -- Dan Diephouse http://netzooid.com/blog -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/

Cascading 1.0.0 Released

2009-01-15 Thread Chris K Wensel
Concurrent, Inc. http://www.concurrentinc.com/ And finally, Advanced Hadoop and Cascading training (and consulting) is available through Scale Unlimited: http://www.scaleunlimited.com/ cheers, chris -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/

Re: Storing/retrieving time series with hadoop

2009-01-12 Thread Chris K Wensel
it to draw charts based on time series with fairly low latency? Thanks! Brock -- Chris K Wensel ch...@wensel.net http://www.cascading.org/ http://www.scaleunlimited.com/

Re: Lookup HashMap available within the Map

2008-11-25 Thread Chris K Wensel
cascading a shot for what I am doing. Cheers Tim On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel [EMAIL PROTECTED] wrote: Hey Tim The .configure() method is what you are looking for, I believe. It is called once per task, which in the default case, is once per JVM. Note Jobs are broken
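To illustrate the point in the reply above, here is a minimal plain-Python sketch of the "build the lookup table once per task, then reuse it for every record" pattern. It is a stand-in, not the Hadoop API: `LookupMapper`, `run_task`, and the `lookup_pairs` configuration key are hypothetical names invented for this example.

```python
class LookupMapper:
    """Hypothetical stand-in for a mapper class; not the Hadoop interface."""

    def __init__(self):
        self.lookup = None
        self.configure_calls = 0

    def configure(self, job_conf):
        # Runs once per task: the place to build the lookup table,
        # rather than rebuilding it on every map() call.
        self.configure_calls += 1
        self.lookup = dict(job_conf["lookup_pairs"])

    def map(self, key, value):
        # Runs once per record and only reads the prebuilt table.
        return key, self.lookup.get(value, "UNKNOWN")


def run_task(mapper, job_conf, records):
    """Simulated task lifecycle: configure once, then map every record."""
    mapper.configure(job_conf)
    return [mapper.map(key, value) for key, value in records]


mapper = LookupMapper()
conf = {"lookup_pairs": [("us", "United States"), ("de", "Germany")]}
output = run_task(mapper, conf, [(1, "us"), (2, "fr"), (3, "de")])
```

The design point is only that setup cost is paid per task instead of per record; how the table reaches each task (job configuration, a side file, or something else) is left open here.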

Re: Auto-shutdown for EC2 clusters

2008-10-24 Thread Chris K Wensel
cluster doing the same kind of job on different data. Karl Anderson [EMAIL PROTECTED] http://monkey.org/~kra -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Auto-shutdown for EC2 clusters

2008-10-23 Thread Chris K Wensel
23, 2008, at 7:47 AM, Stuart Sierra wrote: Hi folks, Anybody tried scripting Hadoop on EC2 to... 1. Launch a cluster 2. Pull data from S3 3. Run a job 4. Copy results to S3 5. Terminate the cluster ... without any user interaction? -Stuart -- Chris K Wensel [EMAIL PROTECTED] http

Hadoop Training

2008-10-14 Thread Chris K Wensel
, again, give me a shout. cheers, chris -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Monthly Hadoop User Group Meeting (Bay Area)

2008-09-09 Thread Chris K Wensel
Chris K Wensel wrote: doh, conveniently collides with the GridGain and GridDynamics presentations: http://web.meetup.com/66/calendar/8561664/ Bay Area Hadoop User Group meetings are held on the third Wednesday every month. This has been on the calendar for quite a while. Doug maybe I

Re: Simple Survey

2008-09-09 Thread Chris K Wensel
Quick reminder to take the survey. We know more than a dozen companies are using Hadoop. heh http://www.scaleunlimited.com/survey.html thanks! chris On Sep 8, 2008, at 10:43 AM, Chris K Wensel wrote: Hey all Scale Unlimited is putting together some case studies for an upcoming

Re: Simple Survey

2008-09-09 Thread Chris K Wensel
)?' It would not let me enter more than 10TB (we currently have 45TB of data in our cluster; actual data, not a sum of disk used (with all of its replicas) but unique data). Other than that, I tried :-) On Tue, Sep 9, 2008 at 4:01 PM, Chris K Wensel [EMAIL PROTECTED] wrote: Quick

Re: Basic code organization questions + scheduling

2008-09-08 Thread Chris K Wensel
-- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Simple Survey

2008-09-08 Thread Chris K Wensel
results will be public. cheers, chris -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Monthly Hadoop User Group Meeting (Bay Area)

2008-09-08 Thread Chris K Wensel
College, Santa Clara, CA, Building 2, Training Rooms 34. Agenda: Cloud Computing Testbed - Thomas Sandholm, HP Katta on Hadoop - Stefan Groschupf Registration and directions: http://upcoming.yahoo.com/event/1075456/ Look forward to seeing you there! Ajay -- Chris K Wensel [EMAIL

RandomTextWriter

2008-07-07 Thread Chris K Wensel
Hey all Has anyone had success with RandomTextWriter? I'm finding it fairly unstable on 0.16.x, haven't tried 0.17 yet though. chris -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: RandomTextWriter

2008-07-07 Thread Chris K Wensel
, at 10:08 AM, Arun C Murthy wrote: On Jul 7, 2008, at 9:46 AM, Chris K Wensel wrote: Hey all Has anyone had success with RandomTextWriter? I'm finding it fairly unstable on 0.16.x, haven't tried 0.17 yet though. What problems are you seeing? It seems to work fine for me... Arun -- Chris

Re: RandomTextWriter

2008-07-07 Thread Chris K Wensel
fyi, Things seem to be playing nicer on Hadoop 0.17.1. But I'm also now running on c1.medium EC2 instances with the recommended XEN kernel. So that could be a factor as well. ckw On Jul 7, 2008, at 10:30 AM, Chris K Wensel wrote: In local mode, only one mapper succeeds, the remaining

Re: hadoop in the ETL process

2008-07-02 Thread Chris K Wensel
Townsend St., Third Floor San Francisco, CA 94107 -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Using S3 Block FileSystem as HDFS replacement

2008-07-01 Thread Chris K Wensel
a custom AMI with a modified hadoop-init script right? or am I completely confused? slitz -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Using S3 Block FileSystem as HDFS replacement

2008-07-01 Thread Chris K Wensel
How do I put something into the fs? something like bin/hadoop fs -put input input will not work well since s3 is not the default fs, so I tried to do bin/hadoop fs -put input s3://ID:[EMAIL PROTECTED]/input (and some variations of it) but it didn't work, I always got an error complaining

Re: realtime hadoop

2008-06-24 Thread Chris K Wensel
to failure against work that must get done, regardless of the amount of work. ckw -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Ec2 and MR Job question

2008-06-14 Thread Chris K Wensel
from the map outputs stored on the hdfs. Is there a way to make the mappers store the final output in hdfs? -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Chris K Wensel
) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220) -- hustlin, hustlin, everyday I'm hustlin -- Chris K Wensel

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Chris K Wensel
similar to pig, do you care to provide your comment here? If map reduce programmers are to go to the next level (scripting/query language), which way to go? -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Chris K Wensel
map/reduce jobs, all inter-related, from ~10 unique units of work (internally lots of joins, sorts and math). I can't imagine having written them by hand. ckw -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: contrib EC2 with hadoop 0.17

2008-06-09 Thread Chris K Wensel
be tuned to your application (cpu or io bound). ckw Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: contrib EC2 with hadoop 0.17

2008-06-07 Thread Chris K Wensel
namenode localhost: no datanode to stop localhost: no secondarynamenode to stop conf files in /usr/local/hadoop-0.17.0 == # cat conf/slaves localhost # cat conf/masters localhost -- Chris Anderson http://jchris.mfdz.com Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http

Re: hadoop on EC2

2008-06-02 Thread Chris K Wensel
of authentication it would still be plain http. 2.) Some kind of tunneling solution. The problem on this side is that each of my cluster nodes is in a different subnet, plus the dualism between the internal and external addresses of the nodes. Any hints? TIA, Andreas Chris K Wensel [EMAIL PROTECTED] http

Re: hadoop on EC2

2008-06-02 Thread Chris K Wensel
need any AWS keys etc. Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-13 Thread Chris K Wensel
cheaply. btw, the email notifying you that you have been approved may lag the actual approval (mine did for days). might be worth trying a larger cluster to see. Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-07 Thread Chris K Wensel
, but there are usually a couple at the end that take longer than I think they should, and they frequently have these sorts of errors. I'm running 20 machines on ec2 right now, with hadoop version 0.16.4. -- James Moore | [EMAIL PROTECTED] blog.restphone.com Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net

Groovy Scripting for Hadoop

2008-05-05 Thread Chris K Wensel
cheers, ckw Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Best practices for handling many small files

2008-04-23 Thread Chris K Wensel
-compressed SequenceFile, with the file names as keys. Will that work? Thanks, -Stuart, altlaw.org Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: Not able to back up to S3

2008-04-17 Thread Chris K Wensel
-tp16737029p16750360.html Sent from the Hadoop core-user mailing list archive at Nabble.com. Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: adding nodes to an EC2 cluster

2008-04-15 Thread Chris K Wensel
, -stephen Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: adding nodes to an EC2 cluster

2008-04-15 Thread Chris K Wensel
bin/hadoop. From here I do not know how to proceed. I basically want to implement http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 . Hence I created a host using dyndns. If you can help me, it will be great. On Tue, Apr 15, 2008 at 11:30 PM, Chris K Wensel [EMAIL

Re: Hadoop performance on EC2?

2008-04-11 Thread Chris K Wensel
is for the ganglia interface. On Apr 11, 2008, at 2:01 PM, Nate Carlson wrote: On Wed, 9 Apr 2008, Chris K Wensel wrote: make sure all nodes are running in the same 'availability zone', http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1347 check! and that you are using the new xen

Re: Hadoop performance on EC2?

2008-04-09 Thread Chris K Wensel
| | depriving some poor village of its idiot since 1981| Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/

Re: EC2 contrib scripts

2008-03-28 Thread Chris K Wensel
, Chris K Wensel wrote: Hey all I pushed up a patch (and tar) for the ec2 contrib scripts that provide support for instance sizes, new Xen kernels, availability zones, concurrent clusters, resizing, ganglia, etc. the patch can be found here: https://issues.apache.org/jira/browse/HADOOP-2410 I

Re: Performance / cluster scaling question

2008-03-27 Thread Chris K Wensel
FYI, Just ran a 50 node cluster using one of the new kernels for Fedora with all nodes forced onto the same 'availability zone' and there were no timeouts or failed writes. On Mar 27, 2008, at 4:16 PM, Chris K Wensel wrote: If it's any consolation, I'm seeing similar behaviors on 0.16.0 when

Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Chris K Wensel
AM, Prasan Ary wrote: Chris, What do you mean when you say boot the slaves with the master private name ? === Chris K Wensel [EMAIL PROTECTED] wrote: I found it much better to start the master first, then boot the slaves with the master private name. i do not use

Re: copy - sort hanging

2008-03-14 Thread Chris K Wensel
) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1191) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) On Mar 13, 2008, at 4:59 PM, Chris K Wensel wrote: I don't really have these logs as I've bounced my cluster. But am willing to ferret out anything in particular on my next

Re: copy - sort hanging

2008-03-13 Thread Chris K Wensel
$BlockReceiver.init(DataNode.java:1983) at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938) at java.lang.Thread.run(Thread.java:619) On Mar 13, 2008, at 11:25 AM, Chris K Wensel wrote

Re: copy - sort hanging

2008-03-13 Thread Chris K Wensel
. Raghu. Chris K Wensel wrote: here is a reset, followed by three attempts to write the block. 2008-03-13 13:40:06,892 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_7813471133156061911 src: /10.251.26.3:35762 dest: /10.251.26.3:50010 2008-03-13 13:40:06,957 INFO

Re: S3/EC2 setup problem: port 9001 unreachable

2008-03-10 Thread Chris K Wensel
the same group, the connectivity seems to be limited. 3.) All AWS docs tell me that VMs in one group have no firewalls in place. So what is happening here? Any ideas? Andreas Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/

Re: Question about using the metrics framework.

2008-03-06 Thread Chris K Wensel
counter + counterName + + group.getCounter(counterName) ); } } randomizeRecord.update(); } Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/

Re: Question about using the metrics framework.

2008-03-06 Thread Chris K Wensel
are not being flushed when the context is shut down, and the flush methods are not implemented for the ganglia context. Chris K Wensel wrote: I have ganglia up on my cluster, and I definitely see some metrics from the map/reduce tasks. But I don't see anything from the JVM context for ganglia

Re: Question about using the metrics framework.

2008-03-06 Thread Chris K Wensel
never mind on the jvm. just found the typo.. frown On Mar 6, 2008, at 3:19 PM, Chris K Wensel wrote: actually, I don't see any jvm metrics across the cluster. any idea how to get a local gmond to gmetad to report local statistics? it is also accumulating slave stats just fine (minus jvm

Re: java.io.IOException: Could not complete write to file // hadoop-0.16.0

2008-02-29 Thread Chris K Wensel
.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:67) Any idea why this exception is thrown? Thx in advance. Cu on the 'net, Bye - bye, André èrbnA Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/