RE: interfaces to HDFS icw Kerberos

2010-11-30 Gibbon, Robert, VF-Group
Hi Evert, We use the WebDAV tool integrated with HUE for authenticated ad-hoc read/write access, but for full-throttle inbound data our implementation uses Flume. That said, the WebDAV solution is horizontally scalable (it is a stateless web app), so a software or hardware load balancer cou...
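Because the WebDAV front end keeps no per-client state, a load balancer can send any request to any instance. A minimal sketch of one common policy, round-robin, which is an assumption here (the message does not say which policy is used); the class name and backend host names are hypothetical:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    private final List<String> backends;
    // Shared counter; AtomicInteger keeps pick() safe under concurrent requests
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("need at least one backend");
        }
        this.backends = List.copyOf(backends);
    }

    // Stateless backends mean no session affinity is needed: just rotate.
    // floorMod keeps the index valid even after the counter overflows.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(i);
    }

    public static void main(String[] args) {
        // Hypothetical WebDAV web-app instances
        RoundRobinSketch lb = new RoundRobinSketch(
            List.of("webdav-1:8080", "webdav-2:8080", "webdav-3:8080"));
        for (int r = 0; r < 4; r++) {
            System.out.println(lb.pick()); // webdav-1, webdav-2, webdav-3, webdav-1
        }
    }
}
```

Adding capacity is then a matter of adding another entry to the backend list; nothing has to be migrated between instances.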

RE: web-based file transfer

2010-11-09 Gibbon, Robert, VF-Group
> Even all the Java servlet APIs assume that the content-length header fits into a signed 32-bit integer and get unhappy once you go over 2GB (something I worry about in http://jira.smartfrog.org/jira/browse/SFOS-1476 )
I built my HDFS WebDAV implementation to reference the JackRabbit 1.6.4 l...
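The 2GB ceiling in the quoted text is Integer.MAX_VALUE (2^31 - 1 = 2,147,483,647 bytes): ServletRequest.getContentLength() returns an int, and Servlet 3.1 later added getContentLengthLong(). A minimal sketch, assuming a hypothetical helper class that parses the header value itself, showing how a 3 GiB length wraps when it is narrowed to int:

```java
public class ContentLengthSketch {
    // Parse a Content-Length header value as a signed 64-bit long
    static long parseContentLength(String headerValue) {
        long n = Long.parseLong(headerValue.trim());
        if (n < 0) {
            throw new IllegalArgumentException("Content-Length must be non-negative: " + n);
        }
        return n;
    }

    // Simulates code that stores the length in a signed 32-bit int
    static int narrowToInt(long n) {
        return (int) n; // silently wraps once n > Integer.MAX_VALUE (2^31 - 1)
    }

    // True if the value cannot be represented by an int-returning API
    static boolean exceedsIntRange(long n) {
        return n > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        String header = "3221225472"; // a 3 GiB upload
        long len = parseContentLength(header);
        System.out.println(len);                  // 3221225472
        System.out.println(narrowToInt(len));     // -1073741824
        System.out.println(exceedsIntRange(len)); // true
    }
}
```

Keeping the length as a long end to end (or streaming without relying on the header) avoids the wrap-around.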

RE: web-based file transfer

2010-11-05 Gibbon, Robert, VF-Group
...ct stores) and we found the protocol to be very "chatty". Since our use case is fairly simple (we just need to transfer lots of files from lots of clients; navigating the results isn't necessary), will the WebDAV solution be too much? Comments? Thanks! -----Original Message----- From: G...

RE: web-based file transfer

2010-11-03 Gibbon, Robert, VF-Group
Check out HDFS over WebDAV - http://www.hadoop.iponweb.net/Home/hdfs-over-webdav
WebDAV is an HTTP-based protocol for accessing remote filesystems. I'm running an adapted version of this. It runs under Jetty, which is pretty much an industry standard, and is built on Apache JackRabbit, which is pretty pro...
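Because WebDAV is plain HTTP plus extra methods (RFC 4918), listing a directory is just a PROPFIND request with a Depth header. A minimal sketch using the JDK 11+ java.net.http client; the class name and endpoint URL are hypothetical, and the request is only built here, not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PropfindSketch {
    // PROPFIND body asking for all properties (RFC 4918 "allprop")
    static final String ALLPROP =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";

    // Build a PROPFIND that lists a collection and its immediate children
    static HttpRequest listDirectory(URI dir) {
        return HttpRequest.newBuilder(dir)
            .method("PROPFIND", HttpRequest.BodyPublishers.ofString(ALLPROP))
            .header("Depth", "1") // 0 = the resource only, 1 = plus direct children
            .header("Content-Type", "application/xml; charset=utf-8")
            .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; substitute your own WebDAV server URL
        HttpRequest req = listDirectory(URI.create("http://localhost:8080/webdav/user/"));
        System.out.println(req.method() + " " + req.uri());
        // To actually send it: java.net.http.HttpClient.newHttpClient()
        //     .send(req, java.net.http.HttpResponse.BodyHandlers.ofString())
    }
}
```

Each level of a directory tree costs another PROPFIND round trip, which is one reason WebDAV clients can feel chatty for simple bulk uploads.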

RE: Problem: when I run a pig's script I got one reduce task

2010-08-05 Gibbon, Robert, VF-Group
Use the PARALLEL clause, of course!
PARALLEL n: increase the parallelism of a job by specifying the number of reduce tasks, n. The default value for n is 1 (one reduce task). Note the following:
* PARALLEL only affects the number of reduce tasks. Map parallelism is determined by the input fil...
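With the default of one reduce task, every map output key goes to the same reducer. A minimal standalone sketch of how keys are spread across n reduce tasks, assuming the formula used by Hadoop's default HashPartitioner (Pig may plug in its own partitioner for some operators, so treat this as illustrative; the class name is hypothetical):

```java
import java.util.Arrays;

public class ReducePartitionSketch {
    // Same formula as Hadoop's default HashPartitioner.getPartition:
    // clear the sign bit, then take the remainder by the reducer count
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many keys each reduce task would receive
    static int[] keysPerReducer(String[] keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String k : keys) {
            counts[partitionFor(k, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"};
        System.out.println(Arrays.toString(keysPerReducer(keys, 1))); // [8]: one reducer does all the work
        System.out.println(Arrays.toString(keysPerReducer(keys, 4))); // same keys split over 4 reducers
    }
}
```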

RE: Google Patent on MapReduce

2010-01-22 Gibbon, Robert, VF-Group
Software patents are neither recognised nor enforceable in European Union member states (UK, France, Germany, Belgium, etc.). The Apache License v2.0 states: "This License shall be governed by the law of the jurisdiction specified in a notice contained within the Original Software..." Do you know w...

RE: Which Hadoop product is more appropriate for a quick query on a large data set?

2010-01-06 Gibbon, Robert, VF-Group
Isn't this what Hadoop HBase is supposed to do? The partitioning M/R implementation ("sharding" in street parlance) is the sideways scaling that HBase is designed to excel at! Also, the indexed HBase flavour could allow very fast ad-hoc queries for Xueling using the new yet familiar HBQL SQL dialect? So...
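HBase splits a table into regions, each holding a contiguous, sorted range of row keys and served by one RegionServer; that range partitioning is the "sharding" referred to above. A minimal lookup sketch, assuming a toy in-memory map of region start keys (the class and server names rs1..rs3 are hypothetical, and real HBase compares raw bytes rather than Java Strings, which agree only for ASCII keys):

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookupSketch {
    // Region start key -> server hosting that region.
    // Each region covers [startKey, nextStartKey); "" starts the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "rs1");
        table.addRegion("g", "rs2");
        table.addRegion("p", "rs3");
        System.out.println(table.serverFor("apple")); // rs1
        System.out.println(table.serverFor("hbase")); // rs2
        System.out.println(table.serverFor("zebra")); // rs3
    }
}
```

Because a row key maps straight to one region, a point lookup touches a single server instead of scanning the whole data set.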

Hadoop with SELinux?

2010-01-05 Gibbon, Robert, VF-Group
Hello list,
Can someone please tell me if it would be possible to run Hadoop with SELinux enabled across the cluster? Are there any known issues, or better, how-tos I can be pointed at? I'm also interested in running iptables on the nodes: is that easy to do? Many thanks in advance, Robert
Robert Gibbon Solu...