Re: hadoop input/output format advanced control

2015-03-25 Thread Aaron Davidson
> we've suggested that a user should read from hdfs themselves (eg., to read multiple files together in one partition) -- with*out* reusing the code in HadoopRDD, though they would lose things like th

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
> preferred locations you get from HadoopRDD. Does HadoopRDD need to some refactoring to make that easier to do? Or do we just need a good example? Imran (sorry for hijacking y

Re: hadoop input/output format advanced control

2015-03-25 Thread Sandy Ryza
oopRDD need to some refactoring to make that easier to do? Or do we just need a good example? Imran (sorry for hijacking your thread, Koert) On Mon, Mar

Re: hadoop input/output format advanced control

2015-03-25 Thread Imran Rashid
> (sorry for hijacking your thread, Koert) On Mon, Mar 23, 2015 at 3:52 PM, Koert Kuipers wrote: see email below. reynold suggested i send it to dev instead of user -- Forwarded message --

Re: hadoop input/output format advanced control

2015-03-25 Thread Koert Kuipers
> criteria are: (a) common operations (b) error-prone / difficult to implement (c) non-obvious, but important for p

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
> I think this case fits (a) & (c), so I think its still worthwhile. But its also worth asking whether or not its too difficult for a user to extend

Re: hadoop input/output format advanced control

2015-03-25 Thread Koert Kuipers
read from hdfs themselves (eg., to read multiple files together in one partition) -- with*out* reusing the code in HadoopRDD, though they would lose things like the metric tracking &

Re: hadoop input/output format advanced control

2015-03-24 Thread Patrick Wendell
> where we've suggested that a user should read from hdfs themselves (eg., to read multiple files together in one partition) -- with*out* reusing the code in HadoopRDD, th

Re: hadoop input/output format advanced control

2015-03-24 Thread Koert Kuipers
adoopRDD need to some refactoring to make that easier to do? Or do we just need a good example? Imran (sorry for hijacking your thread, Koert)

Re: hadoop input/output format advanced control

2015-03-24 Thread Koert Kuipers
> Imran (sorry for hijacking your thread, Koert) On Mon, Mar 23, 2015 at 3:52 PM, Koert Kuipers wrote: see email below. reynold suggested i send it to dev instead of use

Re: hadoop input/output format advanced control

2015-03-24 Thread Patrick Wendell
> refactoring to make that easier to do? Or do we just need a good example? Imran (sorry for hijacking your thread, Koert) On Mon, Mar 23, 2015 at 3:52 PM, Koert Kuipers wrote: see email below. r

Re: hadoop input/output format advanced control

2015-03-24 Thread Nick Pentreath
> Imran (sorry for hijacking your thread, Koert) On Mon, Mar 23, 2015 at 3:52 PM, Koert Kuipers wrote: see email below. reynold suggested i send it to dev instead of user ------ Forwarded message ------ F

Re: hadoop input/output format advanced control

2015-03-24 Thread Imran Rashid
> see email below. reynold suggested i send it to dev instead of user -- Forwarded message -- From: Koert Kuipers Date: Mon, Mar 23, 2015 at 4:36 PM Subject: hadoop input/output format advanced control To: "u...@spark.apache.org"

Fwd: hadoop input/output format advanced control

2015-03-23 Thread Koert Kuipers
see email below. reynold suggested i send it to dev instead of user -- Forwarded message -- From: Koert Kuipers Date: Mon, Mar 23, 2015 at 4:36 PM Subject: hadoop input/output format advanced control To: "u...@spark.apache.org" currently its pretty hard to control