Thank you very much.
On Mon, Jun 20, 2016 at 3:38 PM, Jörn Franke wrote:
> If you insert the data sorted then there is no need to bucket the data.
> You can even create an index in Spark. Simply set the output format
> configuration orc.create.index = true
>
>
> On 20 Jun 2016, at 09:10, Mich Talebzadeh wrote:
Thank you very much.
On Mon, Jun 20, 2016 at 3:10 PM, Mich Talebzadeh
wrote:
> Right, your concern is that you expect the storeindex in the ORC file to help
> the optimizer.
>
> Frankly I do not know what
> write().mode(SaveMode.Overwrite).orc("orcFileToRead") does actually under
> the bonnet. From my experience, in order for the ORC index to be used you
> need to bucket the table.
If you insert the data sorted then there is no need to bucket the data.
You can even create an index in Spark. Simply set the output format
configuration orc.create.index = true
> On 20 Jun 2016, at 09:10, Mich Talebzadeh wrote:
>
> Right, your concern is that you expect the storeindex in the ORC file to help
> the optimizer.
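As a sketch of the suggestion above: sort on the join key before writing, so stripe min/max ranges stay tight, and pass the orc.create.index setting through to the writer. This assumes Spark 1.x with a HiveContext; whether DataFrameWriter options reach the ORC OutputFormat this way should be verified, and the df and path names are placeholders.

```scala
// Hypothetical sketch, not verified against a cluster.
// df is a placeholder DataFrame; the path is a placeholder.
import org.apache.spark.sql.SaveMode

val sorted = df.sort("join_key")            // sort on the join key first
sorted.write
  .mode(SaveMode.Overwrite)
  .option("orc.create.index", "true")       // setting named in the advice above
  .orc("/path/to/orcFileToRead")
```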
Right, your concern is that you expect the storeindex in the ORC file to help
the optimizer.
Frankly I do not know what
write().mode(SaveMode.Overwrite).orc("orcFileToRead") does actually under
the bonnet. From my experience, in order for the ORC index to be used you need
to bucket the table. I have explained th
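For reference, bucketing as described above is declared on the Hive table itself. A hedged sketch, assuming a HiveContext and using placeholder table, column, and bucket-count values:

```scala
// Hypothetical DDL sketch: bucket and sort the lookup table on the join key.
// Table/column names and the bucket count (256) are placeholders.
hiveContext.sql(
  """CREATE TABLE lookup_orc (key BIGINT, value STRING)
    |CLUSTERED BY (key) SORTED BY (key) INTO 256 BUCKETS
    |STORED AS ORC TBLPROPERTIES ("orc.create.index"="true")""".stripMargin)
hiveContext.sql(
  "INSERT OVERWRITE TABLE lookup_orc SELECT key, value FROM staging")
```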
Hi Mich,
Thank you for your reply.
Let me explain more clearly.
A file with 100 records needs to be joined with a big lookup file created in ORC
format (500 million records). The Spark process I wrote is returning the
matching records and is working fine. My concern is that it loads the
entire file.
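The worry about scanning the whole 500-million-row file is exactly what ORC stripe statistics address: when the data is written sorted, a reader can skip every stripe whose min/max key range cannot contain the lookup keys. A toy simulation of that idea follows; it is plain Scala, not Spark or real ORC, and all names and values are purely illustrative.

```scala
// Conceptual simulation of stripe-level min/max skipping in ORC.
// Real ORC readers consult per-stripe (and per-row-group) statistics
// stored in the file; this toy model just mimics the skipping logic.
object StripeSkipDemo {
  case class Stripe(min: Long, max: Long, rows: Seq[Long])

  // Only scan stripes whose [min, max] range can contain the key.
  // Returns the matching rows and how many stripes were actually read.
  def lookup(stripes: Seq[Stripe], key: Long): (Seq[Long], Int) = {
    val candidates = stripes.filter(s => key >= s.min && key <= s.max)
    (candidates.flatMap(_.rows.filter(_ == key)), candidates.size)
  }

  def main(args: Array[String]): Unit = {
    // Data written sorted: each stripe covers a disjoint key range.
    val stripes = Seq(
      Stripe(1L, 100L, 1L to 100L),
      Stripe(101L, 200L, 101L to 200L),
      Stripe(201L, 300L, 201L to 300L)
    )
    val (hits, stripesRead) = lookup(stripes, 150L)
    println(s"hits=${hits.mkString(",")} stripesRead=$stripesRead")
    // prints: hits=150 stripesRead=1  (two of three stripes skipped)
  }
}
```

With unsorted data the key ranges of the stripes overlap, every stripe becomes a candidate, and the whole file is read — which is why sorting (or bucketing plus sorting) matters for the index to pay off.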
Hi,
To start when you store the data in ORC file can you verify that the data
is there?
For example, register it as a temp table:
processDF.registerTempTable("tmp")
sql("select count(1) from tmp").show
Also what do you mean by index file in ORC?
HTH
Dr Mich Talebzadeh
I am trying to join a DataFrame (say 100 records) with an ORC file with 500
million records through Spark (this can increase to 4-5 billion records, 25
bytes each record).
I used Spark hiveContext API.
*ORC File Creation Code*
//fsdtRdd is JavaRDD, fsdtSchema is StructType schema
DataFrame fsdtDf = hiveContext
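The creation code is cut off above; the usual Spark 1.x pattern it appears to follow is sketched below, in Scala rather than the original Java. fsdtRdd and fsdtSchema stand in for the poster's variables, and the output path is a placeholder.

```scala
// Sketch of the likely continuation, assuming the Spark 1.x hiveContext API.
// fsdtRdd (an RDD of Row) and fsdtSchema (a StructType) come from the post.
import org.apache.spark.sql.SaveMode

val fsdtDf = hiveContext.createDataFrame(fsdtRdd, fsdtSchema)
fsdtDf.write
  .mode(SaveMode.Overwrite)
  .format("orc")
  .save("/path/to/orcFileToRead")   // placeholder path
```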