Re: Container out of memory: ORC format with many dynamic partitions

2016-04-30 Thread Gopal Vijayaraghavan
> SET hive.exec.orc.memory.pool=1.0;

Might be a bad idea in general; this causes more OOMs, not fewer.

> SET mapred.map.child.java.opts=-Xmx2048M;
> SET mapred.child.java.opts=-Xmx2048M;
> ...
> Container [pid=6278,containerID=container_e26_1460661845156_49295_01_000244] is running beyond

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Well, I had to write Scala code and compile it with Maven to make it work. Still doing it. The good thing is that, as I expected, it is doing a Direct Path Read (as opposed to a Conventional Path Read) from the source Oracle database.

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Yes, I was thinking of that: use Spark to load the data from Oracle over JDBC and flush it into an ORC table in Hive. I am now using Spark 1.6.1, and as I recall the JDBC driver is throwing an error (I raised a thread for it). This was working under Spark 1.5.2. Cheers, Dr Mich Talebzadeh
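A minimal sketch of that JDBC-to-ORC pattern, assuming Spark 1.6 with a HiveContext; the JDBC URL, credentials, and table names below are placeholders, not the actual code from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object OracleToOrc {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("OracleToOrc"))
        val hiveContext = new HiveContext(sc)

        // Pull the source table from Oracle over JDBC (placeholder URL, credentials, table).
        val df = hiveContext.read.format("jdbc").options(Map(
          "url"      -> "jdbc:oracle:thin:@//oracle-host:1521/ORCL",
          "dbtable"  -> "scott.sales",
          "user"     -> "scott",
          "password" -> "tiger",
          "driver"   -> "oracle.jdbc.OracleDriver"
        )).load()

        // Flush the DataFrame into an ORC-backed Hive table.
        df.write.format("orc").mode("overwrite").saveAsTable("test.sales_orc")
      }
    }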

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Jörn Franke
I do not think you will make it faster by setting the execution engine to Spark, especially with such an old Spark version. For simple things such as "dump" bulk imports and exports, it matters much less, if at all, which execution engine you use. There was recently a discussion on that on the

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin
No, the execution engines are not in general interchangeable. The Hive project uses an abstraction layer to be able to plug in different execution engines. I don't know if Sqoop uses Hive code, or if it uses an old version, or what. As with many things in the Hadoop world, if you want to know if

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Hi Marcin, it is the speed really, the speed at which data is digested into Hive. Sqoop is two-stage as I understand it:
1. Take the data out of the RDBMS via JDBC and put it in an external HDFS file.
2. Read that file and insert it into a Hive table.
The issue is the second part. In general I

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin
They're not simply interchangeable. Sqoop is written to use MapReduce. I actually implemented my own replacement for sqoop-export in Spark, which was extremely simple. It wasn't any faster, because the bottleneck was the receiving database. Is your motivation here speed? Or correctness?
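A sketch of what such a Spark replacement for sqoop-export might look like on Spark 1.6; the Hive table name, target JDBC URL, and credentials are placeholder values:

    import java.util.Properties
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SparkJdbcExport {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SparkJdbcExport"))
        val hiveContext = new HiveContext(sc)

        // Read the Hive table that should be exported (placeholder name).
        val df = hiveContext.table("test.sales_orc")

        // Append the rows to the receiving database over JDBC; the target database
        // is usually the bottleneck, so the engine choice matters little here.
        val props = new Properties()
        props.setProperty("user", "scott")
        props.setProperty("password", "tiger")
        df.write.mode("append")
          .jdbc("jdbc:oracle:thin:@//oracle-host:1521/ORCL", "scott.sales_export", props)
      }
    }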

Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Hi, What is the simplest way of making sqoop import use the Spark engine, as opposed to the default MapReduce, when putting data into a Hive table? I did not see any parameter for this in the Sqoop command line documentation. Thanks, Dr Mich Talebzadeh

Re: Container out of memory: ORC format with many dynamic partitions

2016-04-30 Thread Jörn Franke
I would still need some time to dig deeper into this. Are you using a specific distribution? Would it be possible to upgrade to a more recent Hive version? However, having so many small partitions is bad practice and seriously affects performance. Each partition should at least contain

Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-30 Thread Mich Talebzadeh
Ok, thanks Lefty. Dr Mich Talebzadeh