Re: Vectorised Query Execution extension

2016-08-04 Thread Jörn Franke
Even if it is possible, it only makes sense up to a certain limit set by your CPU and CPU caches. > On 04 Aug 2016, at 22:57, Mich Talebzadeh wrote: > > As I understand from the manual: > > Vectorized query execution is a Hive feature that greatly reduces the CPU

Re: hive concurrency not working

2016-08-04 Thread Raj hadoop
Thanks, everyone. We are raising a case with Hortonworks. On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop wrote: > Dear All, > > In need of your help, > > we have a Hortonworks 4-node cluster, and the problem is that Hive is allowing > only one user at a time, > > if any second resource

Re: Vectorised Query Execution extension

2016-08-04 Thread Gopal Vijayaraghavan
> Vectorized query execution streamlines operations by processing a block >of 1024 rows at a time. The real win of vectorization + columnar is that you get to take advantage of them at the same time. We get to execute the function once per 1024 rows when things are repeating - particularly true
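The "once per 1024 rows" point can be illustrated with a minimal Python sketch. This is not Hive's implementation: `ColumnBatch`, `is_repeating`, and `apply_vectorized` are hypothetical names chosen for illustration, and only the batch size of 1024 comes from the quoted text.

```python
from dataclasses import dataclass
from typing import Callable, List

BATCH_SIZE = 1024  # rows per batch, as quoted from the manual


@dataclass
class ColumnBatch:
    """Hypothetical column vector holding one column for up to BATCH_SIZE rows."""
    values: List[int]
    size: int                   # number of logical rows in the batch
    is_repeating: bool = False  # True when every row holds values[0]


def apply_vectorized(batch: ColumnBatch, fn: Callable[[int], int]) -> ColumnBatch:
    """Apply fn to a whole batch, evaluating it only once if the column repeats."""
    if batch.is_repeating:
        # One evaluation stands in for all `size` rows.
        return ColumnBatch([fn(batch.values[0])], batch.size, True)
    # Otherwise evaluate fn in a tight loop over the column values.
    return ColumnBatch([fn(v) for v in batch.values[:batch.size]], batch.size)


calls: List[int] = []  # records every argument add_one is called with


def add_one(x: int) -> int:
    calls.append(x)
    return x + 1


repeating = ColumnBatch([7], BATCH_SIZE, is_repeating=True)
result = apply_vectorized(repeating, add_one)
```

Here `add_one` runs a single time for the repeating batch, while a batch of 1024 distinct values would trigger 1024 evaluations.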

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-04 Thread Mich Talebzadeh
OK. Does it matter whether the table you create is accessible to Hive? You can read your Hive table in Spark, assuming you know the table name // Read hive table. This one is partitioned scala> val s = HiveContext.table("oraclehadoop.sales") s: org.apache.spark.sql.DataFrame = [prod_id: bigint,

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-04 Thread Nagabhushanam Bheemisetty
Yes, you are correct: that is just a metadata copy, and that is all I need, but without the partition :( On Thu, Aug 4, 2016 at 5:15 PM Mich Talebzadeh wrote: > yes but that essentially copies the metadata and leaves the partition > there with no data. it is just an image copy. won't

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-04 Thread Mich Talebzadeh
Yes, but that essentially copies the metadata and leaves the partition there with no data. It is just an image copy, so it won't help in this case.

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-04 Thread Nagabhushanam Bheemisetty
Well, you can create it using LIKE: CREATE EXTERNAL TABLE sales5 LIKE sales; On Thu, Aug 4, 2016 at 5:06 PM Mich Talebzadeh wrote: > Which process creates the master table in Hive as an external table? There > must be a process that creates the master table as external

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-04 Thread Mich Talebzadeh
Which process creates the master table in Hive as an external table? There must be a process that creates the master table as external table? Hive knows about the schema of that table. It is in Hive metastore. You cannot create an external table with CREATE EXTERNAL TABLE AS ... hive> CREATE

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-04 Thread Nagabhushanam Bheemisetty
I only get the names of the tables that I need to ingest, so I don't know the master table schema upfront. Yes, the new table is based on the master table, which is partitioned, but the new table should not be partitioned and should not have the partition column. On Thu, Aug 4, 2016 at 4:54 PM Mich Talebzadeh

Re: Crate Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-04 Thread Mich Talebzadeh
Do you know the existing table schema? The new table schema will be based on that table without partitioning?

Vectorised Query Execution extension

2016-08-04 Thread Mich Talebzadeh
As I understand from the manual: Vectorized query execution is a Hive feature that greatly reduces the CPU usage for typical query operations like scans, filters, aggregates, and joins. A standard query execution system processes one row at a time. This involves long code .. Vectorized query

Re: Hive LIKE predicate. '_' wildcard decrease perfomance

2016-08-04 Thread Gopal Vijayaraghavan
> where res_url like '%mts.ru%' ... > where res_url like '%mts_ru%' ... > Why '_' wildcard decrease perfomance? Because it misses the fast path by just one "_". ORC vectorized reader has a zero-copy check for 3 patterns - prefix, suffix and middle. That means "https://%", "%.html", "%mts.ru%"
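To make the fast-path idea concrete, here is a minimal Python sketch of how a LIKE pattern might be sorted into the three cheap shapes named above (prefix, suffix, middle) versus a general pattern that needs a full matcher. It is an illustration only, not the ORC reader's code: `classify_like` and `like_match` are hypothetical helpers, and backslash escapes are ignored for brevity.

```python
import re
from typing import Tuple


def classify_like(pattern: str) -> Tuple[str, str]:
    """Return (kind, literal), where kind is prefix, suffix, middle or general."""
    starts = pattern.startswith("%")
    ends = pattern.endswith("%") and len(pattern) > 1
    core = pattern[1 if starts else 0 : len(pattern) - (1 if ends else 0)]
    # Any wildcard left inside the literal part rules out the simple checks.
    if not core or "%" in core or "_" in core:
        return ("general", pattern)
    if starts and ends:
        return ("middle", core)
    if ends:
        return ("prefix", core)
    if starts:
        return ("suffix", core)
    return ("general", pattern)


def like_match(value: str, pattern: str) -> bool:
    """Evaluate LIKE, using plain string checks when the pattern allows it."""
    kind, core = classify_like(pattern)
    if kind == "prefix":
        return value.startswith(core)
    if kind == "suffix":
        return value.endswith(core)
    if kind == "middle":
        return core in value
    # Slow path: translate % and _ into a regular expression.
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None
```

Under this sketch '%mts.ru%' is a plain substring search, while '%mts_ru%' falls through to the general matcher because "_" matches any single character.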

Re: Malformed orc file

2016-08-04 Thread Prasanth Jayachandran
Hi, in the case of streaming, while a transaction is open the ORC file is not closed and hence may not be flushed completely. Did the transaction commit successfully? Or was there any exception thrown during writes/commit? Thanks Prasanth On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko

Re: Iterating over partitions using the metastore API

2016-08-04 Thread Elliot West
Thanks for your reply. I hadn't considered driving it from a list of partition names. To avoid the N+1 reads I am considering reading in batches like so: - Sorting the names - Taking every nth name (where n is the batch size) to use as a batch boundary. - Building a filter derived
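The first two steps of this batching idea (sort the names, take every nth name as a boundary) can be sketched in Python as below. This is only an illustration under assumptions: `batch_boundaries` is a hypothetical helper, the `dt=...` partition names are made up, and plain lexicographic sorting is assumed to be an acceptable ordering. The filter-building step is left out because the message is cut off before describing it.

```python
from typing import List, Tuple


def batch_boundaries(names: List[str], n: int) -> List[Tuple[str, str]]:
    """Return inclusive (first, last) name pairs covering the sorted names in batches of n."""
    if n <= 0:
        raise ValueError("batch size must be positive")
    ordered = sorted(names)
    return [
        (ordered[i], ordered[min(i + n, len(ordered)) - 1])
        for i in range(0, len(ordered), n)
    ]


# Hypothetical partition names, for illustration only.
partition_names = [
    "dt=2016-08-03",
    "dt=2016-08-01",
    "dt=2016-08-05",
    "dt=2016-08-02",
    "dt=2016-08-04",
]
boundaries = batch_boundaries(partition_names, 2)
```

Each pair could then drive one batched read, so the number of round trips is roughly the partition count divided by n rather than one per partition.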

Hive LIKE predicate. '_' wildcard decrease perfomance

2016-08-04 Thread Igor Kuzmenko
I've got Hive Transactional table 'data_http' in ORC format, containing around 100.000.000 rows. When I execute query: select * from data_http where res_url like '%mts.ru%' it completes in 10 seconds. But executing query select * from data_http where res_url like '%mts_ru%' takes more than

Re: Create table from orc file

2016-08-04 Thread Johannes Stamminger
Some progress: I could eliminate the error reported in a): the data file needs to be named 00_0 and must be placed in the directory denoted by the location given at table creation. This is what the error message is about? ;-) Now the situation for a) is the same as for b): Trying to

Re: Iterating over partitions using the metastore API

2016-08-04 Thread Furcy Pin
Hi Elliot, I guess you can use IMetaStoreClient.listPartitionNames instead, and then use IMetaStoreClient.getPartition for each partition. This might be slow though, as you will have to make 10 000 calls to get them. Another option I'd consider is connecting directly to the Hive metastore. This

Iterating over partitions using the metastore API

2016-08-04 Thread Elliot West
Hello, I have a process that needs to iterate over all of the partitions in a table using the metastore API. The process should not need to know about the structure or meaning of the partition key values (i.e. whether they are dates, numbers, country names etc), or be required to know the existing