Even if it is possible, it only makes sense up to a certain limit given by your
CPU and CPU caches.
> On 04 Aug 2016, at 22:57, Mich Talebzadeh wrote:
>
> As I understand from the manual:
>
> Vectorized query execution is a Hive feature that greatly reduces the CPU
Thanks everyone. We are raising a case with Hortonworks.
On Wed, Aug 3, 2016 at 6:44 PM, Raj hadoop wrote:
> Dear All,
>
> In need of your help,
>
> we have a Hortonworks 4-node cluster, and the problem is Hive is allowing
> only one user at a time,
>
> if any second resource
> Vectorized query execution streamlines operations by processing a block
>of 1024 rows at a time.
The real win of vectorization + columnar is that you get to take advantage
of them at the same time.
We get to execute the function once per 1024 rows when things are
repeating - particularly true
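For reference, vectorized execution is switched on with a session setting; a minimal sketch (assuming Hive 0.13+ and ORC-backed tables, which vectorization requires):

```sql
-- Enable vectorized query execution (map side)
set hive.vectorized.execution.enabled = true;
-- Optionally vectorize the reduce side too (available in later Hive releases)
set hive.vectorized.execution.reduce.enabled = true;
```

With these set, eligible operators process batches of 1024 rows instead of one row at a time.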
Ok
Does it matter whether the table you create is accessible to Hive?
You can read your hive table in Spark assuming you know the table name
// Read hive table. This one is partitioned
scala> val s = HiveContext.table("oraclehadoop.sales")
s: org.apache.spark.sql.DataFrame = [prod_id: bigint,
Yes, you are correct, that is just a metadata copy and I need only that, but
without the partition :(
On Thu, Aug 4, 2016 at 5:15 PM Mich Talebzadeh
wrote:
> yes but that essentially copies the metadata and leaves the partition
> there with no data. it is just an image copy. won't
yes but that essentially copies the metadata and leaves the partition there
with no data. it is just an image copy. won't help this case
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Well, you can create it using LIKE.
CREATE EXTERNAL TABLE sales5 LIKE sales;
On Thu, Aug 4, 2016 at 5:06 PM Mich Talebzadeh
wrote:
> Which process creates the master table in Hive as an external table? There
> must be a process that creates the master table as external
Which process creates the master table in Hive as an external table? There
must be a process that creates the master table as external table? Hive
knows about the schema of that table. It is in Hive metastore.
You cannot create an external table with CREATE EXTERNAL TABLE AS ...
hive> CREATE
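A sketch of the usual workaround, since CREATE EXTERNAL TABLE AS SELECT is rejected: declare the external table first, then populate it. The column list and HDFS path here are hypothetical, not from the thread:

```sql
-- Step 1: declare the external table (schema and location are illustrative)
CREATE EXTERNAL TABLE sales5 (prod_id BIGINT)
LOCATION '/user/hive/external/sales5';
-- Step 2: populate it with a plain INSERT ... SELECT
INSERT OVERWRITE TABLE sales5
SELECT prod_id FROM sales;
```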
I only get the table names that I need to ingest. So I don't know the
master table schema upfront.
Yes, the new table is based on the master table, which is partitioned, but the
new table should not be partitioned and should not have the partition column.
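One way to sketch that is a CTAS that lists the data columns explicitly, so the partition column is dropped from the new schema (column names other than prod_id are hypothetical):

```sql
-- CTAS always yields an unpartitioned table; selecting explicit columns
-- keeps the master's partition column out of the new schema
CREATE TABLE sales_flat AS
SELECT prod_id, cust_id, amount_sold   -- hypothetical column list
FROM oraclehadoop.sales;
```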
On Thu, Aug 4, 2016 at 4:54 PM Mich Talebzadeh
Do you know the existing table schema? The new table schema will be based
on that table without partitioning?
As I understand from the manual:
Vectorized query execution is a Hive feature that greatly reduces the CPU
usage for typical query operations like scans, filters, aggregates, and
joins. A standard query execution system processes one row at a time. This
involves long code .. Vectorized query
> where res_url like '%mts.ru%'
...
> where res_url like '%mts_ru%'
...
> Why '_' wildcard decrease perfomance?
Because it misses the fast path by just one "_".
ORC vectorized reader has a zero-copy check for 3 patterns - prefix,
suffix and middle.
That means "https://%", "%.html", "%mts.ru%".
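In other words, only those three shapes hit the zero-copy check; a pattern containing '_' falls back to the generic matcher. Illustrated on the queries from this thread:

```sql
select * from data_http where res_url like 'https://%';  -- prefix: fast path
select * from data_http where res_url like '%.html';     -- suffix: fast path
select * from data_http where res_url like '%mts.ru%';   -- middle: fast path
select * from data_http where res_url like '%mts_ru%';   -- '_' = any one char: slow path
```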
Hi
In case of streaming, when a transaction is open the ORC file is not closed and
hence may not be flushed completely. Did the transaction commit successfully?
Or was there any exception thrown during writes/commit?
Thanks
Prasanth
On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko
Thanks for your reply. I hadn't considered driving it from a list of
partition names.
To avoid the N+1 reads I am considering reading in batches like so:
- Sorting the names
- Taking every nth name (where n is the batch size) to use as a batch
boundary.
- Building a filter derived
I've got a Hive transactional table 'data_http' in ORC format, containing
around 100.000.000 rows.
When I execute query:
select * from data_http
where res_url like '%mts.ru%'
it completes in 10 seconds.
But executing query
select * from data_http
where res_url like '%mts_ru%'
takes more than
Some progress: I could eliminate the error reported in a): the data file needs
to be named 00_0 and must be placed in the directory denoted by the
location given at table creation. This is what the error message is about? ;-)
Now the situation for a) is the same as for b):
Trying to
Hi Elliot,
I guess you can use IMetaStoreClient.listPartitionNames instead, and then
use IMetaStoreClient.getPartition for each partition.
This might be slow though, as you will have to make 10 000 calls to get
them.
Another option I'd consider is connecting directly to the Hive metastore.
This
Hello,
I have a process that needs to iterate over all of the partitions in a
table using the metastore API. The process should not need to know about the
structure or meaning of the partition key values (i.e. whether they are
dates, numbers, country names etc), or be required to know the existing