Hi,
> I have a question about how to get the location for a bunch of partitions.
...
> But in an enterprise environment I'm pretty sure this approach would not be
> the best because the RDS (mysql or derby) is maybe not reachable or
> I don't have the permission to it.
That was the reason Hive
Hi,
> java.lang.NoSuchMethodError:
> org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
> (state=,code=0)
Are you rolling your own Hadoop install?
https://issues.apache.org/jira/browse/HADOOP-14683
Cheers,
Gopal
> Yes both of these are valid ways of filtering data before join in Hive.
This has several implementation specifics attached to it. If you're looking at
Hive 1.1 or before, it might not work the same way as Vineet mentioned.
In older versions Calcite rewrites aren't triggered, which prevented
> I wish the Hive team to keep things more backward-compatible as well. Hive is
> such an enormous system with a wide-spread impact so any
> backward-incompatible change could cause an uproar in the community.
The incompatibilities were not avoidable in a set of situations - a lot of
those
Hi,
>> However, we have built Tez on CDH and it runs just fine.
Down that path you'll also need to deploy a slightly newer version of Hive as
well, because Hive 1.1 is a bit ancient & has known bugs with the tez planner
code.
You effectively end up building the hortonworks/hive-release
> I'll try the simplest query I can reduce it to with loads of memory and see
> if that gets anywhere. Other pointers are much appreciated.
Looks like something I'm testing right now (to make the memory setting
cost-based).
https://issues.apache.org/jira/browse/HIVE-21399
A less
Hi,
That looks like the TopN hash optimization didn't kick in, that must be a
settings issue in the install.
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
|
>I am running an older version of Hive on MR. Does it have it too?
Hard to tell without an explain.
AFAIK, this was fixed in Aug 2013 - how old is your build?
Cheers,
Gopal
> I expect the maps to do some sorting and limiting in parallel. That way the
> reducer load would be small. I don’t think it does that. Can you tell me why?
They do.
Which version are you running, is it Tez and do you have an explain for the
plan?
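The map-side limiting being described can be sketched like this (an illustration of the idea, not Hive's actual TopN hash operator): each mapper keeps only a bounded top-N, so the single reducer merges a few small lists instead of every row.

```python
import heapq

# Each "mapper" keeps only the top-N rows it sees (a bounded heap),
# so the reducer merges M small lists instead of all rows.
def map_side_topn(rows, n):
    # nlargest keeps at most n items per mapper
    return heapq.nlargest(n, rows)

def reduce_side_topn(partial_lists, n):
    # merge the per-mapper survivors and take the global top-N
    return heapq.nlargest(n, (x for part in partial_lists for x in part))

# Example: 3 mappers, global top-3
mappers = [[5, 1, 9], [7, 3, 8], [2, 6, 4]]
partials = [map_side_topn(m, 3) for m in mappers]
print(reduce_side_topn(partials, 3))  # [9, 8, 7]
```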
Cheers,
Gopal
> ,row_number() over ( PARTITION BY A.dt,A.year, A.month,
>A.bouncer,A.visitor_type,A.device_type order by A.total_page_view_time desc )
>as rank
from content_pages_agg_by_month A
The row_number() window function is a streaming function, so this should not
consume a significant
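The streaming behaviour of row_number() can be illustrated like so (a sketch of the concept, not Hive's PTF operator): once rows arrive sorted by (partition key, order key), only a counter is needed, never the whole partition in memory.

```python
from itertools import groupby

# row_number() over (partition by key order by val desc), streamed:
# rows arrive already sorted by (key, val desc), so we only track a
# counter per partition, never buffering a whole partition.
def streaming_row_number(sorted_rows):
    for key, group in groupby(sorted_rows, key=lambda r: r[0]):
        for rank, (_, val) in enumerate(group, start=1):
            yield (key, val, rank)

rows = [("a", 9), ("a", 7), ("a", 3), ("b", 5), ("b", 2)]
print(list(streaming_row_number(rows)))
# [('a', 9, 1), ('a', 7, 2), ('a', 3, 3), ('b', 5, 1), ('b', 2, 2)]
```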
Hi,
> Subject: Re: hive 3.1 mapjoin with complex predicate produce incorrect results
...
> | 0 if(_col0 is null, 44, _col0) (type: int) |
> | 1 _col0 (type: int) |
That rewrite is pretty neat, but I feel like the IF expression nesting is
Hi,
> I was looking at HiveServer2 performance going through Knox in KNOX-1524 and
> found that HTTP mode is significantly slower.
The HTTP mode did re-auth for every row before HIVE-20621 was fixed; Knox
should be doing cookie-auth to prevent ActiveDirectory/LDAP from throttling
this.
I
Hi,
> It doesn't help if you need concurrent threads writing to a table but we are
> just using the row_number analytic and a max value subquery to generate
> sequences on our star schema warehouse.
Yup, you're right the row_number doesn't help with concurrent writes - it
doesn't even scale
Hi,
> Hopefully someone can tell me if this is a bug, expected behavior, or
> something I'm causing myself :)
I don't think this is expected behaviour; where the bug is, though, is what I'm
looking into.
> We have a custom StorageHandler that we're updating from Hive 1.2.1 to Hive
> 3.0.0.
Most
>query the external table using HiveCLI (e.g. SELECT * FROM
>my_external_table), HiveCLI prints out a table with the correct
If the error is always on a "select *", then the issue might be the SerDe's
handling of included columns.
Check what you get for
colNames =
> msck repair table ;
msck repair does not work on ACID tables.
In Hive 2.x, there is no way to move, replicate or rehydrate ACID tables from a
cold store - the only way it works is if you connect to the old metastore.
Cheers,
Gopal
> Because I believe string should be able to handle integer as well.
No, because it is not a lossless conversion. Comparisons are lost.
"9" > "11", but 9 < 11
Even float -> double is lossy (because of epsilon).
You can always apply the Hive workaround suggested, otherwise you might find
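The comparison mismatch above is easy to demonstrate outside Hive; lexicographic string ordering simply disagrees with numeric ordering:

```python
# Lexicographic (string) comparison disagrees with numeric comparison,
# which is why int -> string is not a lossless conversion for Hive.
assert "9" > "11"   # strings compare character by character: '9' > '1'
assert 9 < 11       # numbers compare by value

# A range filter on a string-typed column therefore orders the same
# data differently than an int-typed column would:
ids = ["9", "10", "11", "2"]
print(sorted(ids))           # ['10', '11', '2', '9']  (lexicographic)
print(sorted(ids, key=int))  # ['2', '9', '10', '11']  (numeric)
```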
Hi,
> on some days parquet was created by hive 2.1.1 and on some days it was
> created by using glue
…
> After some drill down i saw schema of columns inside both type of parquet
> file using parquet tool and found different data types for some column
...
> optional int32 action_date (DATE);
>
> Will it be referring to orc metadata or it will be loading the whole file and
> then counting the rows.
Depends on the partial-scan setting or if it is computing full column stats
(the full column stats does an nDV, which reads all rows).
hive> analyze table compute statistics ...
> By the way, if you want near-real-time tables with Hive, maybe you should
> have a look at this project from Uber: https://uber.github.io/hudi/
> I don't know how mature it is yet, but I think it aims at solving that kind
> of challenge.
Depending on your hive setup, you don't need a
A Hive version would help to preface this, because it matters here (e.g.
TEZ-3709 doesn't apply to hive-1.2).
> I’m trying to simply change the format of a very large partitioned table from
> Json to ORC. I’m finding that it is unexpectedly resource intensive,
> primarily due to a
> I am interested in working on a project that takes a large number of Hive
> queries (as well as their meta data like amount of resources used etc) and
> find out common sub queries and expensive query groups etc.
This was roughly the central research topic of one of the Hive CBO devs,
> Search ’Total length’ in log sys_dag_xxx, it is 2147483648.
This is the INT_MAX “placeholder” value for uncompacted ACID tables.
This is because with ACIDv1 there is no way to generate splits against
uncompacted files, so this gets “an empty bucket + unknown number of inserts +
updates”
> "TBLPROPERTIES ("orc.compress"="Snappy"); "
That doesn't use the Hadoop SnappyCodec, but a pure-Java version (which is
slower, but always works).
The Hadoop SnappyCodec needs libsnappy installed on all hosts.
Cheers,
Gopal
> My conclusion is that a query can update some internal states of HiveServer2,
> affecting DAG generation for subsequent queries.
Other than the automatic reoptimization feature, there's two other potential
suspects.
First one would be to disable the in-memory stats cache's variance param,
> Or a simple insert will be automatically sorted as the table DDL mention ?
Simple insert should do the sorting; older versions of Hive had the ability to
disable that (which is a bad thing & therefore these settings are now just
hard-configured to =true in Hive 3.x)
-- set
> I'm using Hive 1.2.1 with LLAP on HDP 2.6.5. Tez AM is 3GB, there are 3
> daemons for a total of 34816 MB.
Assuming you're using Hive 2 here (with LLAP): LLAP kinda sucks for ETL
workloads, but this is a different problem.
> PARTITIONED BY (DATAPASSAGGIO string, ORAPASSAGGIO string)
>
> When LLAP Execution Mode is set to 'only' you can't have a macro and window
> function in the same select statement.
The "only" part isn't enforced for the simple select query, but is enforced for
the complex one (the PTF one).
> select col_1, col_2 from macro_bug where otrim(col_1) is not
> This is Hadoop 3.0.3
> java.lang.NoSuchMethodError:
> org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
> (state=08S01,code=1)
> Something is missing here! Is this specific to ORC tables?
No, it is a Hadoop BUG.
https://issues.apache.org/jira/browse/HADOOP-14683
> So transactional tables only work with hdfs. Thanks for the confirmation
> Elliot.
No, that's not what was said.
Streaming ingest into transactional tables requires strong filesystem
consistency and a flush-to-remote operation (hflush).
S3 supports neither of those things and HDFS is not the
> It is 2.7.3
+
> Error: java.io.IOException: java.lang.RuntimeException: ORC split generation
> failed with exception: java.lang.NoSuchMethodError:
> org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
> (state=,code=0)
> Then I am wondering if the merge statement is impracticable because
> of bad use of myself or because this feature is just not mature enough.
Since you haven't mentioned a Hive version here, I'm going to assume you're
some variant of Hive 1.x & that has some fundamental physical planning
> delta_000_000
...
> I am using Glue data catalog as metastore, so should there be any link up to
> these tables from hive?
That would be why transactions are returning as 0 (there is never a transaction
0), because it is not using a Hive standard metastore.
You might not be able to
> We are copying data from upstream system into our storage S3. As part of
> copy, directories along with Zero bytes files are been copied.
Is this exactly the same issue as the previous thread or a different one?
so they're asking "where is the Hive bucketing spec". Is it just to read the
code for that function? They were looking for something more explicit, I think.
Thanks
- Original Message -
From: "Gopal Vijayaraghavan" <gop...@apache
>* I'm interested in your statement that CLUSTERED BY does not CLUSTER BY.
> My understanding was that this was related to the number of buckets, but you
> are relating it to ORC stripes. It is odd that no examples that I've seen
> include the SORTED BY statement other than in relation to
There's more here than Bucketing or Tez.
> PARTITIONED BY(daydate STRING, epoch BIGINT)
> CLUSTERED BY(r_crs_id) INTO 64 BUCKETS
I hope the epoch partition column is actually a day rollup and not 1 partition
for every timestamp.
CLUSTERED BY does not CLUSTER BY, which it should (but it
Hi,
> Would this also ensure that all the existing data compressed in snappy format
> and the new data stored in zlib format can work in tandem with no disruptions
> or issues to end users who query the table.
Yes.
Each file encodes its own compressor kind & readers use that. The writers
> It also shows that the process is consuming more than 30GB. However, it is
> not clear what is causing the process to consume more than 30GB.
The Xmx only applies to the heap size; there's another factor that is usually
ignored: the network buffers and compression buffers used by
Hi,
> Caused by: java.lang.ArrayIndexOutOfBoundsException
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
In general HDP specific issues tend to get more attention on HCC, but this is a
pretty old issue stemming from MapReduce being designed for fairly
> However, ideally we wish to manipulate the original query as delivered by the
> user (or as close to it as possible), and we’re finding that the tree has
> been modified significantly by the time it hits the hook
That's CBO. It takes the Query -> AST -> Calcite Tree -> AST -> hook - the
> For example, a Hive job may start Tez containers, which then retrieve data
> from LLAP running concurrently. In the current implementation, this is
> unrealistic
That is how LLAP was built - to push work from Tez to LLAP vertex by vertex,
instead of an all-or-nothing implementation.
Here
Hi,
> I wanted to understand why hive has a performance issue with using _
> character in queries.
This is somewhat of a missed optimization issue - the "%" impl uses a fast
BoyerMoore algorithm and avoids converting from utf-8 bytes -> String.
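The difference can be sketched like this (an illustration of the principle, not Hive's vectorized LIKE code; real LIKE handling also needs regex metacharacters escaped): a pure-% pattern reduces to a byte-level substring search, while "_" means "any one character" and forces per-character matching.

```python
import re

def like_percent(haystack: bytes, needle: bytes) -> bool:
    # col LIKE '%needle%' -> plain substring search over raw utf-8
    # bytes; bytes.find uses an optimized scan, no String conversion.
    return haystack.find(needle) >= 0

def like_underscore(haystack: str, pattern: str) -> bool:
    # '_' matches any single character, so the pattern must be walked
    # character by character (here via a regex translation).
    regex = pattern.replace("%", ".*").replace("_", ".")
    return re.fullmatch(regex, haystack) is not None

print(like_percent("hello world".encode(), b"lo w"))  # True
print(like_underscore("hello", "h_llo"))              # True
```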
Hi,
If you've got the 1st starvation fixed (with the Hadoop 2.8 patch), all these
configs applied & log4j2 async logging enabled, you should definitely see a
performance improvement.
Here's the log patches, which need a corresponding LLAP config (& have to be
disabled in HS2, for the progress bar to work)
Hi,
> In our test, we found the shuffle stage of LLAP is very slow. Whether need to
> configure some related shuffle value or not?
Shuffle is the one hit by the 2nd, 3rd and 4th resource starvation issues
listed earlier (FDs, somaxconn & DNS UDP packet loss).
> And we get the following log
Hi,
> With these configurations, the cpu utilization of llap is very low.
Low CPU usage has been observed with LLAP due to RPC starvation.
I'm going to assume that the build you're testing is a raw Hadoop 2.7.3 with no
additional patches?
Hadoop-RPC is single-threaded & has a single mutex
Hi,
> Please help us find whether we use the wrong configuration. Thanks for your
> help.
Since there are no details, I'm not sure what configuration you are discussing
here.
A first step would be to check if LLAP cache is actually being used (the LLAP
IO in the explain), vectorization is
> Why jdbc read them as control symbols?
Most likely this is already fixed by
https://issues.apache.org/jira/browse/HIVE-1608
That pretty much makes the default as
set hive.query.result.fileformat=SequenceFile;
Cheers,
Gopal
Hi,
> org.apache.hive.jdbc.HiveResultSetMetaData.getTableName(HiveResultSetMetaData.java:102)
https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveResultSetMetaData.java#L102
I don't think this issue is fixed in any release - this probably needs to go
into a
> . I didn't see data skew for that reducer. It has similar amount of
> REDUCE_INPUT_RECORDS as other reducers.
…
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for
> join key [4092813312923569]
The ratio of REDUCE_INPUT_RECORDS and REDUCE_INPUT_GROUPS is what is
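That ratio makes a quick back-of-envelope skew check (the counter names come from the Tez counters quoted above; the helper itself is illustrative, not Hive code):

```python
# Many records over few groups on one reducer means a handful of join
# keys carry most of the rows, even when record counts look balanced.
def records_per_group(reduce_input_records, reduce_input_groups):
    return reduce_input_records / max(reduce_input_groups, 1)

# e.g. 8,000,000 records spread over only 1,000 join keys:
print(records_per_group(8_000_000, 1_000))  # 8000.0
```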
> ) t_result where formable = ’t1'
…
> This sql using 29+ hours in 11 computers cluster within 600G memory.
> In my opinion, the time wasting in the `order by sampledate` and `calculate
> the table B’s record`. Is there a setting to avoid `table B`’s record not to
> get ‘avg_wfoy_b2’ column,
> Now we need an explanation of "map" -- can you supply it?
The "map" mode runs all tasks with a TableScan operator inside LLAP instances
and all other tasks in Tez YARN containers. This is the LLAP + Tez hybrid mode,
which introduces some complexity in debugging a single query.
The "only"
Hi,
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark
> session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create
> spark client.
I get inexplicable errors with Hive-on-Spark unless I do a three step build.
Build Hive first, use that version to
> Are there any frameworks like TPC-DS to benchmark Hive ACID functionality?
Are you trying to work on and improve Hive ACID?
I have a few ACID micro-benchmarks like this
https://github.com/t3rmin4t0r/acid2x-jmh
so that I can test the inner loops of ACID without having any ORC data at all.
> Caused by:
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionError:
> VectorMapJoin Hash table loading exceeded memory limits.
> estimatedMemoryUsage: 1644167752 noconditionalTaskSize: 463667612
> inflationFactor: 2.0 threshold: 927335232 effectiveThreshold: 927335232
Most
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
> /tmp/staging-slider-HHIwk3/lib/tez.tar.gz (Is a directory)
LLAP expects to find a tarball where tez.lib.uris is - looks like you've got a
directory?
Cheers,
Gopal
> Or, is this an artifact of an incompatibility between ORC files written by
> the Hive 2.x ORC serde not being readable by the Hive 1.x ORC serde?
> 3. Is there a difference in the ORC file format spec. at play here?
Nope, we're still defaulting to hive-0.12 format ORC files in Hive-2.x.
We
> COUNT(DISTINCT monthly_user_id) AS monthly_active_users,
> COUNT(DISTINCT weekly_user_id) AS weekly_active_users,
…
> GROUPING_ID() AS gid,
> COUNT(1) AS dummy
There are two things which prevent Hive from optimizing multiple count
distincts: another aggregate like a count(1), or a grouping sets
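For contrast, the single-pass shape such a query wants can be sketched in Python (the intent only, not Hive's optimizer rewrite): one scan, one distinct-set per column, one plain counter.

```python
# One pass over (monthly_user_id, weekly_user_id) rows computing two
# COUNT(DISTINCT ...) values plus a COUNT(1) simultaneously.
def multi_distinct(rows):
    monthly, weekly, total = set(), set(), 0
    for m_uid, w_uid in rows:
        monthly.add(m_uid)
        weekly.add(w_uid)
        total += 1
    return len(monthly), len(weekly), total

rows = [(1, 10), (1, 11), (2, 10), (3, 12)]
print(multi_distinct(rows))  # (3, 3, 4)
```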
TL;DR - A Materialized view is a much more useful construct than trying to get
limited indexes to work.
That is a pretty lively project which has been going on for a while with
Druid+LLAP
https://issues.apache.org/jira/browse/HIVE-14486
> This seems out of the blue but my initial benchmarks
> Running Hive 2.2 w/ LLAP enabled (tried the same thing in Hive 2.3 w/ LLAP),
> queries working but when we submit queries like the following (via our
> automated test framework), they just seem to hang with Parsing
> CommandOther queries seem to work fine Any idea on what's going on
Hi,
> java.lang.Exception: java.util.concurrent.ExecutionException:
> java.lang.NoSuchMethodError:
> org.apache.hadoop.tracing.SpanReceiverHost.getInstance(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/tracing/SpanReceiverHost;
There's a good possibility that you've built
> cast(NULL as bigint) as malone_id,
> cast(NULL as bigint) as zpid,
I ran this on master (with text vectorization off) and I get
20170626123 NULLNULL10
However, I think the backtracking for the columns is broken, somewhere - where
both the nulls
> I guess I see different things. Having used all the tech. In particular for
> large hive queries I see OOM simply SCANNING THE INPUT of a data directory,
> after 20 seconds!
If you've got an LLAP deployment you're not happy with - this list is the right
place to air your grievances. I
> It is not that simple. The average Hadoop user has 6-7 years of data. They do
> not have a "magic" convert everything button. They also have legacy processes
> that don't/can't be converted.
…
> They do not want the "fastest format" they want "the fastest hive for their
> data".
I've yet
> I kept hearing about vectorization, but later found out it was going to work
> if i used ORC.
Yes, it's a tautology - if you cared about performance, you'd use ORC, because
ORC is the fastest format.
And doing performance work to support folks who don't quite care about it, is
not exactly
> 1711647 -1032220119
Ok, so this is the hashCode skew issue, probably the one we already know about.
https://github.com/apache/hive/commit/fcc737f729e60bba5a241cf0f607d44f7eac7ca4
String hashcode distribution is much better in master after that. Hopefully
that fixes the distinct speed issue
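For reference, the pre-fix hash is Java's String.hashCode (h = 31*h + c), reimplemented here to show how keys land on reducers (valid for BMP-only strings; the partitioner shape is illustrative, not Hive internals):

```python
def java_string_hashcode(s: str) -> int:
    # Java's String.hashCode: h = 31*h + c, as a signed 32-bit int.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def reducer_for(key: str, n_reducers: int) -> int:
    # non-negative modulo, as a shuffle partitioner would do
    return (java_string_hashcode(key) & 0x7FFFFFFF) % n_reducers

print(java_string_hashcode("ab"))  # 3105 (matches Java)
print([reducer_for(str(i), 4) for i in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```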
> 1) both do the same thing.
The start of this thread is the exact opposite - trying to suggest ORC is
better for storage & wanting to use it.
> As it relates the columnar formats, it is silly arms race.
I'm not sure "silly" is the operative word - we've lost a lot of fragmentation
of the
> SELECT COUNT(DISTINCT ip) FROM table - 71 seconds
> SELECT COUNT(DISTINCT id) FROM table - 12,399 seconds
Ok, I misunderstood your gist.
> While ip is more unique than id, ip runs many times faster than id.
>
> How can I debug this ?
Nearly the same way - just replace "ip" with "id" in my
Hi,
I think this is worth fixing because this seems to be triggered by the data
quality itself - so let me dig in a bit into a couple more scenarios.
> hive.optimize.distinct.rewrite is True by default
FYI, we're tackling the count(1) + count(distinct col) case in the Optimizer
now (which
> We are looking at migrating files(less than 5 Mb of data in total) with
> variable record lengths from a mainframe system to hive.
https://issues.apache.org/jira/browse/HIVE-10856
+
https://github.com/rbheemana/Cobol-to-Hive/
came up on this list a while back.
> Are there other
> for the slider 0.92, the patch is already applied, right?
Yes, except it has been refactored to a different place.
https://github.com/apache/incubator-slider/blob/branches/branch-0.92/slider-agent/src/main/python/agent/NetUtil.py#L44
Cheers,
Gopal
> NetUtil.py:60 - [Errno 8] _ssl.c:492: EOF occurred in violation of protocol
The error is directly related to the SSL verification error - TLSv1.0 vs
TLSv1.2.
JDK8 defaults to v1.2 and Python 2.6 defaults to v1.0.
Python 2.7.9 + the patch in 0.92 might be needed to get this to work.
AFAIK,
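On a modern Python, pinning a client to TLSv1.2+ looks like this (a sketch only; the Slider agent is Python 2 and predates the SSLContext minimum_version API):

```python
import ssl

# JDK8 servers default to TLSv1.2 while old Python 2.6 clients only
# offered TLSv1.0 - the mismatch shows up as an abrupt EOF mid-handshake.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
print(ctx.minimum_version)  # TLSVersion.TLSv1_2
```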
> ERROR 2017-05-09 22:04:56,469 NetUtil.py:62 - SSLError: Failed to connect.
> Please check openssl library versions.
…
> I am using hive 2.1.0, slider 0.92.0, tez 0.8.5
AFAIK, this was reportedly fixed in 0.92.
https://issues.apache.org/jira/browse/SLIDER-942
I'm not sure if the fix in that
Hi,
> Does Hive LLAP work with Parquet format as well?
LLAP does work with the Parquet format, but it does not work very fast, because
the java Parquet reader is slow.
https://issues.apache.org/jira/browse/PARQUET-131
+
https://issues.apache.org/jira/browse/HIVE-14826
In particular to
> But on Hue or JDBC interface to Hive Server 2, the following error occurs
> while SELECT querying the view.
You should be getting identical errors for HS2 and CLI, so that suggests you
might be running different CLI and HS2 versions.
> SELECT COUNT(1) FROM pk_test where ds='2017-04-20';
>
> I'd like to remember that Hive supports ACID (in a very early stages yet) but
> most often that is a feature that most people don't use for real production
> systems.
Yes, you need ACID to maintain multiple writers correctly.
ACID does have a global primary key (which is not a single
> Is there anyway one can enable both (Kerberos and LDAP with SSL) on Hive?
I believe what you're looking for is Apache Knox SSO. And for LDAP users,
Apache Ranger user-sync handles auto-configuration.
That is how SSL+LDAP+JDBC works in the HD Cloud gateway [1].
There might be a similar
> SELECT COUNT(*), COUNT(DISTINCT id) FROM accounts;
…
> 0:01 [8.59M rows, 113MB] [11M rows/s, 146MB/s]
I'm hoping this is not rewriting to the approx_distinct() in Presto.
> I got similar performance with Hive + LLAP too.
This is a logical plan issue, so I don't know if LLAP helps a lot.
A
> My bad. Looks like the thrift server is cycling through various AMs it
> started when the thrift server was started. I think this is different from
> either Hive 2.0.1 or LLAP.
This has roughly been possible since hive-1.0, if you follow any of the
Tez BI tuning guides over the last 4
> We are using a query with union all and groupby and same table is read
> multiple times in the union all subquery.
…
> When run with Mapreduce, the job is run in one stage consuming n mappers and
> m reducers and all union all scans are done with the same job.
The logical plans are identical
> by setting tez.am.mode.session=false in hive-cli and hive-jdbc via
> hive-server2.
That setting does not work if you do "set tez.am.*" parameters (any tez.am
params).
Can you try doing
hive --hiveconf tez.am.mode.session=false
instead of a set; param and see if that works?
Cheers,
> Using Apache Hive 1.2.1, I get a NullPointerExcetion when performing a
> request through an ODBC driver.
> The request is just a simple LOAD DATA request:
Looks like the NPE is coming from the getResultMetaData() call, which returns
the schema of the rows returned.
LOAD is probably one of
> I tried reducing the check interval and launching it manually with the
> command "Alter table tx_tbl compaction 'major';". Nothing helps.
You can check the hive metastore log and confirm it also has the DbTxnManager
set up & that it is triggering the compactions.
Without a standalone metastore, the hive
> Gopal : (yarn logs -application $APPID) doesn't contain a line
> containing HISTORY so it doesn't produce svg file. Should I turn on
> some option to get the lines containing HISTORY in yarn application
> log?
There's a config option tez.am.log.level=INFO which controls how much data is
> Hive LLAP shows better performance than Presto and Spark for most queries,
> but it shows very poor performance on the execution of query 72.
My suspicion would be the inventory x catalog_sales x warehouse join -
assuming the column statistics are present and valid.
If you could send the
> > 'skip.header.line.count'='1',
Try removing that config option.
I've definitely seen footer markers disabling file splitting, possibly header
also does.
Cheers,
Gopal
> Has there been any study of how much compressing Hive Parquet tables with
> snappy reduces storage space or simply the table size in quantitative terms?
http://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet/20
Since SNAPPY is just LZ77, I would assume it would be
> Error 40003]: Only External tables can have an explicit location
…
> using hive 1.2. I got this error. This was definitely not a requirement
> before
Are you using Apache hive or some vendor fork?
Some BI engines demand there be no aliasing for tables, so each table needs a
unique location
> We have 20 GB txt File, When we have created external table on top of 20
> Gb file, we see Tez is creating only one mapper.
For an uncompressed file, that is very strange. Is this created as "STORED AS
TEXTFILE" or some other strange format?
Cheers,
Gopal
> !connect jdbc:hive2://localhost:1/default; -n hiveuser -p hivepassword
...
> What's missing here? how do I fix it? Thank you very much
Mostly, this is missing the actual protocol specs - this is something which is
never a problem for real clusters because ZK load-balancing automatically
> I have been following the instructions under
> https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization
> in great detail with no success.
…
> Error: org.apache.spark.sql.catalyst.parser.ParseException:
You're reading the docs for Apache Hive and trying to
> So no one has a solution?
…
> “mapreduce.job.name” works for M/R queries, not Tez.
Depends on the Hive version you're talking about.
https://issues.apache.org/jira/browse/HIVE-12357
That doesn't help you with YARN, but only with the TezUI (since each YARN AM
runs > 1 queries).
For
> Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL
> Compliance. Otherwise they seem to be practically the same as String types.
They are relatively identical in storage, except both are slower on the CPU in
actual use (CHAR has additional padding code in the
> I have also noticed that this execution mode is only applicable to single
> predicate search. It does not work with multiple predicates searches. Can
> someone confirms this please?
Can you explain what you mean?
Vectorization supports multiple & nested AND+OR predicates - with some extra
> Thanks Gopal. Yeah I'm using CloudBerry. Storage is Azure.
Makes sense, only an object store would have this.
> Are you saying this _0,1,2,3 are directories ?.
No, only the zero size "files".
This is really for compat with regular filesystems.
If you have /tmp/1/foo in an object
> For any insert operation, there will be one Zero bytes file. I would like to
> know importance of this Zero bytes file.
They are directories.
I'm assuming you're using S3A + screenshots from something like Bucket explorer.
These directory entries will not be shown if you do something like
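Object stores have no real directories, so tools write zero-byte objects whose key ends in "/" as directory markers; a listing that hides them applies a filter roughly like this (illustrative keys, not S3A's actual code):

```python
keys = [
    "warehouse/tbl/",            # zero-byte directory marker
    "warehouse/tbl/part-00000",  # actual data file
    "warehouse/tbl/part-00001",
]

# Filesystem-style listings drop the marker objects:
visible = [k for k in keys if not k.endswith("/")]
print(visible)  # ['warehouse/tbl/part-00000', 'warehouse/tbl/part-00001']
```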
> I want to know whether Beeline can handle HTTP redirect or not. I was
> wondering if some of Beeline experts can answer my question?
Beeline uses the hive-jdbc driver, which is the one actually handling network
connections.
That driver in turn, uses a standard
> Actually, we don't have that many partitions - there are lot of gaps both in
> days and time events as well.
Your partition description sounded a lot like one of the FAQs from Mithun's
talks, which is why I asked
> The partition is by year/month/day/hour/minute. I have two directories - over
> two years, and the total number of records is 50Million.
That's a million partitions with 50 rows in each of them?
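The back-of-envelope math for that partition scheme:

```python
# year/month/day/hour/minute partitioning over two years, 50M rows total
minutes = 2 * 365 * 24 * 60
rows_per_partition = 50_000_000 / minutes
print(minutes)                    # 1051200 possible partitions
print(round(rows_per_partition))  # ~48 rows per partition
```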
> I am seeing it takes more than 1hr to complete. Any thoughts, on what could
> be the issue or