It seems Tez is spawning many processes and using up all file descriptors, causing
Unix to temporarily run out of resources.
I suppose this may be the problem, but I don't know why it doesn't happen when
the 2nd query is invoked. It always fails on the 3rd query.
Are there any settings that can prevent this?
Hi,
Is there a date function which returns the full month name for a given
timestamp?
Is there any support for theta joins in Spark? We have a requirement to
identify the country name based on a range of IP addresses in a table.
Forwarded Message
Subject:Support of Theta Join
Date: Thu, 12 Jan 2017 15:19:51 +
From: Mahender Sarangam
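Spark SQL does accept non-equi (theta) join conditions, though without an equality predicate it typically executes them as a broadcast nested-loop join, so the range table should be small. A minimal sketch of the IP-range lookup, assuming hypothetical tables `access_log` (with IPv4 addresses pre-converted to integers in `ip_num`) and `geo_ip` (with `start_ip`, `end_ip`, `country`):

```sql
-- Hypothetical schema: ip_num, start_ip and end_ip are IPv4 addresses
-- converted to integers so the range predicate is a simple BETWEEN.
SELECT l.ip_num,
       g.country
FROM   access_log l
JOIN   geo_ip g
  ON   l.ip_num BETWEEN g.start_ip AND g.end_ip;
```

Because there is no equality condition, Spark cannot use a hash or sort-merge join here; broadcasting the (usually small) `geo_ip` table keeps the plan from degenerating into a full cross join against a large fact table.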
Ref:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
int
month(string date)
Returns the month part of a date or a timestamp string: month("1970-11-01
00:00:00") = 11, month("1970-11-01") = 11.
Does it fit your requirement?
Thanks
On
Coming from a DBMS background, I tend to treat the columns in Hive similar to
those of an RDBMS table. For example, if a table is created in Hive as Parquet, I will
use VARCHAR(30) for a column that was defined as VARCHAR(30) at the source.
If a column is defined as TEXT in the RDBMS table, I use STRING in Hive with a
Hi Mahender,
I don't know your version of Hive.
Please try:
date_format(current_date, 'M')
Regards
Dev
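Note that the original question asked for the full month name. `date_format` (available from Hive 1.2.0) accepts Java `SimpleDateFormat` patterns, so the `MMMM` pattern should give the full name rather than the month number; a minimal sketch:

```sql
-- 'MMMM' is the SimpleDateFormat pattern for the full month name
-- (e.g. 'January'); 'M' would return the numeric month instead.
SELECT date_format(current_timestamp, 'MMMM');
```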
On Mon, Jan 16, 2017 at 6:56 PM, Jitendra Yadav
wrote:
> Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#
>
How is that efficient storage-wise? As far as I can see it is in HDFS,
and storage is based on your block size.
Am I missing something here?
On Jan 16, 2017 9:07 PM, "Mich Talebzadeh"
wrote:
Coming from DBMS background I tend to treat the columns in Hive similar
Thanks both.
STRING has a max length of 2 GB, so in MapReduce with a 128 MB block size we
are talking about 16 blocks. With VARCHAR(30) we are talking about 1 block.
I have not really experimented with this; however, I assume a table of 100k
rows with VARCHAR columns will have a smaller footprint.
Thanks Elliot for the insight.
Another issue is that Spark does not support "CHAR" types; it supports
VARCHAR. Often one uses Spark as well on these tables.
This should not really matter. I tend to define CHAR(N) to be VARCHAR(N), as
the assumption is that the table ingested into Parquet, say, is
Maybe the wrong configuration file is picked up?
> On 17 Jan 2017, at 07:44, wenxing zheng wrote:
>
> Dear all,
>
> I met an issue in the Tez configuration for Hive, as seen in the Hive log file:
>
>> Caused by: java.io.FileNotFoundException: File does not exist:
>>
Sorry, never mind my previous mail... in the stack it seems to look exactly for
this file. Can you try to download the file? Can you check if these are all the
files needed? I think you need to extract the .tar.gz and point to the jars
(check the Tez web site for the config).
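For reference, the HDFS location of the Tez archive is what the `tez.lib.uris` property in `tez-site.xml` points at; a minimal sketch, using the path from the stack trace in this thread:

```xml
<!-- tez-site.xml: tez.lib.uris must point at an archive (or a directory
     of jars) that actually exists in HDFS; the path below is taken from
     the FileNotFoundException in the logs. -->
<property>
  <name>tez.lib.uris</name>
  <value>hdfs://hdfscluster/apps/tez-0.8.4/tez.tar.gz</value>
</property>
```

If the archive is genuinely missing, uploading it (e.g. `hdfs dfs -put tez.tar.gz /apps/tez-0.8.4/`), or pointing the property at an extracted jar directory instead, should resolve the `FileNotFoundException`.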
> On 17 Jan 2017, at
Dear all,
I met an issue in the Tez configuration for Hive, as seen in the Hive log
file:
> *Caused by: java.io.FileNotFoundException: File does not exist:
> hdfs://hdfscluster/apps/tez-0.8.4/tez.tar.gz*
> *at
>
Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL
compliance. Otherwise they seem to be practically the same as STRING types.
HTH
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Internally it looks as though Hive simply represents CHAR/VARCHAR values
using a Java String and so I would not expect a significant change in
execution performance. The Hive JIRA suggests that these types were added
to 'support for more SQL-compliant behavior, such as SQL string comparison
> Sounds like VARCHAR and CHAR types were created for Hive to have ANSI SQL
> Compliance. Otherwise they seem to be practically the same as String types.
They are practically identical in storage, but both are slower on the CPU in
actual use (CHAR has additional padding code in the
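The padding behaviour mentioned above can be sketched with a couple of casts (a rough illustration, assuming Hive 0.13+ where CHAR and VARCHAR are available):

```sql
-- Per the Hive docs, CHAR(5) pads 'ab' to a fixed length of 5 and
-- trailing spaces are not significant in CHAR comparisons, whereas
-- VARCHAR keeps the value exactly as written, trailing spaces included.
SELECT CAST('ab' AS CHAR(5))    = CAST('ab   ' AS CHAR(5)),
       CAST('ab' AS VARCHAR(5)) = CAST('ab   ' AS VARCHAR(5));
```

It is this pad-and-trim handling that adds the extra CPU work relative to plain STRING.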