Re: Hive, Tez, clustering, buckets, and Presto

2018-04-04 Thread Richard A. Bross
This is great information, Gopal, thank you. I wish I had the time to create a comparison for our use case between ORC files on S3 with Hive buckets and ORC files without buckets. Unfortunately it's a chicken-and-egg issue, since I won't have enough data volume until we are in production, which

Re: Hive, Tez, clustering, buckets, and Presto

2018-04-04 Thread Gopal Vijayaraghavan
> so they're asking "where is the Hive bucketing spec". Is it just to read the code for that function?
This happened the other way around in time, rather than by writing a spec first: ACIDv1 implemented Streaming ingest via Storm, and it used an explicit naming "bucket_" for the filename. Since until the

[SECURITY] CVE-2018-1284: Hive UDF series UDFXPathXXXX allow users to pass carefully crafted XML to access arbitrary files

2018-04-04 Thread Daniel Dai
CVE-2018-1284: Hive UDF series UDFXPath allow users to pass carefully crafted XML to access arbitrary files Severity: Important Vendor: The Apache Software Foundation Versions Affected: This vulnerability affects all versions from 0.6.0 Description: Malicious user might use any xpath UDFs

[SECURITY] CVE-2018-1282 JDBC driver is susceptible to SQL injection attack if the input parameters are not properly cleaned

2018-04-04 Thread Daniel Dai
CVE-2018-1282: JDBC driver is susceptible to SQL injection attack if the input parameters are not properly cleaned Severity: Important Vendor: The Apache Software Foundation Versions Affected: This vulnerability affects all versions of Hive JDBC driver from 0.7.1 Description: This

[SECURITY] CVE-2018-1315 'COPY FROM FTP' statement in HPL/SQL can write to arbitrary location if the FTP server is compromised

2018-04-04 Thread Daniel Dai
CVE-2018-1315: 'COPY FROM FTP' statement in HPL/SQL can write to arbitrary location if the FTP server is compromised Severity: Moderate Vendor: The Apache Software Foundation Versions Affected: Hive 2.1.0 to 2.3.2 Description: When 'COPY FROM FTP' statement is run using HPL/SQL extension to

[ANNOUNCE] Apache Hive 2.3.3 Released

2018-04-04 Thread Daniel Dai
The Apache Hive team is proud to announce the release of Apache Hive version 2.3.3. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among others: * Tools to enable easy

Re: [Announce] Hive-MR3: Hive running on top of MR3

2018-04-04 Thread Thai Bui
It would be interesting to see how this compares to Hive LLAP on Tez. Since the LLAP daemons contain a queue of tasks that is shared amongst many Tez AMs, it could have similar characteristics to the way MR3 shares the containers between the AMs. On Wed, Apr 4, 2018 at 10:06 AM Sungwoo Park

[Announce] Hive-MR3: Hive running on top of MR3

2018-04-04 Thread Sungwoo Park
Hello Hive users, I am pleased to announce MR3 and Hive-MR3. Please visit the following webpage for everything on MR3 and Hive-MR3: https://mr3.postech.ac.kr/ http://datamonad.com Here is a description of MR3 and Hive-MR3 from the webpage: MR3 is a new execution engine for Hadoop. Similar in

ForeignKeysRequest Issue

2018-04-04 Thread Courtney Edwards
Hi, In implementing the ThriftHiveMetastore.Iface we have run into some issues understanding the get_foreign_keys(ForeignKeysRequest request) method contract. We are receiving

Re: Building Datwarehouse Application in Spark

2018-04-04 Thread Richard A. Bross
Mahender, To really address your question I think that you'd have to supply a bit more information, such as the kind of data that you want to save: RDBMS-type lookups, key/value/index-type lookups, insert velocity, etc. This wide range of technologies is suited to different use cases,

Re: Building Datwarehouse Application in Spark

2018-04-04 Thread Furcy Pin
Hi Mahender, Did you look at this? https://www.snappydata.io/blog/the-spark-database But I believe that most people handle this use case by using either: - Their favorite regular RDBMS (MySQL, PostgreSQL, Oracle, SQL Server, ...) if the data is not too big - Their favorite New-SQL storage

Building Datwarehouse Application in Spark

2018-04-04 Thread Mahender Sarangam
Hi, Does anyone have a good architecture document or set of design principles for building a warehouse application using Spark? Is it better to create a Hive Context with HQL and perform the transformations there, or to load files directly into a dataframe and perform the data transformations? We need to implement SCD