[jira] [Created] (HAWQ-1399) Invalid references to gpfs_hdfs_tell in elog statements

2017-03-21 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-1399:
-

 Summary: Invalid references to gpfs_hdfs_tell in elog statements
 Key: HAWQ-1399
 URL: https://issues.apache.org/jira/browse/HAWQ-1399
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Core
Reporter: Kyle R Dunn
Assignee: Ed Espino
 Fix For: backlog


Several warning messages incorrectly attribute the originating function as 
{{gpfs_hdfs_tell()}}; a sketch of the shape of the fix follows the line 
references below.

[Line 665 | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c#L665]
 should be {{gpfs_hdfs_truncate()}}

[Line 720 | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c#L720]
 should be {{gpfs_hdfs_getpathinfo()}}

[Line 760 | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c#L760]
 should be {{gpfs_hdfs_freefileinfo()}}
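
For illustration, a minimal sketch of the shape of the fix, assuming the 
warnings follow the usual {{elog}} pattern in gpfshdfs.c - the condition and 
message text below are placeholders, not the actual source; only the 
function-name prefix needs to change:
{code}
/* Sketch only - placeholder logic, not copied from gpfshdfs.c. */
/* Inside gpfs_hdfs_truncate(), around line 665: */
if (NULL == hdfs || NULL == path)
{
    /* previously attributed to "gpfs_hdfs_tell" */
    elog(WARNING, "gpfs_hdfs_truncate: invalid argument");
    PG_RETURN_INT32(-1);
}
{code}
The same one-word change applies to the {{gpfs_hdfs_getpathinfo()}} and 
{{gpfs_hdfs_freefileinfo()}} call sites listed above.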



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HAWQ-1398) Warnings in gpfs_hdfs_disconnect() reference incorrect originating function

2017-03-21 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-1398:
-

 Summary: Warnings in gpfs_hdfs_disconnect() reference incorrect 
originating function
 Key: HAWQ-1398
 URL: https://issues.apache.org/jira/browse/HAWQ-1398
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Core
Reporter: Kyle R Dunn
Assignee: Ed Espino
 Fix For: backlog


Both warning messages in the {{gpfs_hdfs_disconnect()}} function reference the 
wrong originating function, {{gpfs_hdfs_openfile()}}; see the sketch after the 
line references below.

[Line 174 | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c#L174]

[Line 183 | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c#L183]
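
A hedged sketch of the expected change (placeholder code, not the actual 
gpfshdfs.c source; only the function-name prefix in each message changes):
{code}
/* Sketch only - inside gpfs_hdfs_disconnect() */
if (NULL == hdfs)
{
    /* both warnings here previously said "gpfs_hdfs_openfile: ..." */
    elog(WARNING, "gpfs_hdfs_disconnect: invalid filesystem handle");
    PG_RETURN_INT32(-1);
}
{code}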



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-1270) Plugged storage back-ends for HAWQ

2017-03-13 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901627#comment-15901627
 ] 

Kyle R Dunn edited comment on HAWQ-1270 at 3/14/17 12:58 AM:
-

From what I can tell, [this | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c]
IS the interface.

When you look at the {{pg_filesystem}} table, it lists the exact functions 
required for a new backend:
{code}
SELECT * from pg_filesystem ;
-[ RECORD 1 ]--+--
fsysname   | hdfs
fsysconnfn | gpfs_hdfs_connect
fsysdisconnfn  | gpfs_hdfs_disconnect
fsysopenfn | gpfs_hdfs_openfile
fsysclosefn| gpfs_hdfs_closefile
fsysseekfn | gpfs_hdfs_seek
fsystellfn | gpfs_hdfs_tell
fsysreadfn | gpfs_hdfs_read
fsyswritefn| gpfs_hdfs_write
fsysflushfn| gpfs_hdfs_sync
fsysdeletefn   | gpfs_hdfs_delete
fsyschmodfn| gpfs_hdfs_chmod
fsysmkdirfn| gpfs_hdfs_createdirectory
fsystruncatefn | gpfs_hdfs_truncate
fsysgetpathinfofn  | gpfs_hdfs_getpathinfo
fsysfreefileinfofn | gpfs_hdfs_freefileinfo
fsyslibfile| $libdir/gpfshdfs.so
fsysowner  | 10
fsystrusted| f
fsysacl|
{code}
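
So a new backend would presumably implement the same set of entry points and 
register them through this catalog. A hypothetical sketch for a Ceph backend - 
the {{gpfs_ceph_*}} names and the fmgr-style signatures are assumptions 
modeled on the hdfs row above, not existing HAWQ code:
{code}
#include "postgres.h"
#include "fmgr.h"

/* One entry point per fsys*fn column in pg_filesystem (hypothetical names). */
PG_FUNCTION_INFO_V1(gpfs_ceph_connect);
PG_FUNCTION_INFO_V1(gpfs_ceph_disconnect);
PG_FUNCTION_INFO_V1(gpfs_ceph_openfile);
/* ... closefile, seek, tell, read, write, sync, delete, chmod,
 *     createdirectory, truncate, getpathinfo, freefileinfo ... */

Datum
gpfs_ceph_connect(PG_FUNCTION_ARGS)
{
    /* Establish a connection to the Ceph cluster here and stash the handle. */
    PG_RETURN_INT32(0);
}
{code}
A row in {{pg_filesystem}} pointing {{fsyslibfile}} at the compiled {{.so}} 
would then wire these in, analogous to the hdfs row above.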


was (Author: kdunn926):
From what I can tell, [this | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c]
IS the interface.

When you look at the {{pg_filesystems}} table, it lists the exact functions 
required for a new backend:
{code}
SELECT * from pg_filesystem ;
-[ RECORD 1 ]--+--
fsysname   | hdfs
fsysconnfn | gpfs_hdfs_connect
fsysdisconnfn  | gpfs_hdfs_disconnect
fsysopenfn | gpfs_hdfs_openfile
fsysclosefn| gpfs_hdfs_closefile
fsysseekfn | gpfs_hdfs_seek
fsystellfn | gpfs_hdfs_tell
fsysreadfn | gpfs_hdfs_read
fsyswritefn| gpfs_hdfs_write
fsysflushfn| gpfs_hdfs_sync
fsysdeletefn   | gpfs_hdfs_delete
fsyschmodfn| gpfs_hdfs_chmod
fsysmkdirfn| gpfs_hdfs_createdirectory
fsystruncatefn | gpfs_hdfs_truncate
fsysgetpathinfofn  | gpfs_hdfs_getpathinfo
fsysfreefileinfofn | gpfs_hdfs_freefileinfo
fsyslibfile| $libdir/gpfshdfs.so
fsysowner  | 10
fsystrusted| f
fsysacl|
{code}

> Plugged storage back-ends for HAWQ
> --
>
> Key: HAWQ-1270
> URL: https://issues.apache.org/jira/browse/HAWQ-1270
> Project: Apache HAWQ
>  Issue Type: Improvement
>Reporter: Dmitry Buzolin
>Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is 
> already supported, but there is also Ceph - a distributed storage system 
> that offers a standard POSIX-compliant file system, object storage, and 
> block storage. Ceph is also data-location aware, written in C++, and is a 
> more sophisticated storage backend than Hadoop at this time. It provides 
> replicated and erasure-coded storage pools. Other great features of Ceph 
> are snapshots and an algorithmic approach to map data to the nodes rather 
> than having centrally managed namenodes. I don't think HDFS offers any of 
> these features. In terms of performance, Ceph should be faster than HDFS 
> since it is written in C++ and because it doesn't have the scalability 
> limitations Hadoop has when mapping data to storage pools, where the 
> namenode is such a point of contention.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-1270) Plugged storage back-ends for HAWQ

2017-03-10 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901627#comment-15901627
 ] 

Kyle R Dunn edited comment on HAWQ-1270 at 3/10/17 9:27 PM:


From what I can tell, [this | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c]
IS the interface.

When you look at the {{pg_filesystems}} table, it lists the exact functions 
required for a new backend:
{code}
SELECT * from pg_filesystem ;
-[ RECORD 1 ]--+--
fsysname   | hdfs
fsysconnfn | gpfs_hdfs_connect
fsysdisconnfn  | gpfs_hdfs_disconnect
fsysopenfn | gpfs_hdfs_openfile
fsysclosefn| gpfs_hdfs_closefile
fsysseekfn | gpfs_hdfs_seek
fsystellfn | gpfs_hdfs_tell
fsysreadfn | gpfs_hdfs_read
fsyswritefn| gpfs_hdfs_write
fsysflushfn| gpfs_hdfs_sync
fsysdeletefn   | gpfs_hdfs_delete
fsyschmodfn| gpfs_hdfs_chmod
fsysmkdirfn| gpfs_hdfs_createdirectory
fsystruncatefn | gpfs_hdfs_truncate
fsysgetpathinfofn  | gpfs_hdfs_getpathinfo
fsysfreefileinfofn | gpfs_hdfs_freefileinfo
fsyslibfile| $libdir/gpfshdfs.so
fsysowner  | 10
fsystrusted| f
fsysacl|
{code}


was (Author: kdunn926):
From what I can tell, [this | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c]
IS the interface.

When you look at the {{pg_filesystems}} table, it lists the exact functions 
requires for a new backend:
{code}
SELECT * from pg_filesystem ;
-[ RECORD 1 ]--+--
fsysname   | hdfs
fsysconnfn | gpfs_hdfs_connect
fsysdisconnfn  | gpfs_hdfs_disconnect
fsysopenfn | gpfs_hdfs_openfile
fsysclosefn| gpfs_hdfs_closefile
fsysseekfn | gpfs_hdfs_seek
fsystellfn | gpfs_hdfs_tell
fsysreadfn | gpfs_hdfs_read
fsyswritefn| gpfs_hdfs_write
fsysflushfn| gpfs_hdfs_sync
fsysdeletefn   | gpfs_hdfs_delete
fsyschmodfn| gpfs_hdfs_chmod
fsysmkdirfn| gpfs_hdfs_createdirectory
fsystruncatefn | gpfs_hdfs_truncate
fsysgetpathinfofn  | gpfs_hdfs_getpathinfo
fsysfreefileinfofn | gpfs_hdfs_freefileinfo
fsyslibfile| $libdir/gpfshdfs.so
fsysowner  | 10
fsystrusted| f
fsysacl|
{code}

> Plugged storage back-ends for HAWQ
> --
>
> Key: HAWQ-1270
> URL: https://issues.apache.org/jira/browse/HAWQ-1270
> Project: Apache HAWQ
>  Issue Type: Improvement
>Reporter: Dmitry Buzolin
>Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is 
> already supported, but there is also Ceph - a distributed storage system 
> that offers a standard POSIX-compliant file system, object storage, and 
> block storage. Ceph is also data-location aware, written in C++, and is a 
> more sophisticated storage backend than Hadoop at this time. It provides 
> replicated and erasure-coded storage pools. Other great features of Ceph 
> are snapshots and an algorithmic approach to map data to the nodes rather 
> than having centrally managed namenodes. I don't think HDFS offers any of 
> these features. In terms of performance, Ceph should be faster than HDFS 
> since it is written in C++ and because it doesn't have the scalability 
> limitations Hadoop has when mapping data to storage pools, where the 
> namenode is such a point of contention.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-1270) Plugged storage back-ends for HAWQ

2017-03-08 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901627#comment-15901627
 ] 

Kyle R Dunn commented on HAWQ-1270:
---

From what I can tell, [this | 
https://github.com/apache/incubator-hawq/blob/master/src/bin/gpfilesystem/hdfs/gpfshdfs.c]
IS the interface.

When you look at the {{pg_filesystems}} table, it lists the exact functions 
requires for a new backend:
{code}
SELECT * from pg_filesystem ;
-[ RECORD 1 ]--+--
fsysname   | hdfs
fsysconnfn | gpfs_hdfs_connect
fsysdisconnfn  | gpfs_hdfs_disconnect
fsysopenfn | gpfs_hdfs_openfile
fsysclosefn| gpfs_hdfs_closefile
fsysseekfn | gpfs_hdfs_seek
fsystellfn | gpfs_hdfs_tell
fsysreadfn | gpfs_hdfs_read
fsyswritefn| gpfs_hdfs_write
fsysflushfn| gpfs_hdfs_sync
fsysdeletefn   | gpfs_hdfs_delete
fsyschmodfn| gpfs_hdfs_chmod
fsysmkdirfn| gpfs_hdfs_createdirectory
fsystruncatefn | gpfs_hdfs_truncate
fsysgetpathinfofn  | gpfs_hdfs_getpathinfo
fsysfreefileinfofn | gpfs_hdfs_freefileinfo
fsyslibfile| $libdir/gpfshdfs.so
fsysowner  | 10
fsystrusted| f
fsysacl|
{code}

> Plugged storage back-ends for HAWQ
> --
>
> Key: HAWQ-1270
> URL: https://issues.apache.org/jira/browse/HAWQ-1270
> Project: Apache HAWQ
>  Issue Type: Improvement
>Reporter: Dmitry Buzolin
>Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is 
> already supported, but there is also Ceph - a distributed storage system 
> that offers a standard POSIX-compliant file system, object storage, and 
> block storage. Ceph is also data-location aware, written in C++, and is a 
> more sophisticated storage backend than Hadoop at this time. It provides 
> replicated and erasure-coded storage pools. Other great features of Ceph 
> are snapshots and an algorithmic approach to map data to the nodes rather 
> than having centrally managed namenodes. I don't think HDFS offers any of 
> these features. In terms of performance, Ceph should be faster than HDFS 
> since it is written in C++ and because it doesn't have the scalability 
> limitations Hadoop has when mapping data to storage pools, where the 
> namenode is such a point of contention.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-1270) Plugged storage back-ends for HAWQ

2017-03-03 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894998#comment-15894998
 ] 

Kyle R Dunn commented on HAWQ-1270:
---

This is a possible route for Ceph in particular: [cephfs-hadoop | 
https://github.com/ceph/cephfs-hadoop]

> Plugged storage back-ends for HAWQ
> --
>
> Key: HAWQ-1270
> URL: https://issues.apache.org/jira/browse/HAWQ-1270
> Project: Apache HAWQ
>  Issue Type: Improvement
>Reporter: Dmitry Buzolin
>Assignee: Ed Espino
>
> Since HAWQ only depends on Hadoop and Parquet for columnar format support, I 
> would like to propose a pluggable storage backend design for HAWQ. Hadoop is 
> already supported, but there is also Ceph - a distributed storage system 
> that offers a standard POSIX-compliant file system, object storage, and 
> block storage. Ceph is also data-location aware, written in C++, and is a 
> more sophisticated storage backend than Hadoop at this time. It provides 
> replicated and erasure-coded storage pools. Other great features of Ceph 
> are snapshots and an algorithmic approach to map data to the nodes rather 
> than having centrally managed namenodes. I don't think HDFS offers any of 
> these features. In terms of performance, Ceph should be faster than HDFS 
> since it is written in C++ and because it doesn't have the scalability 
> limitations Hadoop has when mapping data to storage pools, where the 
> namenode is such a point of contention.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-760) Hawq register

2017-03-02 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892440#comment-15892440
 ] 

Kyle R Dunn commented on HAWQ-760:
--

Well, this is arguably one of the best "two birds with one stone" features 
I've seen in a while. Not only does {{hawq register}} enable straightforward 
DR recovery (the original intent), it also means HAWQ 1.x to 2.x upgrades are 
about as painless as possible... we don't need to dump and reload!

Thanks again for all the great work here!

> Hawq register
> -
>
> Key: HAWQ-760
> URL: https://issues.apache.org/jira/browse/HAWQ-760
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Command Line Tools
>Reporter: Yangcheng Luo
>Assignee: Lili Ma
> Fix For: backlog
>
>
> Scenarios: 
> 1. Register a parquet file generated by other systems, such as Hive, Spark, 
> etc.
> 2. Cluster disaster recovery: two clusters co-exist and data is periodically 
> imported from Cluster A to Cluster B, so the data needs to be registered in 
> Cluster B.
> 3. Table rollback: checkpoints are taken somewhere, and the table needs to 
> be rolled back to a previous checkpoint. 
> Usage 1
> Description
> Register a file or folder to an existing table. When registering a file, the 
> eof of that file can be specified; if eof is not specified, the actual file 
> size is used. When registering a folder, the actual file sizes are used.
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f 
> filepath] [-e eof]
> Usage 2
> Description
> Register according to a .yml configuration file. 
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c 
> config] [--force][--repair]  
> Behavior:
> 1. If the table doesn't exist, it is created automatically and the files in 
> the .yml configuration file are registered. The file sizes specified in the 
> .yml file are used to update the catalog table. 
> 2. If the table already exists and neither --force nor --repair is 
> configured, no table is created and the files specified in the .yml file 
> are registered directly to the table. Note that if a file is already under 
> the table directory in HDFS, an error is thrown, i.e. to-be-registered 
> files should not be under the table path.
> 3. If the table already exists and --force is specified, all catalog 
> contents in pg_aoseg.pg_paqseg_$relid are cleared while the files on HDFS 
> are kept, and then all the files are re-registered to the table. This is 
> for scenario 2.
> 4. If the table already exists and --repair is specified, both the file 
> folder and the catalog table pg_aoseg.pg_paqseg_$relid are changed to the 
> state the .yml file describes. Note that files generated after the 
> checkpoint may be deleted here. Also note that all the files in the .yml 
> file should be under the table folder on HDFS. Limitation: hash table 
> redistribution, table truncate, and table drop are not supported. This is 
> for scenario 3.
> Requirements for both cases:
> 1. The to-be-registered file path has to be colocated with HAWQ in the same 
> HDFS cluster.
> 2. If the table to be registered is a hash table, the number of registered 
> files should be one or multiple times the hash table bucket number.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-326) Support RPM build for HAWQ

2017-03-02 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889199#comment-15889199
 ] 

Kyle R Dunn edited comment on HAWQ-326 at 3/2/17 3:45 PM:
--

I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/{hawq,SPECS}
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq

$ cd ~/RPMBUILD
$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define prefix /usr/local/
%define deploy_dir %{prefix}%{name}

%description
%{summary}

%prep
%setup -n %{name}

%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT%{deploy_dir}
cp -aR * $RPM_BUILD_ROOT%{deploy_dir}

# Just in case env isn't fresh
userdel -rf gpadmin 2> /dev/null || true

useradd -m gpadmin

chown -R gpadmin:gpadmin %{deploy_dir}

%files
%defattr(-,root,root,-)
%{deploy_dir}/greenplum_path.sh
%{deploy_dir}/bin
%{deploy_dir}/sbin
%{deploy_dir}/docs
%{deploy_dir}/etc
%{deploy_dir}/include
%{deploy_dir}/lib
%{deploy_dir}/share
{code}



was (Author: kdunn926):
I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/{hawq,SPECS}
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq

$ cd ~/RPMBUILD
$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define prefix /usr/local/
%define deploy_dir %{prefix}%{name}

%description
%{summary}

%prep
%setup -n %{name}

%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT%{deploy_dir}
cp -aR * $RPM_BUILD_ROOT%{deploy_dir}

useradd -m gpadmin

chown -R gpadmin:gpadmin /usr/local/%{installdir}

%files
%defattr(-,root,root,-)
%{deploy_dir}/greenplum_path.sh
%{deploy_dir}/bin
%{deploy_dir}/sbin
%{deploy_dir}/docs
%{deploy_dir}/etc
%{deploy_dir}/include
%{deploy_dir}/lib
%{deploy_dir}/share
{code}


> Support RPM build for HAWQ
> --
>
> Key: HAWQ-326
> URL: https://issues.apache.org/jira/browse/HAWQ-326
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Build
>Reporter: Lei Chang
>Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-760) Hawq register

2017-03-01 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890537#comment-15890537
 ] 

Kyle R Dunn commented on HAWQ-760:
--

[~lilima] - How does hawq register handle registering files from different 
versions where the catalog could have changed? I.e., what would happen if I 
try to register tables from HAWQ 1.x into HAWQ 2.x?

> Hawq register
> -
>
> Key: HAWQ-760
> URL: https://issues.apache.org/jira/browse/HAWQ-760
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Command Line Tools
>Reporter: Yangcheng Luo
>Assignee: Lili Ma
> Fix For: backlog
>
>
> Scenarios: 
> 1. Register a parquet file generated by other systems, such as Hive, Spark, 
> etc.
> 2. Cluster disaster recovery: two clusters co-exist and data is periodically 
> imported from Cluster A to Cluster B, so the data needs to be registered in 
> Cluster B.
> 3. Table rollback: checkpoints are taken somewhere, and the table needs to 
> be rolled back to a previous checkpoint. 
> Usage 1
> Description
> Register a file or folder to an existing table. When registering a file, the 
> eof of that file can be specified; if eof is not specified, the actual file 
> size is used. When registering a folder, the actual file sizes are used.
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f 
> filepath] [-e eof]
> Usage 2
> Description
> Register according to a .yml configuration file. 
> hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c 
> config] [--force][--repair]  
> Behavior:
> 1. If the table doesn't exist, it is created automatically and the files in 
> the .yml configuration file are registered. The file sizes specified in the 
> .yml file are used to update the catalog table. 
> 2. If the table already exists and neither --force nor --repair is 
> configured, no table is created and the files specified in the .yml file 
> are registered directly to the table. Note that if a file is already under 
> the table directory in HDFS, an error is thrown, i.e. to-be-registered 
> files should not be under the table path.
> 3. If the table already exists and --force is specified, all catalog 
> contents in pg_aoseg.pg_paqseg_$relid are cleared while the files on HDFS 
> are kept, and then all the files are re-registered to the table. This is 
> for scenario 2.
> 4. If the table already exists and --repair is specified, both the file 
> folder and the catalog table pg_aoseg.pg_paqseg_$relid are changed to the 
> state the .yml file describes. Note that files generated after the 
> checkpoint may be deleted here. Also note that all the files in the .yml 
> file should be under the table folder on HDFS. Limitation: hash table 
> redistribution, table truncate, and table drop are not supported. This is 
> for scenario 3.
> Requirements for both cases:
> 1. The to-be-registered file path has to be colocated with HAWQ in the same 
> HDFS cluster.
> 2. If the table to be registered is a hash table, the number of registered 
> files should be one or multiple times the hash table bucket number.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-8) Installing the HAWQ Software thru the Apache Ambari

2017-03-01 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890288#comment-15890288
 ] 

Kyle R Dunn edited comment on HAWQ-8 at 3/1/17 2:38 PM:


I think we should aim to have the build system be OS agnostic. 

I was already able to successfully compile for SLES 11.4. The plan is to 
capture it all in a Dockerfile, then try to replicate for other SLES versions, 
and ultimately Ubuntu.

We also need to think about how to handle runtime library dependencies - one 
solution is to bundle them in {{/usr/local/hawq/lib}} with everything else - 
but that would imply assuming a particular prefix at compile time for things 
like Boost, YAML, Thrift, etc.


was (Author: kdunn926):
I think we should aim to have the build system be OS agnostic. 

I was already able to successfully compile for SLES 11.4. The plan is to 
capture it all in a Dockerfile, then try to replicate for other SLES versions, 
and ultimately Ubuntu.

We also need to think about how to handle run library dependencies - one 
solution is to bundle them in `/usr/local/hawq/lib` with everything else - but 
that would imply we have a particular prefix at compile-time for things like 
Boost, YAML, Thrift, etc.

> Installing the HAWQ Software thru the Apache Ambari 
> 
>
> Key: HAWQ-8
> URL: https://issues.apache.org/jira/browse/HAWQ-8
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Ambari
> Environment: CentOS
>Reporter: Vijayakumar Ramdoss
>Assignee: Alexander Denissov
> Fix For: backlog
>
> Attachments: 1Le8tdm[1]
>
>
> In order to integrate with the Hadoop system, we would have to install the 
> HAWQ software through Ambari.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-8) Installing the HAWQ Software thru the Apache Ambari

2017-03-01 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890288#comment-15890288
 ] 

Kyle R Dunn commented on HAWQ-8:


I think we should aim to have the build system be OS agnostic. 

I was already able to successfully compile for SLES 11.4. The plan is to 
capture it all in a Dockerfile, then try to replicate for other SLES versions, 
and ultimately Ubuntu.

We also need to think about how to handle run library dependencies - one 
solution is to bundle them in `/usr/local/hawq/lib` with everything else - but 
that would imply we have a particular prefix at compile-time for things like 
Boost, YAML, Thrift, etc.

> Installing the HAWQ Software thru the Apache Ambari 
> 
>
> Key: HAWQ-8
> URL: https://issues.apache.org/jira/browse/HAWQ-8
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Ambari
> Environment: CentOS
>Reporter: Vijayakumar Ramdoss
>Assignee: Alexander Denissov
> Fix For: backlog
>
> Attachments: 1Le8tdm[1]
>
>
> In order to integrate with the Hadoop system, we would have to install the 
> HAWQ software through Ambari.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-326) Support RPM build for HAWQ

2017-03-01 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890278#comment-15890278
 ] 

Kyle R Dunn commented on HAWQ-326:
--

PXF can be built by performing the following:

{code}
cd incubator-hawq-source-dir/pxf
make tomcat
make rpm
{code}

The resulting PXF RPMs will be in 
{{incubator-hawq-source-dir/pxf/build/distributions}} and, for tomcat, in 
{{incubator-hawq-source-dir/pxf/distributions}}.

> Support RPM build for HAWQ
> --
>
> Key: HAWQ-326
> URL: https://issues.apache.org/jira/browse/HAWQ-326
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Build
>Reporter: Lei Chang
>Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-326) Support RPM build for HAWQ

2017-03-01 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889199#comment-15889199
 ] 

Kyle R Dunn edited comment on HAWQ-326 at 3/1/17 2:31 PM:
--

I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/{hawq,SPECS}
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq

$ cd ~/RPMBUILD
$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define installdir hawq

BuildRoot: %{_tmppath}/%{name}

%description
%{summary}

%prep
%setup -n %{installdir}

rm -rf /usr/local/%{installdir}
mkdir /usr/local/%{installdir}

# in buildroot
cp -ra * /usr/local/%{installdir}/

useradd -m gpadmin

chown -R gpadmin:gpadmin /usr/local/%{installdir}

%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
/greenplum_path.sh
/bin
/sbin
/docs
/etc
/include
/lib
/share
{code}

Note: we need to add steps to create the {{gpadmin}} user and ensure the 
installation directory has the correct owner and mode.


was (Author: kdunn926):
I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/{hawq,SPECS}
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq

$ cd ~/RPMBUILD
$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define installdir hawq

BuildRoot: %{_tmppath}/%{name}

%description
%{summary}

%prep
%setup -n %{installdir}

#%build
# Empty section.

%install
rm -rf /usr/local/%{installdir}
mkdir /usr/local/%{installdir}

# in buildroot
cp -ra * /usr/local/%{installdir}/


%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
/greenplum_path.sh
/bin
/sbin
/docs
/etc
/include
/lib
/share
{code}

Note: we need to add steps to create the {{gpadmin}} user and ensure the 
installation directory has the correct owner and mode.

> Support RPM build for HAWQ
> --
>
> Key: HAWQ-326
> URL: https://issues.apache.org/jira/browse/HAWQ-326
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Build
>Reporter: Lei Chang
>Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-303) Index support for non-heap tables

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889214#comment-15889214
 ] 

Kyle R Dunn commented on HAWQ-303:
--

Once we have support for indexes, high-performance PostGIS on HAWQ becomes a 
very compelling differentiating feature.

> Index support for non-heap tables
> -
>
> Key: HAWQ-303
> URL: https://issues.apache.org/jira/browse/HAWQ-303
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Storage
>Reporter: Lei Chang
>Assignee: Lili Ma
> Fix For: 3.0.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-303) Index support for non-heap tables

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889211#comment-15889211
 ] 

Kyle R Dunn commented on HAWQ-303:
--

I'm wondering if/how we can prioritize this? Is it accurate that the feature is 
targeted for 3.0.0.0?

> Index support for non-heap tables
> -
>
> Key: HAWQ-303
> URL: https://issues.apache.org/jira/browse/HAWQ-303
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Storage
>Reporter: Lei Chang
>Assignee: Lili Ma
> Fix For: 3.0.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-98) Moving HAWQ docker file into code base

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889207#comment-15889207
 ] 

Kyle R Dunn commented on HAWQ-98:
-

Looks like we can close this as the Docker bits are in 
[master|https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker].

> Moving HAWQ docker file into code base
> --
>
> Key: HAWQ-98
> URL: https://issues.apache.org/jira/browse/HAWQ-98
> Project: Apache HAWQ
>  Issue Type: Wish
>Reporter: Goden Yao
>Assignee: Roman Shaposhnik
> Fix For: 2.2.0.0-incubating
>
>
> We have a pre-built docker image (check [HAWQ build & 
> install|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61320026])
> sitting outside the codebase.
> It should be incorporated in the Apache git and maintained by the community.
> The proposed location is to create a folder under the repository root.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-326) Support RPM build for HAWQ

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889199#comment-15889199
 ] 

Kyle R Dunn edited comment on HAWQ-326 at 3/1/17 12:29 AM:
---

I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/hawq
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq

$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define installdir hawq

BuildRoot: %{_tmppath}/%{name}

%description
%{summary}

%prep
%setup -n %{installdir}

#%build
# Empty section.

%install
rm -rf /usr/local/%{installdir}
mkdir /usr/local/%{installdir}

# in buildroot
cp -ra * /usr/local/%{installdir}/


%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
/greenplum_path.sh
/bin
/sbin
/docs
/etc
/include
/lib
/share
{code}

Note: we need to add steps to create the {{gpadmin}} user and ensure the 
installation directory has the correct owner and mode.


was (Author: kdunn926):
I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/hawq
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq )

$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define installdir hawq

BuildRoot: %{_tmppath}/%{name}

%description
%{summary}

%prep
%setup -n %{installdir}

#%build
# Empty section.

%install
rm -rf /usr/local/%{installdir}
mkdir /usr/local/%{installdir}

# in buildroot
cp -ra * /usr/local/%{installdir}/


%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
/greenplum_path.sh
/bin
/sbin
/docs
/etc
/include
/lib
/share
{code}

Note: we need to add steps to create the {{gpadmin}} user and ensure the 
installation directory has the correct owner and mode.

> Support RPM build for HAWQ
> --
>
> Key: HAWQ-326
> URL: https://issues.apache.org/jira/browse/HAWQ-326
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Build
>Reporter: Lei Chang
>Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-326) Support RPM build for HAWQ

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889199#comment-15889199
 ] 

Kyle R Dunn edited comment on HAWQ-326 at 3/1/17 12:30 AM:
---

I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/{hawq,SPECS}
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq

$ cd ~/RPMBUILD
$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define installdir hawq

BuildRoot: %{_tmppath}/%{name}

%description
%{summary}

%prep
%setup -n %{installdir}

#%build
# Empty section.

%install
rm -rf /usr/local/%{installdir}
mkdir /usr/local/%{installdir}

# in buildroot
cp -ra * /usr/local/%{installdir}/


%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
/greenplum_path.sh
/bin
/sbin
/docs
/etc
/include
/lib
/share
{code}

Note: we need to add steps to create the {{gpadmin}} user and ensure the 
installation directory has the correct owner and mode.


was (Author: kdunn926):
I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/hawq
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq

$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define installdir hawq

BuildRoot: %{_tmppath}/%{name}

%description
%{summary}

%prep
%setup -n %{installdir}

#%build
# Empty section.

%install
rm -rf /usr/local/%{installdir}
mkdir /usr/local/%{installdir}

# in buildroot
cp -ra * /usr/local/%{installdir}/


%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
/greenplum_path.sh
/bin
/sbin
/docs
/etc
/include
/lib
/share
{code}

Note: we need to add steps to create the {{gpadmin}} user and ensure the 
installation directory has the correct owner and mode.

> Support RPM build for HAWQ
> --
>
> Key: HAWQ-326
> URL: https://issues.apache.org/jira/browse/HAWQ-326
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Build
>Reporter: Lei Chang
>Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-326) Support RPM build for HAWQ

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889199#comment-15889199
 ] 

Kyle R Dunn commented on HAWQ-326:
--

I've done some initial work on this.

After compiling HAWQ from source and running {{make install}}, with the 
{{rpmbuild}} utility installed, perform the following steps:
{code}
$ mkdir -p ~/RPMBUILD/hawq
$ cd /usr/local
$ tar cjf ~/RPMBUILD/hawq/hawq-2.1.0.0-rc4.tar.bz2 hawq )

$ rpmbuild -bb SPECS/hawq-2.1.0.0-rc4.spec
{code}

where the above RPM SPEC file contains the following:
{code}
# Don't try fancy stuff like debuginfo, which is useless on binary-only
# packages. Don't strip binary too
# Be sure buildpolicy set to do nothing
%define __spec_install_post %{nil}
%define debug_package %{nil}
%define __os_install_post %{_dbpath}/brp-compress
%define _unpackaged_files_terminate_build 0

Summary: Apache HAWQ
Name: hawq
Version: 2.1.0.0
Release: rc4
License: Apache 2.0
Group: Development/Tools
SOURCE0 : %{name}-%{version}-%{release}.tar.bz2
URL: https://hawq.incubator.apache.org

%define installdir hawq

BuildRoot: %{_tmppath}/%{name}

%description
%{summary}

%prep
%setup -n %{installdir}

#%build
# Empty section.

%install
rm -rf /usr/local/%{installdir}
mkdir /usr/local/%{installdir}

# in buildroot
cp -ra * /usr/local/%{installdir}/


%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
/greenplum_path.sh
/bin
/sbin
/docs
/etc
/include
/lib
/share
{code}

Note: we need to add steps to create the {{gpadmin}} user and ensure the 
installation directory has the correct owner and mode.

> Support RPM build for HAWQ
> --
>
> Key: HAWQ-326
> URL: https://issues.apache.org/jira/browse/HAWQ-326
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Build
>Reporter: Lei Chang
>Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-8) Installing the HAWQ Software thru the Apache Ambari

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889191#comment-15889191
 ] 

Kyle R Dunn commented on HAWQ-8:


It seems like a clear dependency that an Ambari-only installation will require 
an RPM of HAWQ for both RHEL and SLES.

> Installing the HAWQ Software thru the Apache Ambari 
> 
>
> Key: HAWQ-8
> URL: https://issues.apache.org/jira/browse/HAWQ-8
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Ambari
> Environment: CentOS
>Reporter: Vijayakumar Ramdoss
>Assignee: Alexander Denissov
> Fix For: backlog
>
> Attachments: 1Le8tdm[1]
>
>
> In order to integrate with the Hadoop system, we would have to install the 
> HAWQ software through Ambari.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-401) json type support

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889182#comment-15889182
 ] 

Kyle R Dunn edited comment on HAWQ-401 at 3/1/17 12:20 AM:
---

[~lilima] - I'm wondering if we'll be able to incorporate the work done 
[here|https://github.com/greenplum-db/gpdb/pull/530] for JSON type support?


was (Author: kdunn926):
[~lilima] - I'm wondering if we'll be able to incorporate the work done 
[here](https://github.com/greenplum-db/gpdb/pull/530) for JSON type support?

> json type support
> -
>
> Key: HAWQ-401
> URL: https://issues.apache.org/jira/browse/HAWQ-401
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Core
>Reporter: Lei Chang
>Assignee: Lei Chang
> Fix For: backlog
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HAWQ-401) json type support

2017-02-28 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889182#comment-15889182
 ] 

Kyle R Dunn commented on HAWQ-401:
--

[~lilima] - I'm wondering if we'll be able to incorporate the work done 
[here](https://github.com/greenplum-db/gpdb/pull/530) for JSON type support?

> json type support
> -
>
> Key: HAWQ-401
> URL: https://issues.apache.org/jira/browse/HAWQ-401
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Core
>Reporter: Lei Chang
>Assignee: Lei Chang
> Fix For: backlog
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HAWQ-1234) Document HAWQ to PXF APIs

2017-02-14 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866266#comment-15866266
 ] 

Kyle R Dunn edited comment on HAWQ-1234 at 2/14/17 6:03 PM:


Additionally, I observed all this using the following command on the HAWQ 
master. First, start the {{tcpdump}} trace, then invoke a SELECT from a 
previously defined PXF table, either using HCatalog directly or by manually 
defining a PXF external table. A similar approach could be used to observe 
datanode traffic during read/write operations.

Watch all traffic on port 51200 (PXF default)
{code}
$ tcpdump -n port 51200 -A
{code}

Initiate a PXF query via HCatalog
{code}
# select * from hcatalog.default.kdtest ;
   key   |   value
-+---
 somekey | somevalue
 1234| 56789
 hello   | world
 aloha   | mondays
(4 rows)
{code}

Here is the conversation output:
{code}
21:20:14.632410 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [S], seq 
3498721498, win 65483, options [mss 65495,sackOK,TS val 1901602390 ecr 
1901547904,nop,wscale 9], length 0
E.. 127.0.0.1.20416: Flags [S.], seq 
3752275736, ack 3498721499, win 65483, options [mss 65495,sackOK,TS val 
1901602390 ecr 1901602390,nop,wscale 9], length 0
E..<..@.@.<...O.../...@
qX
21:20:14.632428 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 1, win 
128, options [nop,nop,TS val 1901602390 ecr 1901602390], length 0
E..4f.@.@..4O.@.../
qX
21:20:14.632602 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [P.], seq 1:318, 
ack 1, win 128, options [nop,nop,TS val 1901602390 ecr 1901602390], length 317
E..qf.@.@...O.@.../..e.
qX /pxf/v14/Metadata/getMetadata?profile=Hive=default.kdtest 
HTTP/1.1
Host: localhost:51200
Accept: application/json
X-GP-SEGMENT-ID: -15432
X-GP-SEGMENT-COUNT: 0
X-GP-XID: 2725021
X-GP-ALIGNMENT: 8
X-GP-URL-HOST: localhost
X-GP-URL-PORT: 51200
X-GP-URI: localhost:51200/
X-GP-HAS-FILTER: 0


21:20:14.632607 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [.], ack 318, win 
130, options [nop,nop,TS val 1901602390 ecr 1901602390], length 0
E..4O@.@.3s..O.../...B
qX
21:20:15.084890 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [P.], seq 1:269, 
ack 318, win 130, options [nop,nop,TS val 1901602843 ecr 1901602390], length 268
E..@P@.@.2f..O.../...B..4.
qX(.qX/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 132
Date: Tue, 14 Feb 2017 05:20:15 GMT

{"PXFMetadata":[{"item":{"path":"default","name":"kdtest"},"fields":[{"name":"key","type":"text"},{"name":"value","type":"text"}]}]}
21:20:15.084900 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 269, win 
130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..2O.B...0%.!.
qX(.qX(.
21:20:15.085229 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [F.], seq 318, ack 
269, win 130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..1O.B...0%. .
qX(.qX(.
21:20:15.085286 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [F.], seq 269, ack 
319, win 130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4Q@.@.3q..O...0%..B
qX(.qX(.
21:20:15.085294 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 270, win 
130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..0O.B...0&...
qX(.qX(.
21:20:15.112739 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [S], seq 222439143, 
win 65483, options [mss 65495,sackOK,TS val 1901602870 ecr 
1901602843,nop,wscale 9], length 0
B&..
qX(6qX(
21:20:15.112765 IP 127.0.0.1.51200 > 127.0.0.1.20422: Flags [S.], seq 
2606634976, ack 222439144, win 65483, options [mss 65495,sackOK,TS val 
1901602871 ecr 1901602870,nop,wscale 9], length 0
B&O..^..
qX(7qX(6...
21:20:15.112773 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [.], ack 1, win 
128, options [nop,nop,TS val 1901602871 ecr 1901602871], length 0
B&..^..EO...
qX(7qX(7
21:20:15.112809 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [P.], seq 1:581, 
ack 1, win 128, options [nop,nop,TS val 1901602871 ecr 1901602871], length 580
B&..^...m...O...
qX(7qX(7GET /pxf/v14/Fragmenter/getFragments?path=default.kdtest HTTP/1.1
Host: localhost:51200
Accept: application/json
X-GP-FORMAT: GPDBWritable
X-GP-ATTRS: 2
X-GP-ATTR-NAME0: key
X-GP-ATTR-TYPECODE0: 25
X-GP-ATTR-TYPENAME0: text
X-GP-ATTR-NAME1: value
X-GP-ATTR-TYPECODE1: 25
X-GP-ATTR-TYPENAME1: text
X-GP-SEGMENT-ID: -15432
X-GP-SEGMENT-COUNT: 0
X-GP-XID: 2725021
X-GP-ALIGNMENT: 8
X-GP-URL-HOST: localhost
X-GP-URL-PORT: 51200
X-GP-DATA-DIR: default.kdtest
X-GP-Profile: Hive
X-GP-URI: pxf://localhost:51200/default.kdtest?Profile=Hive
X-GP-HAS-FILTER: 0



[jira] [Commented] (HAWQ-1234) Document HAWQ to PXF APIs

2017-02-14 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866266#comment-15866266
 ] 

Kyle R Dunn commented on HAWQ-1234:
---

Additionally, I observed all this using the following command, on the HAWQ 
master. First, start the {{tcpdump}} trace, then invoke a SELECT from a 
previously defined PXF table, either using HCatalog directly, or manually 
defining a PXF external table.

{code}
$ tcpdump -n port 51200 -A
{code}

Here is the conversation output:
{code}
21:20:14.632410 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [S], seq 
3498721498, win 65483, options [mss 65495,sackOK,TS val 1901602390 ecr 
1901547904,nop,wscale 9], length 0
E.. 127.0.0.1.20416: Flags [S.], seq 
3752275736, ack 3498721499, win 65483, options [mss 65495,sackOK,TS val 
1901602390 ecr 1901602390,nop,wscale 9], length 0
E..<..@.@.<...O.../...@
qX
21:20:14.632428 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 1, win 
128, options [nop,nop,TS val 1901602390 ecr 1901602390], length 0
E..4f.@.@..4O.@.../
qX
21:20:14.632602 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [P.], seq 1:318, 
ack 1, win 128, options [nop,nop,TS val 1901602390 ecr 1901602390], length 317
E..qf.@.@...O.@.../..e.
qX /pxf/v14/Metadata/getMetadata?profile=Hive=default.kdtest 
HTTP/1.1
Host: localhost:51200
Accept: application/json
X-GP-SEGMENT-ID: -15432
X-GP-SEGMENT-COUNT: 0
X-GP-XID: 2725021
X-GP-ALIGNMENT: 8
X-GP-URL-HOST: localhost
X-GP-URL-PORT: 51200
X-GP-URI: localhost:51200/
X-GP-HAS-FILTER: 0


21:20:14.632607 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [.], ack 318, win 
130, options [nop,nop,TS val 1901602390 ecr 1901602390], length 0
E..4O@.@.3s..O.../...B
qX
21:20:15.084890 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [P.], seq 1:269, 
ack 318, win 130, options [nop,nop,TS val 1901602843 ecr 1901602390], length 268
E..@P@.@.2f..O.../...B..4.
qX(.qX/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 132
Date: Tue, 14 Feb 2017 05:20:15 GMT

{"PXFMetadata":[{"item":{"path":"default","name":"kdtest"},"fields":[{"name":"key","type":"text"},{"name":"value","type":"text"}]}]}
21:20:15.084900 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 269, win 
130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..2O.B...0%.!.
qX(.qX(.
21:20:15.085229 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [F.], seq 318, ack 
269, win 130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..1O.B...0%. .
qX(.qX(.
21:20:15.085286 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [F.], seq 269, ack 
319, win 130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4Q@.@.3q..O...0%..B
qX(.qX(.
21:20:15.085294 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 270, win 
130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..0O.B...0&...
qX(.qX(.
21:20:15.112739 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [S], seq 222439143, 
win 65483, options [mss 65495,sackOK,TS val 1901602870 ecr 
1901602843,nop,wscale 9], length 0
B&..
qX(6qX(
21:20:15.112765 IP 127.0.0.1.51200 > 127.0.0.1.20422: Flags [S.], seq 
2606634976, ack 222439144, win 65483, options [mss 65495,sackOK,TS val 
1901602871 ecr 1901602870,nop,wscale 9], length 0
B&O..^..
qX(7qX(6...
21:20:15.112773 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [.], ack 1, win 
128, options [nop,nop,TS val 1901602871 ecr 1901602871], length 0
B&..^..EO...
qX(7qX(7
21:20:15.112809 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [P.], seq 1:581, 
ack 1, win 128, options [nop,nop,TS val 1901602871 ecr 1901602871], length 580
B&..^...m...O...
qX(7qX(7GET /pxf/v14/Fragmenter/getFragments?path=default.kdtest HTTP/1.1
Host: localhost:51200
Accept: application/json
X-GP-FORMAT: GPDBWritable
X-GP-ATTRS: 2
X-GP-ATTR-NAME0: key
X-GP-ATTR-TYPECODE0: 25
X-GP-ATTR-TYPENAME0: text
X-GP-ATTR-NAME1: value
X-GP-ATTR-TYPECODE1: 25
X-GP-ATTR-TYPENAME1: text
X-GP-SEGMENT-ID: -15432
X-GP-SEGMENT-COUNT: 0
X-GP-XID: 2725021
X-GP-ALIGNMENT: 8
X-GP-URL-HOST: localhost
X-GP-URL-PORT: 51200
X-GP-DATA-DIR: default.kdtest
X-GP-Profile: Hive
X-GP-URI: pxf://localhost:51200/default.kdtest?Profile=Hive
X-GP-HAS-FILTER: 0


21:20:15.112813 IP 127.0.0.1.51200 > 127.0.0.1.20422: Flags [.], ack 581, win 
131, options [nop,nop,TS val 1901602871 ecr 1901602871], length 0
B),C..O..^..
qX(7qX(7
21:20:15.305723 IP 127.0.0.1.51200 > 127.0.0.1.20422: Flags [P.], seq 1:1443, 
ack 581, win 131, options [nop,nop,TS val 1901603063 ecr 1901602871], length 
1442
B),...O..^..
qX(.qX(7HTTP/1.1 200 OK
Server: 

[jira] [Comment Edited] (HAWQ-1234) Document HAWQ to PXF APIs

2017-02-14 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866266#comment-15866266
 ] 

Kyle R Dunn edited comment on HAWQ-1234 at 2/14/17 6:01 PM:


Additionally, I observed all this using the following command, on the HAWQ 
master. First, start the {{tcpdump}} trace, then invoke a SELECT from a 
previously defined PXF table, either using HCatalog directly, or manually 
defining a PXF external table. A similar approach could be used to observe 
datanode traffic during read/write operations.

{code}
$ tcpdump -n port 51200 -A
{code}

Here is the conversation output:
{code}
21:20:14.632410 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [S], seq 
3498721498, win 65483, options [mss 65495,sackOK,TS val 1901602390 ecr 
1901547904,nop,wscale 9], length 0
E.. 127.0.0.1.20416: Flags [S.], seq 
3752275736, ack 3498721499, win 65483, options [mss 65495,sackOK,TS val 
1901602390 ecr 1901602390,nop,wscale 9], length 0
E..<..@.@.<...O.../...@
qX
21:20:14.632428 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 1, win 
128, options [nop,nop,TS val 1901602390 ecr 1901602390], length 0
E..4f.@.@..4O.@.../
qX
21:20:14.632602 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [P.], seq 1:318, 
ack 1, win 128, options [nop,nop,TS val 1901602390 ecr 1901602390], length 317
E..qf.@.@...O.@.../..e.
qX /pxf/v14/Metadata/getMetadata?profile=Hive=default.kdtest 
HTTP/1.1
Host: localhost:51200
Accept: application/json
X-GP-SEGMENT-ID: -15432
X-GP-SEGMENT-COUNT: 0
X-GP-XID: 2725021
X-GP-ALIGNMENT: 8
X-GP-URL-HOST: localhost
X-GP-URL-PORT: 51200
X-GP-URI: localhost:51200/
X-GP-HAS-FILTER: 0


21:20:14.632607 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [.], ack 318, win 
130, options [nop,nop,TS val 1901602390 ecr 1901602390], length 0
E..4O@.@.3s..O.../...B
qX
21:20:15.084890 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [P.], seq 1:269, 
ack 318, win 130, options [nop,nop,TS val 1901602843 ecr 1901602390], length 268
E..@P@.@.2f..O.../...B..4.
qX(.qX/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 132
Date: Tue, 14 Feb 2017 05:20:15 GMT

{"PXFMetadata":[{"item":{"path":"default","name":"kdtest"},"fields":[{"name":"key","type":"text"},{"name":"value","type":"text"}]}]}
21:20:15.084900 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 269, win 
130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..2O.B...0%.!.
qX(.qX(.
21:20:15.085229 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [F.], seq 318, ack 
269, win 130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..1O.B...0%. .
qX(.qX(.
21:20:15.085286 IP 127.0.0.1.51200 > 127.0.0.1.20416: Flags [F.], seq 269, ack 
319, win 130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4Q@.@.3q..O...0%..B
qX(.qX(.
21:20:15.085294 IP 127.0.0.1.20416 > 127.0.0.1.51200: Flags [.], ack 270, win 
130, options [nop,nop,TS val 1901602843 ecr 1901602843], length 0
E..4f.@.@..0O.B...0&...
qX(.qX(.
21:20:15.112739 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [S], seq 222439143, 
win 65483, options [mss 65495,sackOK,TS val 1901602870 ecr 
1901602843,nop,wscale 9], length 0
B&..
qX(6qX(
21:20:15.112765 IP 127.0.0.1.51200 > 127.0.0.1.20422: Flags [S.], seq 
2606634976, ack 222439144, win 65483, options [mss 65495,sackOK,TS val 
1901602871 ecr 1901602870,nop,wscale 9], length 0
B&O..^..
qX(7qX(6...
21:20:15.112773 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [.], ack 1, win 
128, options [nop,nop,TS val 1901602871 ecr 1901602871], length 0
B&..^..EO...
qX(7qX(7
21:20:15.112809 IP 127.0.0.1.20422 > 127.0.0.1.51200: Flags [P.], seq 1:581, 
ack 1, win 128, options [nop,nop,TS val 1901602871 ecr 1901602871], length 580
B&..^...m...O...
qX(7qX(7GET /pxf/v14/Fragmenter/getFragments?path=default.kdtest HTTP/1.1
Host: localhost:51200
Accept: application/json
X-GP-FORMAT: GPDBWritable
X-GP-ATTRS: 2
X-GP-ATTR-NAME0: key
X-GP-ATTR-TYPECODE0: 25
X-GP-ATTR-TYPENAME0: text
X-GP-ATTR-NAME1: value
X-GP-ATTR-TYPECODE1: 25
X-GP-ATTR-TYPENAME1: text
X-GP-SEGMENT-ID: -15432
X-GP-SEGMENT-COUNT: 0
X-GP-XID: 2725021
X-GP-ALIGNMENT: 8
X-GP-URL-HOST: localhost
X-GP-URL-PORT: 51200
X-GP-DATA-DIR: default.kdtest
X-GP-Profile: Hive
X-GP-URI: pxf://localhost:51200/default.kdtest?Profile=Hive
X-GP-HAS-FILTER: 0


21:20:15.112813 IP 127.0.0.1.51200 > 127.0.0.1.20422: Flags [.], ack 581, win 
131, options [nop,nop,TS val 1901602871 ecr 1901602871], length 0
B),C..O..^..
qX(7qX(7
21:20:15.305723 IP 127.0.0.1.51200 > 127.0.0.1.20422: Flags [P.], seq 1:1443, 
ack 581, 

[jira] [Commented] (HAWQ-1234) Document HAWQ to PXF APIs

2017-02-14 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866238#comment-15866238
 ] 

Kyle R Dunn commented on HAWQ-1234:
---

I did some initial exploration of the HAWQ -> PXF communication chain for a 
different purpose; I'm going to paste in what I've learned so far. Note that PXF 
itself does not store metadata - either HAWQ provides it directly or HCatalog 
can be queried for it; I'm showing the latter. PXF expects the metadata about 
the data, as well as some other pieces, to be provided as HTTP headers, which 
it appears to convert to a hashmap on the server side, as shown 
[here|https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/RestResource.java#L52].
 

Get the PXF server version
{code}
$ curl 'http://localhost:51200/pxf/ProtocolVersion'
{ "version": "v14"} 
{code}

Get metadata from HCatalog for a Hive table called "kdtest" in the "default" 
database
{code}
$ curl -i -H "X-GP-SEGMENT-ID: -15432" -H "X-GP-SEGMENT-COUNT: 0" -H 
"X-GP-XID: 2724107" -H "X-GP-ALIGNMENT: 8" -H "X-GP-URL-HOST: localhost" -H 
"X-GP-URL-PORT: 51200" -H "X-GP-URI: localhost:51200/" -H "X-GP-HAS-FILTER: 0" 
'localhost:51200/pxf/v14/Metadata/getMetadata?profile=Hive=default.kdtest'
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 132
Date: Tue, 14 Feb 2017 05:06:11 GMT

{"PXFMetadata":[{"item":{"path":"default","name":"kdtest"},"fields":[{"name":"key","type":"text"},{"name":"value","type":"text"}]}]}
{code}

Get the actual data (in {{TEXT}} format, {{GPDBWritable}} is also valid) for 
the above table's PXF "Fragments"
{code}
$ curl -i -H "X-GP-SEGMENT-ID: -15432" -H "X-GP-SEGMENT-COUNT: 0" -H 
"X-GP-XID: 2724107" -H "X-GP-ALIGNMENT: 8" -H "X-GP-URL-HOST: localhost" -H 
"X-GP-URL-PORT: 51200" -H "X-GP-URI: 
pxf://localhost:51200/default.kdtest?Profile=Hive" -H "X-GP-HAS-FILTER: 0" -H 
"X-GP-FORMAT: TEXT" -H "X-GP-ATTRS: 2" -H "X-GP-ATTR-NAME0: key" -H 
"X-GP-ATTR-TYPECODE0: 25" -H "X-GP-ATTR-TYPENAME0: text" -H "X-GP-ATTR-NAME1:  
value" -H "X-GP-ATTR-TYPECODE1: 25" -H "X-GP-ATTR-TYPENAME1: text" -H 
"X-GP-Profile: Hive" -H "X-GP-DATA-DIR: default.kdtest" 
'http://localhost:51200/pxf/v14/Fragmenter/getFragments?path=/apps/hive/warehouse/kdtest'

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 1305
Date: Tue, 14 Feb 2017 05:30:05 GMT

{"PXFFragments":[{"sourceName":"/apps/hive/warehouse/kdtest/hive-test-data.txt","index":0,"replicas":["10.215.181.12","10.215.181.11"],"metadata":"rO0ABXcQN3VyABNbTGphdmEubGFuZy5TdHJpbmc7rdJW5+kde0cCAAB4cAJ0AB1jbHBxbjFwZGhkYmRuMDIuaW5mb3NvbGNvLm5ldHQAHWNscHFuMXBkaGRiZG4wMS5pbmZvc29sY28ubmV0","userData":"b3JnLmFwYWNoZS5oYWRvb3AubWFwcmVkLlRleHRJbnB1dEZvcm1hdCFIVUREIW9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLmxhenkuTGF6eVNpbXBsZVNlckRlIUhVREQhIwojTW9uIEZlYiAxMyAyMToyOTozNSBQU1QgMjAxNwpuYW1lPWRlZmF1bHQua2R0ZXN0Cm51bUZpbGVzPTEKZmllbGQuZGVsaW09LApjb2x1bW5zLnR5cGVzPXN0cmluZ1w6c3RyaW5nCnNlcmlhbGl6YXRpb24uZGRsPXN0cnVjdCBrZHRlc3QgeyBzdHJpbmcga2V5LCBzdHJpbmcgdmFsdWV9CmNvbHVtbnM9a2V5LHZhbHVlCnNlcmlhbGl6YXRpb24uZm9ybWF0PSwKY29sdW1ucy5jb21tZW50cz1cdTAwMDAKYnVja2V0X2NvdW50PS0xCnNlcmlhbGl6YXRpb24ubGliPW9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLmxhenkuTGF6eVNpbXBsZVNlckRlCkNPTFVNTl9TVEFUU19BQ0NVUkFURT10cnVlCmZpbGUuaW5wdXRmb3JtYXQ9b3JnLmFwYWNoZS5oYWRvb3AubWFwcmVkLlRleHRJbnB1dEZvcm1hdAp0b3RhbFNpemU9NTUKZmlsZS5vdXRwdXRmb3JtYXQ9b3JnLmFwYWNoZS5oYWRvb3AuaGl2ZS5xbC5pby5IaXZlSWdub3JlS2V5VGV4dE91dHB1dEZvcm1hdApsb2NhdGlvbj1oZGZzXDovL2NscHFuMXBkaGRibW4wMS5pbmZvc29sY28ubmV0XDo4MDIwL2FwcHMvaGl2ZS93YXJlaG91c2Uva2R0ZXN0CnRyYW5zaWVudF9sYXN0RGRsVGltZT0xNDg3MDA2NDg4CiFIVUREISFITlBUISFIVUREIWZhbHNl"}]}
{code}
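
For reference, the same calls can be scripted outside of curl. Below is a 
minimal, stdlib-only Python sketch (not part of the original investigation) 
mirroring the examples above; the X-GP-* header values are the same 
illustrative ones used there.

{code}
# Minimal sketch: drive the PXF REST endpoints shown above from Python 3.
# Header values mirror the curl examples and are illustrative only.
import json
import urllib.request

def pxf_get(url, headers=None):
    """Issue a GET against the PXF agent and decode the JSON reply."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The protocol version probe needs no headers.
print(pxf_get("http://localhost:51200/pxf/ProtocolVersion"))

# Metadata and Fragmenter calls expect the X-GP-* headers, for example:
gp_headers = {
    "Accept": "application/json",
    "X-GP-SEGMENT-ID": "-15432",
    "X-GP-SEGMENT-COUNT": "0",
    "X-GP-XID": "2724107",
    "X-GP-ALIGNMENT": "8",
    "X-GP-URL-HOST": "localhost",
    "X-GP-URL-PORT": "51200",
    "X-GP-URI": "localhost:51200/",
    "X-GP-HAS-FILTER": "0",
}
# e.g. pxf_get("http://localhost:51200/pxf/v14/Metadata/getMetadata?...",
#              headers=gp_headers)
{code}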

The Hive table looks like this:
{code}
hive> describe formatted kdtest;
OK
# col_name  data_type   comment

key string
value   string

# Detailed Table Information
Database:   default
Owner:  kdunn
CreateTime: Mon Feb 13 09:20:40 PST 2017
LastAccessTime: UNKNOWN
Protect Mode:   None
Retention:  0
Location:   
hdfs://clpqn1pdhdbmn01.infosolco.net:8020/apps/hive/warehouse/kdtest
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles1
totalSize   55
transient_lastDdlTime   1487006488

# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:org.apache.hadoop.mapred.TextInputFormat
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
field.delim ,
serialization.format,
Time 

[jira] [Comment Edited] (HAWQ-1234) Document HAWQ to PXF APIs

2017-02-14 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866238#comment-15866238
 ] 

Kyle R Dunn edited comment on HAWQ-1234 at 2/14/17 5:56 PM:


I did some initial exploration of the HAWQ -> PXF communication chain for a 
different purpose; I'm going to paste in what I've learned so far. Note that PXF 
itself does not store metadata - either HAWQ provides it directly or HCatalog 
can be queried for it; I'm showing the latter. PXF expects the metadata about 
the data, as well as some other pieces, to be provided as HTTP headers, which 
it appears to convert to a hashmap on the server side, as shown 
[here|https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/RestResource.java#L52].
 

Get the PXF server version
{code}
$ curl 'http://localhost:51200/pxf/ProtocolVersion'
{ "version": "v14"} 
{code}

Get metadata from HCatalog for a Hive table called "kdtest" in the "default" 
database
{code}
$ curl -i -H "X-GP-SEGMENT-ID: -15432" -H "X-GP-SEGMENT-COUNT: 0" -H 
"X-GP-XID: 2724107" -H "X-GP-ALIGNMENT: 8" -H "X-GP-URL-HOST: localhost" -H 
"X-GP-URL-PORT: 51200" -H "X-GP-URI: localhost:51200/" -H "X-GP-HAS-FILTER: 0" 
'localhost:51200/pxf/v14/Metadata/getMetadata?profile=Hive=default.kdtest'
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 132
Date: Tue, 14 Feb 2017 05:06:11 GMT

{"PXFMetadata":[{"item":{"path":"default","name":"kdtest"},"fields":[{"name":"key","type":"text"},{"name":"value","type":"text"}]}]}
{code}

Get the actual data (in {{TEXT}} format, {{GPDBWritable}} is also valid) for 
the above table's PXF "Fragments"
{code}
$ curl -i -H "X-GP-SEGMENT-ID: -15432" -H "X-GP-SEGMENT-COUNT: 0" -H 
"X-GP-XID: 2724107" -H "X-GP-ALIGNMENT: 8" -H "X-GP-URL-HOST: localhost" -H 
"X-GP-URL-PORT: 51200" -H "X-GP-URI: 
pxf://localhost:51200/default.kdtest?Profile=Hive" -H "X-GP-HAS-FILTER: 0" -H 
"X-GP-FORMAT: TEXT" -H "X-GP-ATTRS: 2" -H "X-GP-ATTR-NAME0: key" -H 
"X-GP-ATTR-TYPECODE0: 25" -H "X-GP-ATTR-TYPENAME0: text" -H "X-GP-ATTR-NAME1:  
value" -H "X-GP-ATTR-TYPECODE1: 25" -H "X-GP-ATTR-TYPENAME1: text" -H 
"X-GP-Profile: Hive" -H "X-GP-DATA-DIR: default.kdtest" 
'http://localhost:51200/pxf/v14/Fragmenter/getFragments?path=/apps/hive/warehouse/kdtest'

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 1305
Date: Tue, 14 Feb 2017 05:30:05 GMT

{"PXFFragments":[{"sourceName":"/apps/hive/warehouse/kdtest/hive-test-data.txt","index":0,"replicas":["10.215.181.12","10.215.181.11"],"metadata":"rO0ABXcQN3VyABNbTGphdmEubGFuZy5TdHJpbmc7rdJW5+kde0cCAAB4cAJ0AB1jbHBxbjFwZGhkYmRuMDIuaW5mb3NvbGNvLm5ldHQAHWNscHFuMXBkaGRiZG4wMS5pbmZvc29sY28ubmV0","userData":"b3JnLmFwYWNoZS5oYWRvb3AubWFwcmVkLlRleHRJbnB1dEZvcm1hdCFIVUREIW9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLmxhenkuTGF6eVNpbXBsZVNlckRlIUhVREQhIwojTW9uIEZlYiAxMyAyMToyOTozNSBQU1QgMjAxNwpuYW1lPWRlZmF1bHQua2R0ZXN0Cm51bUZpbGVzPTEKZmllbGQuZGVsaW09LApjb2x1bW5zLnR5cGVzPXN0cmluZ1w6c3RyaW5nCnNlcmlhbGl6YXRpb24uZGRsPXN0cnVjdCBrZHRlc3QgeyBzdHJpbmcga2V5LCBzdHJpbmcgdmFsdWV9CmNvbHVtbnM9a2V5LHZhbHVlCnNlcmlhbGl6YXRpb24uZm9ybWF0PSwKY29sdW1ucy5jb21tZW50cz1cdTAwMDAKYnVja2V0X2NvdW50PS0xCnNlcmlhbGl6YXRpb24ubGliPW9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLmxhenkuTGF6eVNpbXBsZVNlckRlCkNPTFVNTl9TVEFUU19BQ0NVUkFURT10cnVlCmZpbGUuaW5wdXRmb3JtYXQ9b3JnLmFwYWNoZS5oYWRvb3AubWFwcmVkLlRleHRJbnB1dEZvcm1hdAp0b3RhbFNpemU9NTUKZmlsZS5vdXRwdXRmb3JtYXQ9b3JnLmFwYWNoZS5oYWRvb3AuaGl2ZS5xbC5pby5IaXZlSWdub3JlS2V5VGV4dE91dHB1dEZvcm1hdApsb2NhdGlvbj1oZGZzXDovL2NscHFuMXBkaGRibW4wMS5pbmZvc29sY28ubmV0XDo4MDIwL2FwcHMvaGl2ZS93YXJlaG91c2Uva2R0ZXN0CnRyYW5zaWVudF9sYXN0RGRsVGltZT0xNDg3MDA2NDg4CiFIVUREISFITlBUISFIVUREIWZhbHNl"}]}
{code}

The Hive table looks like this:
{code}
hive> describe formatted kdtest;
OK
# col_name  data_type   comment

key string
value   string

# Detailed Table Information
Database:   default
Owner:  kdunn
CreateTime: Mon Feb 13 09:20:40 PST 2017
LastAccessTime: UNKNOWN
Protect Mode:   None
Retention:  0
Location:   hdfs://nowhere.com:8020/apps/hive/warehouse/kdtest
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles1
totalSize   55
transient_lastDdlTime   1487006488

# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:org.apache.hadoop.mapred.TextInputFormat
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
field.delim ,

[jira] [Comment Edited] (HAWQ-1234) Document HAWQ to PXF APIs

2017-02-14 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866238#comment-15866238
 ] 

Kyle R Dunn edited comment on HAWQ-1234 at 2/14/17 5:55 PM:


I did some initial exploration of the HAWQ -> PXF communication chain for a 
different purpose; I'm going to paste in what I've learned so far. Note that PXF 
itself does not store metadata - either HAWQ provides it directly or HCatalog 
can be queried for it; I'm showing the latter. PXF expects the metadata about 
the data, as well as some other pieces, to be provided as HTTP headers, which 
it appears to convert to a hashmap on the server side, as shown 
[here|https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/java/org/apache/hawq/pxf/service/rest/RestResource.java#L52].
 

Get the PXF server version
{code}
$ curl 'http://localhost:51200/pxf/ProtocolVersion'
{ "version": "v14"} 
{code}

Get metadata from HCatalog for a Hive table called "kdtest" in the "default" 
database
{code}
$ curl -i -H "X-GP-SEGMENT-ID: -15432" -H "X-GP-SEGMENT-COUNT: 0" -H 
"X-GP-XID: 2724107" -H "X-GP-ALIGNMENT: 8" -H "X-GP-URL-HOST: localhost" -H 
"X-GP-URL-PORT: 51200" -H "X-GP-URI: localhost:51200/" -H "X-GP-HAS-FILTER: 0" 
'localhost:51200/pxf/v14/Metadata/getMetadata?profile=Hive=default.kdtest'
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 132
Date: Tue, 14 Feb 2017 05:06:11 GMT

{"PXFMetadata":[{"item":{"path":"default","name":"kdtest"},"fields":[{"name":"key","type":"text"},{"name":"value","type":"text"}]}]}
{code}

Get the actual data (in {{TEXT}} format, {{GPDBWritable}} is also valid) for 
the above table's PXF "Fragments"
{code}
$ curl -i -H "X-GP-SEGMENT-ID: -15432" -H "X-GP-SEGMENT-COUNT: 0" -H 
"X-GP-XID: 2724107" -H "X-GP-ALIGNMENT: 8" -H "X-GP-URL-HOST: localhost" -H 
"X-GP-URL-PORT: 51200" -H "X-GP-URI: 
pxf://localhost:51200/default.kdtest?Profile=Hive" -H "X-GP-HAS-FILTER: 0" -H 
"X-GP-FORMAT: TEXT" -H "X-GP-ATTRS: 2" -H "X-GP-ATTR-NAME0: key" -H 
"X-GP-ATTR-TYPECODE0: 25" -H "X-GP-ATTR-TYPENAME0: text" -H "X-GP-ATTR-NAME1:  
value" -H "X-GP-ATTR-TYPECODE1: 25" -H "X-GP-ATTR-TYPENAME1: text" -H 
"X-GP-Profile: Hive" -H "X-GP-DATA-DIR: default.kdtest" 
'http://localhost:51200/pxf/v14/Fragmenter/getFragments?path=/apps/hive/warehouse/kdtest'

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json
Content-Length: 1305
Date: Tue, 14 Feb 2017 05:30:05 GMT

{"PXFFragments":[{"sourceName":"/apps/hive/warehouse/kdtest/hive-test-data.txt","index":0,"replicas":["10.215.181.12","10.215.181.11"],"metadata":"rO0ABXcQN3VyABNbTGphdmEubGFuZy5TdHJpbmc7rdJW5+kde0cCAAB4cAJ0AB1jbHBxbjFwZGhkYmRuMDIuaW5mb3NvbGNvLm5ldHQAHWNscHFuMXBkaGRiZG4wMS5pbmZvc29sY28ubmV0","userData":"b3JnLmFwYWNoZS5oYWRvb3AubWFwcmVkLlRleHRJbnB1dEZvcm1hdCFIVUREIW9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLmxhenkuTGF6eVNpbXBsZVNlckRlIUhVREQhIwojTW9uIEZlYiAxMyAyMToyOTozNSBQU1QgMjAxNwpuYW1lPWRlZmF1bHQua2R0ZXN0Cm51bUZpbGVzPTEKZmllbGQuZGVsaW09LApjb2x1bW5zLnR5cGVzPXN0cmluZ1w6c3RyaW5nCnNlcmlhbGl6YXRpb24uZGRsPXN0cnVjdCBrZHRlc3QgeyBzdHJpbmcga2V5LCBzdHJpbmcgdmFsdWV9CmNvbHVtbnM9a2V5LHZhbHVlCnNlcmlhbGl6YXRpb24uZm9ybWF0PSwKY29sdW1ucy5jb21tZW50cz1cdTAwMDAKYnVja2V0X2NvdW50PS0xCnNlcmlhbGl6YXRpb24ubGliPW9yZy5hcGFjaGUuaGFkb29wLmhpdmUuc2VyZGUyLmxhenkuTGF6eVNpbXBsZVNlckRlCkNPTFVNTl9TVEFUU19BQ0NVUkFURT10cnVlCmZpbGUuaW5wdXRmb3JtYXQ9b3JnLmFwYWNoZS5oYWRvb3AubWFwcmVkLlRleHRJbnB1dEZvcm1hdAp0b3RhbFNpemU9NTUKZmlsZS5vdXRwdXRmb3JtYXQ9b3JnLmFwYWNoZS5oYWRvb3AuaGl2ZS5xbC5pby5IaXZlSWdub3JlS2V5VGV4dE91dHB1dEZvcm1hdApsb2NhdGlvbj1oZGZzXDovL2NscHFuMXBkaGRibW4wMS5pbmZvc29sY28ubmV0XDo4MDIwL2FwcHMvaGl2ZS93YXJlaG91c2Uva2R0ZXN0CnRyYW5zaWVudF9sYXN0RGRsVGltZT0xNDg3MDA2NDg4CiFIVUREISFITlBUISFIVUREIWZhbHNl"}]}
{code}

The Hive table looks like this:
{code}
hive> describe formatted kdtest;
OK
# col_name  data_type   comment

key string
value   string

# Detailed Table Information
Database:   default
Owner:  kdunn
CreateTime: Mon Feb 13 09:20:40 PST 2017
LastAccessTime: UNKNOWN
Protect Mode:   None
Retention:  0
Location:   hdfs://nowhere.com:8020/apps/hive/warehouse/kdtest
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles1
totalSize   55
transient_lastDdlTime   1487006488

# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:org.apache.hadoop.mapred.TextInputFormat
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets:-1
Bucket Columns: []
Sort Columns:   []
Storage Desc Params:
field.delim ,

[jira] [Commented] (HAWQ-256) Integrate Security with Apache Ranger

2017-02-07 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856444#comment-15856444
 ] 

Kyle R Dunn commented on HAWQ-256:
--

[~Lili Ma] - here's some input for you
*1)  Why do they want to use Ranger?  What are the scenarios and use cases?*
Ranger provides the missing (and very important) functionality for 
synchronizing roles and groups from an identity management provider (like LDAP) 
into HAWQ. Without this capability, roles must be provisioned manually or 
something like pg-ldap-sync must be used; neither is a very enterprise-friendly 
or "baked" solution. 

*2)  Which version of Ranger do they want to use?  Is version 0.6+ acceptable 
(shipped in HDP 2.5+)?*
I think any version is a good starting point; in my opinion, it is best we stay 
aligned with what is available in the current GA HDP release.

*3)  What are the specific HAWQ objects they want to manage in Ranger, for 
example, Database/Tablespace/Schema/Table/Sequence/Language/Function/Protocol? 
Is there anything else?*
In my mind, support for schema, table, sequence, function, and protocol is more 
important. Then prioritize database and tablespace - those seem to be the more 
"advanced" usage (compared to the former) for most SQL-on-Hadoop installations 
I've seen.

*4)  What kind of tables do they want to manage? Heap (catalog) table, or data 
table on HDFS?*
Data tables. In my opinion, the catalog should only be managed by a local 
superuser.

*5)  Do they want to restrict superuser privileges? If yes, what kind of 
privileges do they want to restrict, including catalog table or just the table 
on HDFS?*
I've not seen this requirement, except with PL/x function creation / 
invocation. 

*6)  Do they want to use Ambari to deploy HAWQ and Ranger?*
Whenever possible, yes.

*7) Do they have requirements for integration with a user management tool such 
as LDAP?*
Absolutely, this is the main motivator from my perspective.

*8) Do they have a need to switch back and forth from Ranger? Say, setting 
Ranger on, and then setting off (using HAWQ native authorization)?*
Hard to say here. If it is possible for HAWQ to reach some unusable state as a 
result of having Ranger on, then yes; otherwise, it seems unlikely this would be 
a common activity.

*9) Are they ok with the solution that we put the system catalog/function/owner 
checks in HAWQ?
--- There are a lot of catalog checks (for example, pg_catalog, 
information_schema, etc.) and built-in function checks (for example, count, 
charne, etc.) in a simple SQL command such as “analyze” or “\d”, which 
would consume a lot of communication cost with Ranger if we processed them in 
Ranger. Also, the built-in catalogs/functions may not include much sensitive 
data.
   --- HAWQ does owner checks in some cases. For example, only the owner who 
creates a table can drop it. Are customers OK with keeping the owner 
check in HAWQ?*
This makes sense to me. Having admin functions only available via a local 
account but auditable by Ranger is likely a fair tradeoff here. 

*10) Are they ok with the solution that once Ranger is configured, we will 
forbid GRANT/REVOKE command in HAWQ?*
This seems to be the correct behavior to avoid inconsistencies.

*11) Are they ok with the solution that HAWQ handles the privileges check for 
drop table/create database?*
This comes back to the third question - I think it makes sense, others may have 
a different opinion.

*12) Are they ok with the solution that configuring an extra GUC in Ambari side 
for indicating Ranger on/off?*
Not sure here. If Ranger thinks it's managing HAWQ, HAWQ should not be allowed 
to be "off" in Ambari. For the "disable Ranger" mode in HAWQ, maybe it should 
be command line only, as it would likely be only for troubleshooting / 
temporary usage.

*13) Are they OK if we don’t provide High Availability with HAWQ Ranger Plugin 
Service (RPS) in the first (beta) release?*
I think this is ok. Right now, it is not easy (or maybe not even possible) to 
have high availability with HAWQ+LDAP, so this is still at parity with current 
functionality. 


Hope this helps.

> Integrate Security with Apache Ranger
> -
>
> Key: HAWQ-256
> URL: https://issues.apache.org/jira/browse/HAWQ-256
> Project: Apache HAWQ
>  Issue Type: New Feature
>  Components: Security
>Reporter: Michael Andre Pearce (IG)
>Assignee: Lili Ma
> Fix For: backlog
>
> Attachments: HAWQRangerSupportDesign.pdf, 
> HAWQRangerSupportDesign_v0.2.pdf, HAWQRangerSupportDesign_v0.3.pdf
>
>
> Integrate security with Apache Ranger for a unified Hadoop security solution. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HAWQ-1161) Refactor PXF to use new Hadoop MapReduce APIs

2016-11-16 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-1161:
-

 Summary: Refactor PXF to use new Hadoop MapReduce APIs
 Key: HAWQ-1161
 URL: https://issues.apache.org/jira/browse/HAWQ-1161
 Project: Apache HAWQ
  Issue Type: Improvement
  Components: PXF
Reporter: Kyle R Dunn
Assignee: Lei Chang
 Fix For: backlog


Several classes in PXF make use of the older `org.apache.hadoop.mapred` API 
rather than the new `org.apache.hadoop.mapreduce` one. As a plugin developer, 
this has been the source of a significant headache. Other HAWQ libraries, like 
hawq-hadoop, use the newer `org.apache.hadoop.mapreduce` API, creating 
unnecessary friction between the two. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-1078) Implement hawqsync-falcon DR utility.

2016-09-26 Thread Kyle R Dunn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle R Dunn updated HAWQ-1078:
--
Attachment: hawq-dr-design.pdf

WIP design overview.

> Implement hawqsync-falcon DR utility.
> -
>
> Key: HAWQ-1078
> URL: https://issues.apache.org/jira/browse/HAWQ-1078
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Command Line Tools
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
> Fix For: backlog
>
> Attachments: hawq-dr-design.pdf
>
>
> HAWQ currently offers no DR functionality. This JIRA is for tracking the 
> design and development of a hawqsync-falcon utility, which uses a combination 
> of Falcon-based HDFS replication and custom automation in Python for allowing 
> both the HAWQ master catalog and corresponding HDFS data to be replicated to 
> a remote cluster for DR functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1078) Implement hawqsync-falcon DR utility.

2016-09-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524164#comment-15524164
 ] 

Kyle R Dunn commented on HAWQ-1078:
---

This is dependent on HAWQ-991. 

> Implement hawqsync-falcon DR utility.
> -
>
> Key: HAWQ-1078
> URL: https://issues.apache.org/jira/browse/HAWQ-1078
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Command Line Tools
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
> Fix For: backlog
>
>
> HAWQ currently offers no DR functionality. This JIRA is for tracking the 
> design and development of a hawqsync-falcon utility, which uses a combination 
> of Falcon-based HDFS replication and custom automation in Python for allowing 
> both the HAWQ master catalog and corresponding HDFS data to be replicated to 
> a remote cluster for DR functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-1078) Implement hawqsync-falcon DR utility.

2016-09-26 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-1078:
-

 Summary: Implement hawqsync-falcon DR utility.
 Key: HAWQ-1078
 URL: https://issues.apache.org/jira/browse/HAWQ-1078
 Project: Apache HAWQ
  Issue Type: Improvement
  Components: Command Line Tools
Reporter: Kyle R Dunn
Assignee: Lei Chang
 Fix For: backlog


HAWQ currently offers no DR functionality. This JIRA tracks the design and 
development of a hawqsync-falcon utility, which uses a combination of 
Falcon-based HDFS replication and custom automation in Python to allow both the 
HAWQ master catalog and the corresponding HDFS data to be replicated to a 
remote cluster for disaster recovery.
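
As a rough, illustrative-only sketch of that orchestration, the Python skeleton 
below quiesces HAWQ, archives the master catalog, and hands the HDFS 
replication off to a pre-defined Falcon process. The command invocations, 
paths, and the Falcon process name are assumptions for illustration, not the 
actual utility.

{code}
# Illustrative-only skeleton; all commands, paths, and names are assumptions.
import subprocess
import tarfile
from datetime import datetime

MASTER_DATA_DIR = "/data/hawq/master"      # assumed master data directory
FALCON_PROCESS = "hawq-hdfs-replication"   # assumed pre-defined Falcon process

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

def hawqsync():
    # 1. Quiesce the source cluster so the catalog and HDFS data stay consistent.
    run(["hawq", "stop", "cluster", "-a"])

    # 2. Archive the master catalog directory for shipment to the DR site.
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    with tarfile.open("/tmp/hawq-catalog-%s.tar.gz" % stamp, "w:gz") as tar:
        tar.add(MASTER_DATA_DIR)

    # 3. Kick off the Falcon-managed HDFS replication to the remote cluster.
    run(["falcon", "entity", "-type", "process", "-name", FALCON_PROCESS,
         "-schedule"])

    # 4. Resume service on the source cluster.
    run(["hawq", "start", "cluster", "-a"])
{code}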



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-1069) Support Kerberos and token impersonation

2016-09-22 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-1069:
-

 Summary: Support Kerberos and token impersonation
 Key: HAWQ-1069
 URL: https://issues.apache.org/jira/browse/HAWQ-1069
 Project: Apache HAWQ
  Issue Type: Improvement
  Components: libhdfs
Reporter: Kyle R Dunn
Assignee: Lei Chang
 Fix For: backlog


Created as a carryover for [libhdfs3 Github PR 
#45|https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/pull/45] on behalf 
of [bdrosen96|https://github.com/bdrosen96].





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-1066) Improper handling of install name for shared library on OS X

2016-09-21 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-1066:
-

 Summary: Improper handling of install name for shared library on 
OS X
 Key: HAWQ-1066
 URL: https://issues.apache.org/jira/browse/HAWQ-1066
 Project: Apache HAWQ
  Issue Type: Bug
  Components: libhdfs
Reporter: Kyle R Dunn
Assignee: Lei Chang


Created as a carryover for [libhdfs3 Github 
#40|https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/issues/46] on 
behalf of [elfprince13|https://github.com/elfprince13]:

I am working on a project that has libhdfs3 as a submodule in our git repo. 
Since we want to keep the build process contained in a single (user-owned) 
directory tree, we configure with {{cmake 
-DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/usr}}. However, after running {{make && make 
install}}, I then find the following incorrect behavior when I run {{otool}}.

{code}
[thomas@Mithlond] libhdfs3-cmake $ otool -D usr/lib/libhdfs3.dylib
usr/lib/libhdfs3.dylib:
libhdfs3.1.dylib
{code}

Note that since the install name is incorrectly set, linking against this copy 
of the library, even by absolute path, will produce a binary that can't find 
libhdfs3.dylib without manually altering LD_LIBRARY_PATH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-1066) Improper handling of install name for shared library on OS X

2016-09-21 Thread Kyle R Dunn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle R Dunn updated HAWQ-1066:
--
Priority: Minor  (was: Major)

> Improper handling of install name for shared library on OS X
> 
>
> Key: HAWQ-1066
> URL: https://issues.apache.org/jira/browse/HAWQ-1066
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: libhdfs
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>Priority: Minor
>
> Created as a carryover for [libhdfs3 Github 
> #40|https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/issues/46] on 
> behalf of [elfprince13|https://github.com/elfprince13]:
> I am working on a project that has libhdfs3 as a submodule in our git repo. 
> Since we want to keep the build process contained in a single (user-owned) 
> directory tree, we configure with {{cmake 
> -DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/usr}}. However, after running {{make && 
> make install}}, I then find the following incorrect behavior when I run 
> {{otool}}.
> {code}
> [thomas@Mithlond] libhdfs3-cmake $ otool -D usr/lib/libhdfs3.dylib
> usr/lib/libhdfs3.dylib:
> libhdfs3.1.dylib
> {code}
> Note that since the install name is incorrectly set, linking against this 
> copy of the library, even by absolute path, will produce a binary that can't 
> find libhdfs3.dylib without manually altering LD_LIBRARY_PATH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1066) Improper handling of install name for shared library on OS X

2016-09-21 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510987#comment-15510987
 ] 

Kyle R Dunn commented on HAWQ-1066:
---

Current work-around is adding:

{code}
if [[ "$OS_UNAME" == "Darwin" ]]; then
    install_name_tool -id `pwd`/usr/lib/libhdfs3.1.dylib `pwd`/usr/lib/libhdfs3.1.dylib
fi
{code}

into a bash script that runs as a wrapper around the build task; the goal is to 
have the official CMake files configured correctly.

> Improper handling of install name for shared library on OS X
> 
>
> Key: HAWQ-1066
> URL: https://issues.apache.org/jira/browse/HAWQ-1066
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: libhdfs
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>
> Created as a carryover for [libhdfs3 Github 
> #40|https://github.com/Pivotal-Data-Attic/pivotalrd-libhdfs3/issues/46] on 
> behalf of [elfprince13|https://github.com/elfprince13]:
> I am working on a project that has libhdfs3 as a submodule in our git repo. 
> Since we want to keep the build process contained in a single (user-owned) 
> directory tree, we configure with {{cmake 
> -DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/usr}}. However, after running {{make && 
> make install}}, I then find the following incorrect behavior when I run 
> {{otool}}.
> {code}
> [thomas@Mithlond] libhdfs3-cmake $ otool -D usr/lib/libhdfs3.dylib
> usr/lib/libhdfs3.dylib:
> libhdfs3.1.dylib
> {code}
> Note that since the install name is incorrectly set, linking against this 
> copy of the library, even by absolute path, will produce a binary that can't 
> find libhdfs3.dylib without manually altering LD_LIBRARY_PATH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-1063) HAWQ Python library missing import

2016-09-20 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-1063:
-

 Summary: HAWQ Python library missing import
 Key: HAWQ-1063
 URL: https://issues.apache.org/jira/browse/HAWQ-1063
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Command Line Tools
Reporter: Kyle R Dunn
Assignee: Lei Chang


The file `tools/bin/hawqpylib/hawqlib.py` is missing a required import for 
catching a DatabaseError exception. This exception is raised when HAWQ is 
stopped and a tool like `gppkg` is used.
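
A minimal sketch of the intended fix is below. The pgdb (PyGreSQL DB-API) 
module is an assumption for illustration; the actual driver used by hawqlib.py 
may differ.

{code}
# Hypothetical sketch of the fix: the except clause names DatabaseError, so the
# module that defines it must be imported. "pgdb" is an assumption here.
from pgdb import DatabaseError, connect

def hawq_is_running(dbname="template1"):
    try:
        conn = connect(database=dbname)
        conn.close()
        return True
    except DatabaseError:
        # Raised when HAWQ is stopped and a tool like gppkg probes the catalog.
        return False
{code}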



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1025) Check the consistency of AO/Parquet_FileLocations.Files.size attribute in extracted yaml file and the actual file size in HDFS.

2016-08-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439308#comment-15439308
 ] 

Kyle R Dunn commented on HAWQ-1025:
---

This functionality again relates to the discussion in HAWQ-1011. In order to 
take advantage of the register-from-YAML feature for the purpose it was 
originally requested, we will need to allow the actual file size to differ from 
the catalog size attribute, at least until a proper recovery is performed. 
These are critical aspects of the initial purpose of this feature.

> Check the consistency of AO/Parquet_FileLocations.Files.size attribute in 
> extracted yaml file and the actual file size in HDFS.
> ---
>
> Key: HAWQ-1025
> URL: https://issues.apache.org/jira/browse/HAWQ-1025
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1011) Check whether the table to be registered is existed

2016-08-22 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430891#comment-15430891
 ] 

Kyle R Dunn commented on HAWQ-1011:
---

Hi [~xunzhang] - I do understand the situation you are describing; it makes good 
sense for that to be the default behavior. I am proposing to add a --force or 
--repair flag to hawq register to allow an existing table to have its metadata 
updated via hawq register without dropping it first. 

> Check whether the table to be registered is existed
> ---
>
> Key: HAWQ-1011
> URL: https://issues.apache.org/jira/browse/HAWQ-1011
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> Check whether the table to be registered is existed or not. If it is, print a 
> error message and then exit. This is a feature for hawq register to make sure 
> the right name from user input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1011) Check whether the table to be registered is existed

2016-08-19 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429058#comment-15429058
 ] 

Kyle R Dunn commented on HAWQ-1011:
---

In the event of using register to "update" metadata about an existing table 
(e.g. in the case of catalog / data inconsistency), we need the ability to 
overwrite the table's metadata in the catalog, meaning the table will already 
exist. If we have to "drop" it from the catalog and re-register, the data in 
HDFS will be dropped, which is not the desired behavior. Please discuss with 
[~lei_chang] and [~vVineet] for clarification.

> Check whether the table to be registered is existed
> ---
>
> Key: HAWQ-1011
> URL: https://issues.apache.org/jira/browse/HAWQ-1011
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Affects Versions: 2.0.1.0-incubating
>Reporter: hongwu
>Assignee: hongwu
> Fix For: 2.0.1.0-incubating
>
>
> Check whether the table to be registered is existed or not. If it is, print a 
> error message and then exit. This is a feature for hawq register to make sure 
> the right name from user input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-951) PXF not locating Hadoop native libraries needed for Snappy

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394749#comment-15394749
 ] 

Kyle R Dunn commented on HAWQ-951:
--

cd to {{/usr/lib/hadoop/lib}}, rather than {{/usr/lib/hadoop}}

> PXF not locating Hadoop native libraries needed for Snappy
> --
>
> Key: HAWQ-951
> URL: https://issues.apache.org/jira/browse/HAWQ-951
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Affects Versions: 2.0.0.0-incubating
>Reporter: Pratheesh Nair
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Hawq queries are failing when we try to read Snappy-compressed table from 
> hcatalog via external tables.
> After the following was performed on every PXF host and restarting, the issue 
> was resolved:
> {code}
> mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop && ln -s 
> /usr/hdp/current/hadoop-client/lib/native native
> {code}
> Also, the default pxf-public-classpath should probably contain something like 
> the following line:
> {code}
> /usr/hdp/current/hadoop-client/lib/snappy*.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-951) PXF not locating Hadoop native libraries needed for Snappy

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394720#comment-15394720
 ] 

Kyle R Dunn edited comment on HAWQ-951 at 7/26/16 10:59 PM:


The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

{code:shell}
mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native 
{code}


was (Author: kdunn926):
The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

{{ 
mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native 
}}

> PXF not locating Hadoop native libraries needed for Snappy
> --
>
> Key: HAWQ-951
> URL: https://issues.apache.org/jira/browse/HAWQ-951
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Affects Versions: 2.0.0.0-incubating
>Reporter: Pratheesh Nair
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Hawq queries are failing when we try to read Snappy-compressed table from 
> hcatalog via external tables.
> After the following was performed on every PXF host and restarting, the issue 
> was resolved:
> {code}
> mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop && ln -s 
> /usr/hdp/current/hadoop-client/lib/native native
> {code}
> Also, the default pxf-public-classpath should probably contain something like 
> the following line:
> {code}
> /usr/hdp/current/hadoop-client/lib/snappy*.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-951) PXF not locating Hadoop native libraries needed for Snappy

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394720#comment-15394720
 ] 

Kyle R Dunn edited comment on HAWQ-951 at 7/26/16 10:59 PM:


The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

{{ 
mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native 
}}


was (Author: kdunn926):
The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

{{{ mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native }}}

> PXF not locating Hadoop native libraries needed for Snappy
> --
>
> Key: HAWQ-951
> URL: https://issues.apache.org/jira/browse/HAWQ-951
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Affects Versions: 2.0.0.0-incubating
>Reporter: Pratheesh Nair
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Hawq queries are failing when we try to read Snappy-compressed table from 
> hcatalog via external tables.
> After the following was performed on every PXF host and restarting, the issue 
> was resolved:
> {code}
> mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop && ln -s 
> /usr/hdp/current/hadoop-client/lib/native native
> {code}
> Also, the default pxf-public-classpath should probably contain something like 
> the following line:
> {code}
> /usr/hdp/current/hadoop-client/lib/snappy*.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-951) PXF not locating Hadoop native libraries needed for Snappy

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394720#comment-15394720
 ] 

Kyle R Dunn edited comment on HAWQ-951 at 7/26/16 10:58 PM:


The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

{{{
mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native
}}}


was (Author: kdunn926):
The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.


mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native


> PXF not locating Hadoop native libraries needed for Snappy
> --
>
> Key: HAWQ-951
> URL: https://issues.apache.org/jira/browse/HAWQ-951
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Affects Versions: 2.0.0.0-incubating
>Reporter: Pratheesh Nair
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Hawq queries are failing when we try to read Snappy-compressed table from 
> hcatalog via external tables.
> After the following was performed on every PXF host and restarting, the issue 
> was resolved:
> {code}
> mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop && ln -s 
> /usr/hdp/current/hadoop-client/lib/native native
> {code}
> Also, the default pxf-public-classpath should probably contain something like 
> the following line:
> {code}
> /usr/hdp/current/hadoop-client/lib/snappy*.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-951) PXF not locating Hadoop native libraries needed for Snappy

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394720#comment-15394720
 ] 

Kyle R Dunn edited comment on HAWQ-951 at 7/26/16 10:58 PM:


The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

{{{ mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native }}}


was (Author: kdunn926):
The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

{{{
mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native
}}}

> PXF not locating Hadoop native libraries needed for Snappy
> --
>
> Key: HAWQ-951
> URL: https://issues.apache.org/jira/browse/HAWQ-951
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Affects Versions: 2.0.0.0-incubating
>Reporter: Pratheesh Nair
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Hawq queries are failing when we try to read Snappy-compressed table from 
> hcatalog via external tables.
> After the following was performed on every PXF host and restarting, the issue 
> was resolved:
> {code}
> mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop && ln -s 
> /usr/hdp/current/hadoop-client/lib/native native
> {code}
> Also, the default pxf-public-classpath should probably contain something like 
> the following line:
> {code}
> /usr/hdp/current/hadoop-client/lib/snappy*.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-951) PXF not locating Hadoop native libraries needed for Snappy

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394720#comment-15394720
 ] 

Kyle R Dunn commented on HAWQ-951:
--

The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

```
mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native
```


> PXF not locating Hadoop native libraries needed for Snappy
> --
>
> Key: HAWQ-951
> URL: https://issues.apache.org/jira/browse/HAWQ-951
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Affects Versions: 2.0.0.0-incubating
>Reporter: Pratheesh Nair
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Hawq queries are failing when we try to read Snappy-compressed table from 
> hcatalog via external tables.
> After the following was performed on every PXF host and restarting, the issue 
> was resolved:
> {code}
> mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop && ln -s 
> /usr/hdp/current/hadoop-client/lib/native native
> {code}
> Also, the default pxf-public-classpath should probably contain something like 
> the following line:
> {code}
> /usr/hdp/current/hadoop-client/lib/snappy*.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-951) PXF not locating Hadoop native libraries needed for Snappy

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394720#comment-15394720
 ] 

Kyle R Dunn edited comment on HAWQ-951 at 7/26/16 10:58 PM:


The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.


mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native



was (Author: kdunn926):
The corrected symlink process is below, I had it incorrect in the Zendesk 
ticket.

```
mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop/lib && ln -s 
/usr/hdp/current/hadoop-client/lib/native native
```


> PXF not locating Hadoop native libraries needed for Snappy
> --
>
> Key: HAWQ-951
> URL: https://issues.apache.org/jira/browse/HAWQ-951
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: PXF
>Affects Versions: 2.0.0.0-incubating
>Reporter: Pratheesh Nair
>Assignee: Goden Yao
> Fix For: backlog
>
>
> Hawq queries are failing when we try to read Snappy-compressed table from 
> hcatalog via external tables.
> After the following was performed on every PXF host and restarting, the issue 
> was resolved:
> {code}
> mkdir -p /usr/lib/hadoop/lib && cd /usr/lib/hadoop && ln -s 
> /usr/hdp/current/hadoop-client/lib/native native
> {code}
> Also, the default pxf-public-classpath should probably contain something like 
> the following line:
> {code}
> /usr/hdp/current/hadoop-client/lib/snappy*.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-29) Refactor HAWQ InputFormat to support Spark/Scala

2016-07-26 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394576#comment-15394576
 ] 

Kyle R Dunn commented on HAWQ-29:
-

[~ronert], [~ronert_obst] - Can you see if this example Scala code works for 
you?

https://github.com/kdunn926/sparkHawq

> Refactor HAWQ InputFormat to support Spark/Scala
> 
>
> Key: HAWQ-29
> URL: https://issues.apache.org/jira/browse/HAWQ-29
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Storage
>Reporter: Lirong Jian
>Assignee: Lirong Jian
>Priority: Minor
>  Labels: features
> Fix For: 2.0.1.0-incubating
>
>
> Currently the implementation of HAWQ InputFormat doesn't support Spark/Scala 
> very well. We need to refactor the code to support that feature. More 
> specifically, we need implement the serializable interface for some classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-29) Refactor HAWQ InputFormat to support Spark/Scala

2016-07-18 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383015#comment-15383015
 ] 

Kyle R Dunn commented on HAWQ-29:
-

I'm planning to spend some time looking at this. 

Here is the Spark API for Hadoop-based input formats:
http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#newAPIHadoopRDD%28org.apache.hadoop.conf.Configuration

And the Parquet Hadoop Input Format as a reference implementation:
https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetInputSplit.java
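
For illustration, the invocation pattern for newAPIHadoopRDD looks like the 
PySpark sketch below, using the stock Hadoop TextInputFormat classes rather 
than HAWQ's; dropping in the HAWQ InputFormat the same way is what additionally 
requires the serializability changes this JIRA describes.

{code}
# Minimal sketch using standard Hadoop classes (not HAWQ's) to show how the
# newAPIHadoopRDD entry point is driven. The input path and app name are
# illustrative only.
from pyspark import SparkContext

sc = SparkContext(appName="newapi-inputformat-demo")

rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
    keyClass="org.apache.hadoop.io.LongWritable",
    valueClass="org.apache.hadoop.io.Text",
    conf={"mapreduce.input.fileinputformat.inputdir": "hdfs:///tmp/sample"},
)
print(rdd.take(5))
{code}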


> Refactor HAWQ InputFormat to support Spark/Scala
> 
>
> Key: HAWQ-29
> URL: https://issues.apache.org/jira/browse/HAWQ-29
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: Storage
>Reporter: Lirong Jian
>Assignee: Lirong Jian
>Priority: Minor
>  Labels: features
> Fix For: 2.0.1.0-incubating
>
>
> Currently the implementation of HAWQ InputFormat doesn't support Spark/Scala 
> very well. We need to refactor the code to support that feature. More 
> specifically, we need implement the serializable interface for some classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-823) Amazon S3 External Table Support

2016-06-22 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345540#comment-15345540
 ] 

Kyle R Dunn commented on HAWQ-823:
--

I'm happy to investigate how to interface this with the FDW-based approaches if 
that's the direction things are headed for HAWQ. If nothing else, I thought this 
might be a good, lightweight MVP for S3 until the FDW framework details are 
worked out. Maybe we can point to the isolated module testing the GPDB team has 
already done for this code base, since there were almost no source-level 
changes for this port and it uses the "same" PROTOCOL interface. 

I certainly won't be offended if this doesn't meet the bigger picture 
requirements for S3 functionality but it does seem to be an immediately useful 
"bolt on" through the existing `CREATE PROTOCOL` functionality in HAWQ, if 
we're open to relaxing the existing restrictions on that interface from the 
"userland" entry point (i.e. allowing the command outside of utility mode).

> Amazon S3 External Table Support
> 
>
> Key: HAWQ-823
> URL: https://issues.apache.org/jira/browse/HAWQ-823
> Project: Apache HAWQ
>  Issue Type: Wish
>  Components: External Tables
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>
> As a cloud user, I'd like to be able to create readable external tables with 
> data in Amazon S3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-823) Amazon S3 External Table Support

2016-06-16 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333981#comment-15333981
 ] 

Kyle R Dunn commented on HAWQ-823:
--

[~shivram] - This is a direct port from the GPDB implementation - I claim 
neither glory nor responsibility for the design choices used; I simply targeted 
the same code against HAWQ rather than GPDB. There is significant commentary 
about the design in the PR for that code here: 
https://github.com/greenplum-db/gpdb/pull/339



> Amazon S3 External Table Support
> 
>
> Key: HAWQ-823
> URL: https://issues.apache.org/jira/browse/HAWQ-823
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: External Tables
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>
> As a cloud user, I'd like to be able to create readable external tables with 
> data in Amazon S3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-823) Amazon S3 External Table Support

2016-06-15 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-823:


 Summary: Amazon S3 External Table Support
 Key: HAWQ-823
 URL: https://issues.apache.org/jira/browse/HAWQ-823
 Project: Apache HAWQ
  Issue Type: Improvement
  Components: External Tables
Reporter: Kyle R Dunn
Assignee: Lei Chang


As a cloud user, I'd like to be able to create readable external tables with 
data in Amazon S3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-799) PostGIS Support

2016-06-09 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-799:


 Summary: PostGIS Support
 Key: HAWQ-799
 URL: https://issues.apache.org/jira/browse/HAWQ-799
 Project: Apache HAWQ
  Issue Type: Wish
Reporter: Kyle R Dunn
Assignee: Lei Chang


Would be great to have some basic PostGIS support in HAWQ.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-614) Table with Segment Reject Limit fails to flush AO file when all data is rejected

2016-04-06 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229355#comment-15229355
 ] 

Kyle R Dunn commented on HAWQ-614:
--

Hi @hongwu - I have shared this JIRA with the customer - she will try to add 
the information you've requested. I was unable to replicate the issue by just 
creating test raw data with the same number of rows, creating non-conforming 
DDL for that data, and using gpfdist to insert it. 

> Table with Segment Reject Limit fails to flush AO file when all data is 
> rejected
> 
>
> Key: HAWQ-614
> URL: https://issues.apache.org/jira/browse/HAWQ-614
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: External Tables
>Reporter: Kyle R Dunn
>Assignee: hongwu
>Priority: Minor
> Attachments: image008.jpg
>
>
> An error message (attached) is received if *all* data gets rejected (for any 
> reason) when using segment reject limit option with an error table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-614) Table with Segment Reject Limit fails to flush AO file when all data is rejected

2016-03-31 Thread Kyle R Dunn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle R Dunn updated HAWQ-614:
-
Attachment: image008.jpg

Informatica error message.

> Table with Segment Reject Limit fails to flush AO file when all data is 
> rejected
> 
>
> Key: HAWQ-614
> URL: https://issues.apache.org/jira/browse/HAWQ-614
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: External Tables
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
> Attachments: image008.jpg
>
>
> An error message (attached) is received if *all* data gets rejected (for any 
> reason) when using segment reject limit option with an error table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-614) Table with Segment Reject Limit fails to flush AO file when all data is rejected

2016-03-31 Thread Kyle R Dunn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle R Dunn updated HAWQ-614:
-
Priority: Minor  (was: Major)

> Table with Segment Reject Limit fails to flush AO file when all data is 
> rejected
> 
>
> Key: HAWQ-614
> URL: https://issues.apache.org/jira/browse/HAWQ-614
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: External Tables
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>Priority: Minor
> Attachments: image008.jpg
>
>
> An error message (attached) is received if *all* data gets rejected (for any 
> reason) when using segment reject limit option with an error table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-614) Table with Segment Reject Limit fails to flush AO file when all data is rejected

2016-03-31 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-614:


 Summary: Table with Segment Reject Limit fails to flush AO file 
when all data is rejected
 Key: HAWQ-614
 URL: https://issues.apache.org/jira/browse/HAWQ-614
 Project: Apache HAWQ
  Issue Type: Bug
  Components: External Tables
Reporter: Kyle R Dunn
Assignee: Lei Chang


An error message (attached) is received if *all* data gets rejected (for any 
reason) when using segment reject limit option with an error table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-399) Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c

2016-02-04 Thread Kyle R Dunn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle R Dunn updated HAWQ-399:
-
Description: Currently the crc32c.c file contains GCC-specific `#pragma GCC 
target ("sse4.2")` - the Clang/LLVM equivalent is to add 
`__attribute__((target("sse4.2")))` above the function. This change allows HAWQ 
to be compiled with the Clang/LLVM toolchain in addition to GCC.  (was: 
Currently the crc32c.c file contains GCC-specific ```#pragma GCC target 
("sse4.2")``` - the Clang/LLVM equivalent is to add 
```__attribute__((target("sse4.2")))``` above the function. This change allows 
HAWQ to be compiled with the Clang/LLVM toolchain in addition to GCC.)

> Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c
> --
>
> Key: HAWQ-399
> URL: https://issues.apache.org/jira/browse/HAWQ-399
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Build
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>
> Currently the crc32c.c file contains GCC-specific `#pragma GCC target 
> ("sse4.2")` - the Clang/LLVM equivalent is to add 
> `__attribute__((target("sse4.2")))` above the function. This change allows 
> HAWQ to be compiled with the Clang/LLVM toolchain in addition to GCC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-399) Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c

2016-02-04 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-399:


 Summary: Enable Clang/LLVM support for compiling SSE42 bits in 
crc32c.c
 Key: HAWQ-399
 URL: https://issues.apache.org/jira/browse/HAWQ-399
 Project: Apache HAWQ
  Issue Type: Improvement
  Components: Build
Reporter: Kyle R Dunn
Assignee: Lei Chang


Currently the crc32c.c file contains GCC-specific ```#pragma GCC target 
("sse4.2")``` - the Clang/LLVM equivalent is to add 
```__attribute__((target("sse4.2")))``` above the function. This change allows 
HAWQ to be compiled with the Clang/LLVM toolchain in addition to GCC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-399) Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c

2016-02-04 Thread Kyle R Dunn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle R Dunn updated HAWQ-399:
-
Description: Currently the crc32c.c file contains GCC-specific 
{code}#pragma GCC target ("sse4.2"){code} - the Clang/LLVM equivalent is to add 
{code}__attribute__((target("sse4.2"))){code} above the function. This change 
allows HAWQ to be compiled with the Clang/LLVM toolchain in addition to GCC.  
(was: Currently the crc32c.c file contains GCC-specific `#pragma GCC target 
("sse4.2")` - the Clang/LLVM equivalent is to add 
`__attribute__((target("sse4.2")))` above the function. This change allows HAWQ 
to be compiled with the Clang/LLVM toolchain in addition to GCC.)

> Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c
> --
>
> Key: HAWQ-399
> URL: https://issues.apache.org/jira/browse/HAWQ-399
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Build
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>
> Currently the crc32c.c file contains GCC-specific {code}#pragma GCC target 
> ("sse4.2"){code} - the Clang/LLVM equivalent is to add 
> {code}__attribute__((target("sse4.2"))){code} above the function. This change 
> allows HAWQ to be compiled with the Clang/LLVM toolchain in addition to GCC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-399) Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c

2016-02-04 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132579#comment-15132579
 ] 

Kyle R Dunn commented on HAWQ-399:
--

There are a couple of #ifdef bits and Clang/LLVM version requirements; I will submit 
a PR for the exact changes.
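
Purely as a sketch of what such gating might look like (not the actual change; the version threshold below is a placeholder, not a verified minimum):
{code}
#include <stdint.h>

/* Illustrative only: macro name, helper function, and version check are
 * hypothetical; older Clang without target-attribute support would still
 * need -msse4.2 on the command line. */
#if defined(__clang__) && (__clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 7))
  #define CRC32C_SSE42_TARGET __attribute__((target("sse4.2")))
#elif defined(__GNUC__) && !defined(__clang__)
  #pragma GCC target ("sse4.2")
  #define CRC32C_SSE42_TARGET
#else
  #define CRC32C_SSE42_TARGET
#endif

/* Hypothetical helper showing where the macro would be applied. */
CRC32C_SSE42_TARGET
uint32_t crc32c_step(uint32_t crc, unsigned char b)
{
    return __builtin_ia32_crc32qi(crc, b);
}
{code}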

> Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c
> --
>
> Key: HAWQ-399
> URL: https://issues.apache.org/jira/browse/HAWQ-399
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Build
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>Priority: Trivial
>
> Currently the crc32c.c file contains GCC-specific {code}#pragma GCC target 
> ("sse4.2"){code} - the Clang/LLVM equivalent is to add 
> {code}__attribute__((target("sse4.2"))){code} above the function. This change 
> allows HAWQ to be compiled with the Clang/LLVM toolchain in addition to GCC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HAWQ-399) Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c

2016-02-04 Thread Kyle R Dunn (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132579#comment-15132579
 ] 

Kyle R Dunn edited comment on HAWQ-399 at 2/4/16 4:58 PM:
--

There are a couple of #ifdef bits and Clang/LLVM version requirements to make this 
work as well; I will submit a PR for the exact changes.


was (Author: kdunn926):
There are a couple of #ifdef bits and Clang/LLVM version requirements; I will submit 
a PR for the exact changes.

> Enable Clang/LLVM support for compiling SSE42 bits in crc32c.c
> --
>
> Key: HAWQ-399
> URL: https://issues.apache.org/jira/browse/HAWQ-399
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Build
>Reporter: Kyle R Dunn
>Assignee: Lei Chang
>Priority: Trivial
>
> Currently the crc32c.c file contains GCC-specific {code}#pragma GCC target 
> ("sse4.2"){code} - the Clang/LLVM equivalent is to add 
> {code}__attribute__((target("sse4.2"))){code} above the function. This change 
> allows HAWQ to be compiled with the Clang/LLVM toolchain in addition to GCC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HAWQ-205) PXF Throws NullPointerException when DELIMITER missing in Hive table definition

2015-12-02 Thread Kyle R Dunn (JIRA)
Kyle R Dunn created HAWQ-205:


 Summary: PXF Throws NullPointerException when DELIMITER missing in 
Hive table definition
 Key: HAWQ-205
 URL: https://issues.apache.org/jira/browse/HAWQ-205
 Project: Apache HAWQ
  Issue Type: Bug
  Components: PXF
Reporter: Kyle R Dunn
Assignee: Goden Yao


HiveText and HiveRC profiles both require defining a DELIMITER as part of the 
LOCATION. The delimiter should be the same as the one defined in the table itself.
E.g. for delimiter ','
CREATE EXTERNAL TABLE hivetext (field1 ... ) LOCATION('pxf:///?profile=HiveText&DELIMITER=,') format 'text' 
(delimiter=',');

In the absence of DELIMITER, a NullPointerException is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)