[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/17


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread janebeckman

Github user janebeckman commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81426980
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
--- End diff --

Checked some statements from code checkins, and it's AO/Parquet tables.



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81397690
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -187,5 +168,84 @@ group {
 | varchar  | varchar  |
 
 
+## Options
+
+**General Options**
+
+-? (show help)   
+Show help, then exit.
+
+-\\\-version   
+Show the version of this utility, then exit.
+
+
+**Connection Options**
+
+-h , -\\\-host \ 
+Specifies the host name of the machine on which the HAWQ master 
database server is running. If not specified, reads from the environment 
variable `$PGHOST` or defaults to `localhost`.
+
+ -p , -\\\-port \  
+Specifies the TCP port on which the HAWQ master database server is 
listening for connections. If not specified, reads from the environment 
variable `$PGPORT` or defaults to 5432.
+
+-U , -\\\-user \  
+The database role name to connect as. If not specified, reads from the 
environment variable `$PGUSER` or defaults to the current system user name.
+
+-d  , -\\\-database \  
+The database to register the Parquet HDFS data into. The default is 
`postgres`
+
+-f , -\\\-filepath \
+The path of the file or directory in HDFS containing the files to be 
registered.
+ 
+\ 
+The HAWQ table that will store the data to be registered. If the 
--config option is not supplied, the table cannot use hash distribution. Random 
table distribution is strongly preferred. If hash distribution must be used, 
make sure that the distribution policy for the data files described in the YAML 
file is consistent with the table being registered into.
+
+Miscellaneous Options
+
+The following options are used with specific use models.
+
+-e , -\\\-eof \
+Specifies the end of the file to be registered. \ represents the 
valid content length of the file to be used, in bytes: a value between 0 and the 
actual size of the file. If this option is not included, the actual file size, 
or size of files within a folder, is used. Used with Use Model 1.
+
+-F , -\\\-force
+Used for disaster recovery of a cluster. Clears all HDFS-related 
catalog contents in `pg_aoseg.pg_paqseg_$relid` and re-registers files to a 
specified table. The HDFS files are not removed or modified. To use this option 
for recovery, data is assumed to be periodically imported to the cluster to be 
recovered. Used with Use Model 2.
+
+-c , -\\\-config \  
+Registers files specified by YAML-format configuration files into 
HAWQ. Used with Use Model 2.
--- End diff --

Change Use -> Usage
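
A minimal sketch of how the two usage models in the quoted options fit together. The helper name `build_register_argv` and its validation rules are assumptions for illustration and are not part of HAWQ; only the flags (`-d`, `-f`, `-e`, `-c`, `--force`) and the trailing table name come from the quoted text. Treating `-f` and `-c` as mutually exclusive is inferred from the two synopsis lines and is not confirmed by the source.

```python
from typing import List, Optional


def build_register_argv(
    table: str,
    database: str = "postgres",
    filepath: Optional[str] = None,
    eof: Optional[int] = None,
    config: Optional[str] = None,
    force: bool = False,
) -> List[str]:
    """Assemble an argument list for `hawq register` (hypothetical helper).

    Model 1: -f <filepath> with an optional -e <eof>.
    Model 2: -c <config> with an optional --force.
    """
    # Assumption: exactly one of the two usage models is selected per call.
    if (filepath is None) == (config is None):
        raise ValueError("give exactly one of filepath (model 1) or config (model 2)")
    if eof is not None and filepath is None:
        raise ValueError("eof only applies when registering a file path")
    if eof is not None and eof < 0:
        raise ValueError("eof must be a non-negative byte count")
    if force and config is None:
        raise ValueError("force only applies together with a YAML config")

    argv = ["hawq", "register", "-d", database]
    if filepath is not None:
        argv += ["-f", filepath]
        if eof is not None:
            argv += ["-e", str(eof)]
    else:
        argv += ["-c", config]
        if force:
            argv.append("--force")
    # The target table name is the final positional argument.
    argv.append(table)
    return argv
```

Building the list separately from running it keeps the sketch testable without a cluster; the result could be passed to `subprocess.run` on a host where the HAWQ client is installed.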



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386819
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running ```hawq register```.
+You can then run ```hawq register``` on the copy, leaving the 
original file available for additional Hive queries or if a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS files
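
The Hive data type limitation quoted in the diff above can be checked before running `hawq register`. The following is a minimal sketch, assuming the Hive column types are available as a plain name-to-type mapping; the helper names are hypothetical, and only the list of supported types is taken from the quoted text.

```python
import re
from typing import Dict, List

# Hive types the quoted documentation lists as currently supported.
SUPPORTED_HIVE_TYPES = frozenset({
    "boolean", "int", "smallint", "tinyint", "bigint", "float",
    "double", "string", "binary", "char", "varchar",
})


def base_type(hive_type: str) -> str:
    """Lowercase a Hive type and drop parameters, e.g. varchar(20) -> varchar."""
    return re.split(r"[(<]", hive_type.strip().lower(), maxsplit=1)[0].strip()


def unconvertible_columns(schema: Dict[str, str]) -> List[str]:
    """Return the column names whose base type is not in the supported list."""
    return [
        name for name, hive_type in schema.items()
        if base_type(hive_type) not in SUPPORTED_HIVE_TYPES
    ]
```

An empty result only means that no column uses a type outside the supported list; it does not verify that the HAWQ table DDL maps each column correctly.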

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81385547
  

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81391492
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## Synopsis
 
 ``` pre
-hawq register
+Usage 1:
+hawq register [] [-f ] [-e ] 

+
+Usage 2:
+hawq register [] [-c ][--force] 

+
+Connection Options:
  [-h ] 
  [-p ] 
  [-U ] 
  [-d ]
- [-t ] 
+ 
+Misc. Options:
  [-f ] 
+[-e ]
+[--force] 
  [-c ]  
 hawq register help | -? 
 hawq register --version
 ```
 
 ## Prerequisites
 
-The client machine where `hawq register` is executed must have the 
following:
+The client machine where `hawq register` is executed must meet the 
following conditions:
 
 -   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
--- End diff --

See previous comments about this list.



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81397353
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## Synopsis
 
 ``` pre
-hawq register
+Usage 1:
+hawq register [] [-f ] [-e ] 

+
+Usage 2:
+hawq register [] [-c ][--force] 

+
+Connection Options:
  [-h ] 
  [-p ] 
  [-U ] 
  [-d ]
- [-t ] 
+ 
+Misc. Options:
  [-f ] 
+[-e ]
+[--force] 
  [-c ]  
 hawq register help | -? 
 hawq register --version
 ```
 
 ## Prerequisites
 
-The client machine where `hawq register` is executed must have the 
following:
+The client machine where `hawq register` is executed must meet the 
following conditions:
 
 -   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client must be configured and the hdfs filepath specified.
 -   The files to be registered and the HAWQ table located in the same HDFS 
cluster.
 -   The target table DDL is configured with the correct data type mapping.
 
 ## Description
 
-`hawq register` is a utility that loads and registers existing or external 
parquet data in HDFS into HAWQ, so that it can be directly ingested and 
accessed through HAWQ. Parquet data from the file or directory in the specified 
path is loaded into the appropriate HAWQ table directory in HDFS and the 
utility updates the corresponding HAWQ metadata for the files. 
+`hawq register` is a utility that loads and registers existing data files 
or folders in HDFS into HAWQ internal tables, allowing HAWQ to directly read 
the data and use internal table processing for operations such as transactions 
and high performance, without needing to load or copy it. Data from the file or 
directory specified by \ is loaded into the appropriate HAWQ 
table directory in HDFS and the utility updates the corresponding HAWQ metadata 
for the files. 
 
-Only parquet tables can be loaded using the `hawq register` command. 
Metadata for the parquet file(s) and the destination table must be consistent. 
Different  data types are used by HAWQ tables and parquet tables, so the data 
is mapped. You must verify that the structure of the parquet files and the HAWQ 
table are compatible before running `hawq register`. 
+You can use `hawq register` to:
 
-Note: only HAWQ or HIVE-generated parquet tables are currently supported.
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster.
 
-###Limitations for Registering Hive Tables to HAWQ
-The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+Two usage models are available.
 
-The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+###Usage Model 1: register file data to an existing table.
 
+`hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f 
filepath] [-e eof]`
 
-## Options
-
-**General Options**
-
--? (show help)   
-Show help, then exit.
-
--\\\-version   
-Show the version of this utility, then exit.
-
-
-**Connection Options**
-
--h \ 
-Specifies the host name of the machine on which the HAWQ master 
database server is running. If not specified, reads from the environment 
variable `$PGHOST` or defaults to `localhost`.
-
- -p \  
-Specifies the TCP port on which the HAWQ master database server is 
listening for connections. If not specified, reads from the environment 
variable `$PGPORT` or defaults to 5432.
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
the data is mapped. Refer to the section [Data Type 
Mapping](hawqregister.html#topic1__section7) below. You must verify that the 
structure of the Parquet files and the HAWQ table are compatible before running 
`hawq register`. 
 
--U \  
-The database role name to connect as. If not specified, reads from the 
environment variable `$PGUSER` or defaults to the current system user name.
+Limitations
+Only HAWQ or Hive-generated Parquet

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81384844
  

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81391277
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
--- End diff --

I still think this needs to say something other than AO formatted data.



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81384280
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running ```hawq register```.
+You can then run ```hawq register``` on the copy, leaving the 
original file available for additional Hive queries or if a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
--- End diff --

Change "The pg\_aoseg is" to "The pg\_aoseg table is"
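
The path and catalog naming described in the quoted example can be sketched as follows. This is an illustration only, assuming the new file number is always the last existing number plus one, as in the example; the function names are hypothetical and do not exist in HAWQ.

```python
def registered_file_path(
    database_location: str,
    tablespace_id: int,
    database_id: int,
    filenode_id: int,
    last_file_number: int,
) -> str:
    """Build the HDFS path a registered file ends up at, per the quoted example."""
    # Assumption: the registered file takes the next number after the last one.
    next_number = last_file_number + 1
    root = database_location.rstrip("/")
    return f"{root}/{tablespace_id}/{database_id}/{filenode_id}/{next_number}"


def segment_table_name(relation_id: int, parquet: bool = True) -> str:
    """Name of the catalog table holding the segment file metadata.

    The schema is always pg_aoseg; the table prefix is pg_paqseg for
    Parquet tables and pg_aoseg for row-oriented tables.
    """
    prefix = "pg_paqseg" if parquet else "pg_aoseg"
    return f"pg_aoseg.{prefix}_{relation_id}"
```

With the values from the example (tablespace 16385, database 16387, filenode 77160, last file 7), the first function returns `hdfs://localhost:8020/hawq_default/16385/16387/77160/8` and the second returns `pg_aoseg.pg_paqseg_77160`.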



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386257
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running ```hawq register```.
+You can then run ```hawq register``` on the copy, leaving the 
original file available for additional Hive queries or if a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS files

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81390790
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386025
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386403
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81383227
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
--- End diff --

Global: ID entries need to appear before the title text, like ## Registering...



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81383986
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running ```hawq register```.
+You can then run ```hawq register``` on the copy, leaving the 
original file available for additional Hive queries or if a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
--- End diff --

For future work, it would be nice to provide commands for determining what 
these ID values will be before executing the register command.
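
One way to reason about these values ahead of time: the example in the diff suggests the registered file lands at the database location followed by the tablespace id, database id, filenode id, and the next file number. The shell sketch below only performs that string composition. `next_registered_path` is a name invented here, the "last number plus one" rule is inferred from the single example in the diff (7 becomes 8), and the ids themselves would still have to be looked up in the catalog beforehand.

```shell
#!/bin/sh
# Hypothetical helper: compose the HDFS path a newly registered file is
# expected to get, following the example in the diff. Assumes the new file
# number is simply the last existing file number plus one.
next_registered_path() {
    location=$1      # database location, e.g. hdfs://localhost:8020/hawq_default
    tablespace=$2    # tablespace id, e.g. 16385
    database=$3      # database id, e.g. 16387
    filenode=$4      # table filenode id, e.g. 77160
    last_file=$5     # number of the last file under the filenode, e.g. 7
    printf '%s/%s/%s/%s/%s\n' \
        "$location" "$tablespace" "$database" "$filenode" "$((last_file + 1))"
}

next_registered_path hdfs://localhost:8020/hawq_default 16385 16387 77160 7
```

For the values quoted above, this prints `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`, which matches the new file location given in the example.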



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386491
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81391750
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## Synopsis
 
 ``` pre
-hawq register
+Usage 1:
+hawq register [] [-f ] [-e ] 

+
+Usage 2:
+hawq register [] [-c ][--force] 

+
+Connection Options:
  [-h ] 
  [-p ] 
  [-U ] 
  [-d ]
- [-t ] 
+ 
+Misc. Options:
  [-f ] 
+[-e ]
+[--force] 
  [-c ]  
 hawq register help | -? 
 hawq register --version
 ```
 
 ## Prerequisites
 
-The client machine where `hawq register` is executed must have the 
following:
+The client machine where `hawq register` is executed must meet the 
following conditions:
 
 -   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client must be configured and the hdfs filepath specified.
 -   The files to be registered and the HAWQ table located in the same HDFS 
cluster.
 -   The target table DDL is configured with the correct data type mapping.
 
 ## Description
 
-`hawq register` is a utility that loads and registers existing or external 
parquet data in HDFS into HAWQ, so that it can be directly ingested and 
accessed through HAWQ. Parquet data from the file or directory in the specified 
path is loaded into the appropriate HAWQ table directory in HDFS and the 
utility updates the corresponding HAWQ metadata for the files. 
+`hawq register` is a utility that loads and registers existing data files 
or folders in HDFS into HAWQ internal tables, allowing HAWQ to directly read 
the data and use internal table processing for operations such as transactions 
and high performance, without needing to load or copy it. Data from the file or 
directory specified by \ is loaded into the appropriate HAWQ 
table directory in HDFS and the utility updates the corresponding HAWQ metadata 
for the files. 
 
-Only parquet tables can be loaded using the `hawq register` command. 
Metadata for the parquet file(s) and the destination table must be consistent. 
Different  data types are used by HAWQ tables and parquet tables, so the data 
is mapped. You must verify that the structure of the parquet files and the HAWQ 
table are compatible before running `hawq register`. 
+You can use `hawq register` to:
 
-Note: only HAWQ or HIVE-generated parquet tables are currently supported.
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster.
 
-###Limitations for Registering Hive Tables to HAWQ
-The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+Two usage models are available.
 
-The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+###Usage Model 1: register file data to an existing table.
--- End diff --

Capitalize "Register"



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81382521
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
--- End diff --

Need to change "client server" to one or the other, I think.



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81382033
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
--- End diff --

This sentence has problems.  Maybe change it to:  Either AO **or** 
Parquet-formatted **files** in HDFS can be loaded into a corresponding table in 
HAWQ.  

But I'm not sure that a *file* can be AO.  Doesn't append-only apply to the 
associated tables?  Maybe a different term is required here.



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81383086
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
--- End diff --

Need to make the above list items parallel. And each should be a 
stand-alone sentence if you keep the punctuation after them. For example, change 
the first item to "Network access is available between all hosts in your HAWQ 
cluster (master and segments) and the hosts where the data to be loaded is located."



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-28 Thread ictmalili

Github user ictmalili commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81051968
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,214 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \ is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running ```hawq register```.
+You can then run ```hawq register``` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping 
error is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a .yml Configuration File
+ 
+The `hawq register` command can register HDFS files
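
The Hive example quoted above gives the database location, the tablespace, database, and filenode ids, and the number of the last file, then states where the registered file ends up. A minimal sketch of that arithmetic follows, assuming the new file simply takes the next number after the last existing one, as the example implies. `next_segment_path` is a hypothetical helper written for illustration; it is not part of HAWQ or of `hawq register`.

```shell
# Hypothetical helper (not a HAWQ command): builds the HDFS path that the
# quoted example reports for a newly registered file.
# Arguments: database location, tablespace id, database id,
#            table filenode id, number of the last file under the filenode.
next_segment_path() {
  base=$1
  tablespace=$2
  database=$3
  filenode=$4
  last_file=$5
  # Assumption: the registered file gets the next sequential file number.
  # ${base%/} drops one trailing slash so the result has no doubled slash.
  printf '%s/%s/%s/%s/%s\n' "${base%/}" "$tablespace" "$database" "$filenode" "$((last_file + 1))"
}

# Values taken from the example in the quoted diff.
next_segment_path hdfs://localhost:8020/hawq_default 16385 16387 77160 7
```

With the values from the example, the helper prints `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`, which matches the new location given in the diff.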

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-28 Thread ictmalili

Github user ictmalili commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81052024
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,84 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## Synopsis
 
 ``` pre
-hawq register
+Usage 1:
+hawq register [] [-f ] [-e ] 

+
+Usage 2:
+hawq register [] [-c ][--force] 

+
+Connection Options:
  [-h ] 
  [-p ] 
  [-U ] 
  [-d ]
- [-t ] 
+ 
+Misc. Options:
  [-f ] 
+[-e ]
+[--force] 
+[--repair]
--- End diff --

Please remove repair. Thanks 



[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-28 Thread janebeckman

GitHub user janebeckman opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/17

Updates for hawq register

Updates for hawq register, new section on registering files.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/janebeckman/incubator-hawq-docs 
feature/newregister

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17


commit deb1c4b5b26cba9691c27ddb86ca4b980fdf0a00
Author: Jane Beckman 
Date:   2016-09-28T17:44:07Z

Updates for hawq register




