[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85845000
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]| The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>| The path to the file in the HDFS data store. |
+| PROFILE| The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These 
options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile 
when \<path-to-hdfs-file\> will reference a plain text delimited file. The 
`HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in 
`(delimiter=<delimiter>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when 
\<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the 
`SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports 
only the built-in `(formatter='pxfwritable_export')` (write) and 
`(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
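To make the syntax and keyword table above concrete, here is a hedged sketch of an `HdfsTextSimple` writable table. The table name, columns, NameNode host, PXF port, and HDFS path are illustrative assumptions, and the keyword order follows the quoted draft (a later comment in this thread questions whether it should read `CREATE WRITABLE EXTERNAL TABLE`).

``` sql
-- Sketch only: write pipe-delimited text rows to an assumed HDFS path.
CREATE EXTERNAL WRITABLE TABLE pxf_hdfs_writabletbl (location text, month text, num_orders int4, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E'|');
```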
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\<custom-options\>:
+
+| Keyword  | Value Description |
+|---|-|
+| COMPRESSION_CODEC| The compression codec Java class name. If this 
option is not provided, no data compression is performed. Supported compression 
codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, 
`org.apache.hadoop.io.compress.BZip2Codec`, and 
`org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
+| COMPRESSION_TYPE| The compression type to employ; supported values 
are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA| (`SequenceWritable` profile only) The name of the 
writer serialization/deserialization class. The jar file in which this class 
resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining if a table query can run in 
multi-thread mode. Default value is `TRUE`, requests run in multi-threaded 
mode. When set to `FALSE`, requests will be handled in a single thread.  
`THREAD-SAFE` should be set appropriately when operations that are not 
thread-safe are performed (i.e. compression). |
+
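A hedged sketch of how one of the \<custom-options\> above is appended to the `LOCATION` URI; the host, port, path, and columns are assumptions, and the keyword order again follows the quoted draft.

``` sql
-- Sketch only: request gzip compression of the written data via COMPRESSION_CODEC.
CREATE EXTERNAL WRITABLE TABLE pxf_hdfs_write_gzip (location text, month text, num_orders int4, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/gzipped_dir?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
FORMAT 'TEXT' (delimiter=E',');
```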
+## HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when writing delimited data to a plain 
text file where each row is a single record.
+
+Writable tables created using the `HdfsTextSimple` profile can use no, 
record, or block compression. When compression is used, the default, gzip, and 
bzip2 Hadoop compression codecs are supported:
+
+- org.apache.hadoop.io.compress.DefaultCodec
+- org.apache.hadoop.io.co

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85814125
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
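As a hedged illustration of the note above, the written HDFS file can later be exposed through a readable external table and queried; every name, host, port, and path below is an assumption.

``` sql
-- Sketch only: a readable external table over the file an HdfsTextSimple
-- writable table produced, followed by an ordinary query against it.
CREATE EXTERNAL TABLE pxf_hdfs_textsimple_read (location text, month text, num_orders int4, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E'|');

SELECT * FROM pxf_hdfs_textsimple_read;
```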
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]| The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>| The path to the file in the HDFS data store. |
+| PROFILE| The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These 
options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile 
when \<path-to-hdfs-file\> will reference a plain text delimited file. The 
`HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in 
`(delimiter=<delimiter>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when 
\<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the 
`SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports 
only the built-in `(formatter='pxfwritable_export')` (write) and 
`(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\<custom-options\>:
+
+| Keyword  | Value Description |
+|---|-|
+| COMPRESSION_CODEC| The compression codec Java class name. If this 
option is not provided, no data compression is performed. Supported compression 
codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, 
`org.apache.hadoop.io.compress.BZip2Codec`, and 
`org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
+| COMPRESSION_TYPE| The compression type to employ; supported values 
are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA| (`SequenceWritable` profile only) The name of the 
writer serialization/deserialization class. The jar file in which this class 
resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining if a table query can run in 
multi-thread mode. Default value is `TRUE`, requests run in multi-threaded 
mode. When set to `FALSE`, requests will be handled in a single thread.  
`THREAD-SAFE` should be set appropriately when operations that are not 
thread-safe are performed (i.e. compression). |
+
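A hedged sketch combining several of the options above for the `SequenceWritable` profile; the writer class, host, port, and path are hypothetical, and the keyword order follows the quoted draft.

``` sql
-- Sketch only: SequenceWritable with an assumed custom writer class (DATA-SCHEMA),
-- record-level compression, and the CUSTOM format with the pxfwritable_export formatter.
CREATE EXTERNAL WRITABLE TABLE pxf_tbl_seqwrit (location text, month text, num_orders int4, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.example.pxf.CustomWritable&COMPRESSION_TYPE=RECORD')
FORMAT 'CUSTOM' (formatter='pxfwritable_export');
```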
+## HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when writing delimited data to a plain 
text file where each row is a single record.
+
+Writable tables created using the `HdfsTextSimple` profile can use no, 
record, or block compression. When compression is used, the default, gzip, and 
bzip2 Hadoop compression codecs are supported:
+
+- org.apache.hadoop.io.compress.DefaultCodec
+- org.apache.hadoop.io.co

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85794241
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]| The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>| The path to the file in the HDFS data store. |
+| PROFILE| The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These 
options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile 
when \<path-to-hdfs-file\> will reference a plain text delimited file. The 
`HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in 
`(delimiter=<delimiter>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when 
\<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the 
`SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports 
only the built-in `(formatter='pxfwritable_export')` (write) and 
`(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\<custom-options\>:
--- End diff --

Change  to " values"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85792675
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]| The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>| The path to the file in the HDFS data store. |
+| PROFILE| The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These 
options are discussed in the next topic.|
--- End diff --

Maybe change this to  ?





[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85792301
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
--- End diff --

writable two -> two writable

Also, seems like there should be some mention of the difference between 
these profiles by now.




[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85845296
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]| The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>| The path to the file in the HDFS data store. |
+| PROFILE| The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These 
options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile 
when \<path-to-hdfs-file\> will reference a plain text delimited file. The 
`HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in 
`(delimiter=<delimiter>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when 
\<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the 
`SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports 
only the built-in `(formatter='pxfwritable_export')` (write) and 
`(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
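Once a writable external table such as those described above exists, data is exported to HDFS with ordinary `INSERT` statements — a hedged sketch, with both table names assumed:

``` sql
-- Sketch only: export rows from an assumed internal HAWQ table through the writable
-- external table, or insert literal rows directly.
INSERT INTO pxf_hdfs_writabletbl SELECT location, month, num_orders, total_sales FROM sales_internal;
INSERT INTO pxf_hdfs_writabletbl VALUES ('Frankfurt', 'Mar', 777, 3956.98);
```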
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\<custom-options\>:
+
+| Keyword  | Value Description |
+|---|-|
+| COMPRESSION_CODEC| The compression codec Java class name. If this 
option is not provided, no data compression is performed. Supported compression 
codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, 
`org.apache.hadoop.io.compress.BZip2Codec`, and 
`org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
+| COMPRESSION_TYPE| The compression type to employ; supported values 
are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA| (`SequenceWritable` profile only) The name of the 
writer serialization/deserialization class. The jar file in which this class 
resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining if a table query can run in 
multi-thread mode. Default value is `TRUE`, requests run in multi-threaded 
mode. When set to `FALSE`, requests will be handled in a single thread.  
`THREAD-SAFE` should be set appropriately when operations that are not 
thread-safe are performed (i.e. compression). |
+
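A hedged sketch of the `THREAD-SAFE` caution in the table above, pairing it with a compression codec; all names and paths are assumptions.

``` sql
-- Sketch only: fall back to single-threaded request handling while compressing with bzip2.
CREATE EXTERNAL WRITABLE TABLE pxf_write_singlethread (location text, month text, num_orders int4, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/bzipped_dir?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec&THREAD-SAFE=FALSE')
FORMAT 'TEXT' (delimiter=E',');
```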
+## HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when writing delimited data to a plain 
text file where each row is a single record.
+
+Writable tables created using the `HdfsTextSimple` profile can use no, 
record, or block compression. When compression is used, the default, gzip, and 
bzip2 Hadoop compression codecs are supported:
+
+- org.apache.hadoop.io.compress.DefaultCodec
+- org.apache.hadoop.io.co

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85791950
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
--- End diff --

That syntax is unfortunate.  GPDB uses CREATE WRITABLE EXTERNAL instead of 
CREATE EXTERNAL WRITABLE :(




[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85794550
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]| The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>| The path to the file in the HDFS data store. |
+| PROFILE| The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These 
options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile 
when \<path-to-hdfs-file\> will reference a plain text delimited file. The 
`HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in 
`(delimiter=<delimiter>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when 
\<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the 
`SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports 
only the built-in `(formatter='pxfwritable_export')` (write) and 
`(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\<custom-options\>:
+
+| Keyword  | Value Description |
+|---|-|
+| COMPRESSION_CODEC| The compression codec Java class name. If this 
option is not provided, no data compression is performed. Supported compression 
codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, 
`org.apache.hadoop.io.compress.BZip2Codec`, and 
`org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
+| COMPRESSION_TYPE| The compression type to employ; supported values 
are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA| (`SequenceWritable` profile only) The name of the 
writer serialization/deserialization class. The jar file in which this class 
resides must be in the PXF class path. This option has no default value. |
--- End diff --

Is DATA-SCHEMA an option, or is it required?




[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85793887
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE EXTERNAL WRITABLE TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]| The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>| The path to the file in the HDFS data store. |
+| PROFILE| The `PROFILE` keyword must specify one of the values 
`HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These 
options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile 
when \<path-to-hdfs-file\> will reference a plain text delimited file. The 
`HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in 
`(delimiter=<delimiter>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when 
\<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the 
`SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports 
only the built-in `(formatter='pxfwritable_export')` (write) and 
`(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\<custom-options\>:
+
+| Keyword  | Value Description |
+|---|-|
+| COMPRESSION_CODEC| The compression codec Java class name. If this 
option is not provided, no data compression is performed. Supported compression 
codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, 
`org.apache.hadoop.io.compress.BZip2Codec`, and 
`org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
--- End diff --

Instead of including parentheticals here (`HdfsTextSimple` profile only), 
add a third column to indicate which profile(s) the option applies to.




[GitHub] incubator-hawq-docs pull request #45: Revise section on work_mem

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/45#discussion_r85779030
  
--- Diff: bestpractices/querying_data_bestpractices.html.md.erb ---
@@ -16,14 +16,14 @@ If a query performs poorly, examine its query plan and 
ask the following questio
 If the plan is not choosing the optimal join order, set 
`join_collapse_limit=1` and use explicit `JOIN` syntax in your SQL statement to 
force the legacy query optimizer (planner) to the specified join order. You can 
also collect more statistics on the relevant join columns.
 
 -   **Does the optimizer selectively scan partitioned tables?** If you use 
table partitioning, is the optimizer selectively scanning only the child tables 
required to satisfy the query predicates? Scans of the parent tables should 
return 0 rows since the parent tables do not contain any data. See [Verifying 
Your Partition Strategy](../ddl/ddl-partition.html#topic74) for an example of a 
query plan that shows a selective partition scan.
--   **Does the optimizer choose hash aggregate and hash join operations 
where applicable?** Hash operations are typically much faster than other types 
of joins or aggregations. Row comparison and sorting is done in memory rather 
than reading/writing from disk. To enable the query optimizer to choose hash 
operations, there must be sufficient memory available to hold the estimated 
number of rows. Try increasing work memory to improve performance for a query. 
If possible, run an `EXPLAIN  ANALYZE` for the query to show which plan 
operations spilled to disk, how much work memory they used, and how much memory 
was required to avoid spilling to disk. For example:
+-   **Does the optimizer choose hash aggregate and hash join operations 
where applicable?** Hash operations are typically much faster than other types 
of joins or aggregations. Row comparison and sorting is done in memory rather 
than reading/writing from disk. To enable the query optimizer to choose hash 
operations, there must be sufficient memory available to hold the estimated 
number of rows. You may wish to  run an `EXPLAIN  ANALYZE` for the query to 
show which plan operations spilled to disk, how much work memory they used, and 
how much memory was required to avoid spilling to disk. For example:
--- End diff --

Let's change "You may wish to  run" to just "Run" here.
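For reference, the check described in the quoted paragraph comes down to a single statement — a hedged sketch, with the tables and join assumed rather than taken from the diff:

``` sql
-- Sketch only: look for hash joins/aggregates in the plan and for operators
-- that report spilling work memory (work files) to disk.
EXPLAIN ANALYZE
SELECT s.location, sum(s.total_sales) AS sales
FROM sales s JOIN regions r ON s.location = r.location
GROUP BY s.location;
```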




[GitHub] incubator-hawq-docs pull request #36: Note consistent use of Ambari vs. hawq...

2016-10-31 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/36#discussion_r85776994
  
--- Diff: bestpractices/general_bestpractices.html.md.erb ---
@@ -9,6 +9,8 @@ When using HAWQ, adhere to the following guidelines for 
best results:
 -   **Use a consistent `hawq-site.xml` file to configure your entire 
cluster**:
 
 Configuration guc/parameters are located in 
`$GPHOME/etc/hawq-site.xml`. This configuration file resides on all HAWQ 
instances and can be modified by using the `hawq config` utility. You can use 
the same configuration file cluster-wide across both master and segments.
+
+If you install and manage HAWQ using Ambari, do not use `hawq config` 
to set or change HAWQ configuration properties. Use the Ambari interface for 
all configuration changes. Configuration changes to `hawq-site.xml` made 
outside the Ambari interface will be overwritten when HAWQ restarted or 
reconfigured using Ambari.
--- End diff --

Missing word:  when HAWQ **is** restarted

And/or edit for passive voice




[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367943
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the 
`hive-site.xml` on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | Hawq Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
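As a hedged illustration of the mapping table above: an assumed Hive table with `string`, `int`, `decimal`, and `timestamp` columns would surface in a HAWQ external table as follows (host, port, table names, and the `Hive` profile/`pxfwritable_import` formatter pairing are assumptions here, not taken from the quoted diff).

``` sql
-- Sketch only: HAWQ column types chosen per the Hive-to-HAWQ mapping above.
CREATE EXTERNAL TABLE hive_mapping_example (name text, qty int4, price numeric, created timestamp)
LOCATION ('pxf://namenode:51200/default.example_table?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import');
```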
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+San Francisco,Sept,156,6846.34
+Paris,Nov,159,7134.56
+San Francisco,Jan,113,5397.89
+Prague,Dec,333,9894.77

[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, 
perform the following steps:
 postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
 ``` 
 
-3.  To query a Hive table with HCatalog integration, simply query HCatalog 
directly from HAWQ. The query syntax is:
 
-``` sql
-postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
-```
+To query a Hive table with HCatalog integration, query HCatalog directly 
from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
 
-For example:
+For example:
 
-``` sql
-postgres=# SELECT * FROM hcatalog.default.sales;
-```
-
-4.  To obtain a description of a Hive table with HCatalog integration, you 
can use the `psql` client interface.
--   Within HAWQ, use either the `\d
 hcatalog.hive-db-name.hive-table-name` or `\d+ 
hcatalog.hive-db-name.hive-table-name` commands to describe a 
single table. For example, from the `psql` client interface:
-
-``` shell
-$ psql -d postgres
-postgres=# \d hcatalog.default.test
-
-PXF Hive Table "default.test"
-Column|  Type  
---+
- name | text
- type | text
- supplier_key | int4
- full_price   | float8 
-```
--   Use `\d hcatalog.hive-db-name.*` to describe the whole database 
schema. For example:
-
-``` shell
-postgres=# \d hcatalog.default.*
-
-PXF Hive Table "default.test"
-Column|  Type  
---+
- type | text
- name | text
- supplier_key | int4
- full_price   | float8
-
-PXF Hive Table "default.testabc"
- Column | Type 
-+--
- type   | text
- name   | text
-```
--   Use `\d hcatalog.*.*` to describe the whole schema:
-
-``` shell
-postgres=# \d hcatalog.*.*
-
-PXF Hive Table "default.test"
-Column|  Type  
---+
- type | text
- name | text
- supplier_key | int4
- full_price   | float8
-
-PXF Hive Table "default.testabc"
- Column | Type 
-+--
- type   | text
- name   | text
-
-PXF Hive Table "userdb.test"
-  Column  | Type 
---+--
- address  | text
- username | text
- 
-```
-
-**Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, 
`hcatalog` will not be listed as a database. If you use other `psql` compatible 
clients, `hcatalog` will be listed as a database with a size value of `-1` 
since `hcatalog` is not a real database in HAWQ.
-
-5.  Alternatively, you can use the **pxf\_get\_item\_fields** user-defined 
function (UDF) to obtain Hive table descriptions from other client interfaces 
or third-party applications. The UDF takes a PXF profile and a table pattern 
string as its input parameters.
-
-**Note:** Currently the only supported input profile is `'Hive'`.
-
-For example, the following statement returns a description of a 
specific table. The description includes path, itemname (table), fieldname, and 
fieldtype.
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
+
+To obtain a description of a Hive table with HCatalog integration, you can 
use the `psql` client interface.
+
+-   Within HAWQ, use either the `\d
 hcatalog.hive-db-name.hive-table-name` or `\d+ 
hcatalog.hive-db-name.hive-table-name` commands to describe a single 
table. For example, from the `psql` client interface:
+
+``` shell
+$ psql -d postgres
+```
 
 ``` sql
-postgres=# select * from pxf_get_item_fields('Hive','default.test');
+postgres=# \d hcatalog.default.sales_info_rcfile;
 ```
-
-``` pre
-  path   | itemname |  fieldname   | fieldtype 
--+--+--+---
- default | test | name | text
- default | test | 

[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85370681
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, 
perform the following steps:
 postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
 ``` 
 
-3.  To query a Hive table with HCatalog integration, simply query HCatalog 
directly from HAWQ. The query syntax is:
 
-``` sql
-postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
-```
+To query a Hive table with HCatalog integration, query HCatalog directly 
from HAWQ. The query syntax is:
--- End diff --

It's a bit awkward to drop out of the procedure and into free-form 
discussion of the various operations.  I think it might be better to put the 
previous 3-step procedure into a new subsection like "Enabling HCatalog 
Integration" and then putting the remaining non-procedural content into "Usage" 
?




[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85368752
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the 
`hive-site.xml` on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | Hawq Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
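Before the preparation steps that follow, a hedged sketch of how these four fields might map onto HAWQ column types in a PXF external table; the host, port, profile choice, delimiter option, and the Hive table name `sales_info` (which appears later in this thread's quotes) are assumptions.

``` sql
-- Sketch only: column types follow the field list above
-- (text, text, integer, double -> text, text, int4, float8).
-- The DELIMITER custom option value shown here is assumed, not from the diff.
CREATE EXTERNAL TABLE salesinfo_hivetext (location text, month text, number_of_orders int4, total_sales float8)
LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
FORMAT 'TEXT' (delimiter=E',');
```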
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+San Francisco,Sept,156,6846.34
+Paris,Nov,159,7134.56
+San Francisco,Jan,113,5397.89
+Prague,Dec,333,9894.77

[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365959
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the 
`hive-site.xml` on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | Hawq Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
--- End diff --

Also consider term/definition table here.




[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85366470
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the 
`hive-site.xml` on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing these Hive 
file formats:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+San Francisco,Sept,156,6846.34
+Paris,Nov,159,7134.56
+San Francisco,Jan,113,5397.89
+Prague,Dec,333,9894.77

[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367789
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure. Hive facilitates 
managing large data sets and supports multiple data formats, including 
comma-separated value (.csv), RC, ORC, and Parquet. The PXF Hive plug-in reads 
data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the 
`hive-site.xml` on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing these Hive 
file formats:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+San Francisco,Sept,156,6846.34
+Paris,Nov,159,7134.56
+San Francisco,Jan,113,5397.89
+Prague,Dec,333,9894.77

[GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...

2016-10-27 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85372086
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure. Hive facilitates 
managing large data sets and supports multiple data formats, including 
comma-separated value (.csv), RC, ORC, and Parquet. The PXF Hive plug-in reads 
data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the 
`hive-site.xml` on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing these Hive 
file formats:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+San Francisco,Sept,156,6846.34
+Paris,Nov,159,7134.56
+San Francisco,Jan,113,5397.89
+Prague,Dec,333,9894.77

[GitHub] incubator-hawq-docs pull request #37: Add note on hawq stop -u -M fast

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/37#discussion_r85026282
  
--- Diff: admin/startstop.html.md.erb ---
@@ -147,7 +147,7 @@ For best results in using `hawq start` and `hawq stop` 
to manage your HAWQ syste
 $ hawq stop master -M fast
 $ hawq stop master -M immediate
 ```
--   If you want to reload server parameter setting on a HAWQ database 
where there are active connections, use the command:
+-   If you want to reload server parameter settings on a HAWQ database 
where there are active connections, use the command:
--- End diff --

Change:

where there are -> that has




[GitHub] incubator-hawq-docs pull request #37: Add note on hawq stop -u -M fast

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/37#discussion_r85026323
  
--- Diff: reference/cli/admin_utilities/hawqstop.html.md.erb ---
@@ -23,7 +23,7 @@ The `hawq stop` utility is used to stop the database 
servers that comprise a HAW
 By default, you are not allowed to shut down HAWQ if there are any client 
connections to the database. Use the `-M fast` option to roll back all in 
progress transactions and terminate any connections before shutting down. If 
there are any transactions in progress, the default behavior is to wait for 
them to commit before shutting down.
 
 With the `-u` option, the utility uploads changes made to the master 
`pg_hba.conf` file or to *runtime* configuration parameters in the master 
`hawq-site.xml` file without interruption of service. Note that any active 
sessions will not pick up the changes until they reconnect to the database.
-When active connections to the database are present, use the command `hawq 
stop cluster -u -M fast` to ensure that changes to the parameters are reloaded. 
 
+If the hawq cluster has active connections, use the command `hawq stop 
cluster -u -M fast` to ensure that changes to the parameters are reloaded.  
--- End diff --

hawq -> HAWQ




[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84996425
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
--- End diff --

Add an XREF here.




[GitHub] incubator-hawq-docs pull request #36: Note consistent use of Ambari vs. hawq...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/36#discussion_r85015279
  
--- Diff: reference/cli/admin_utilities/hawqactivate.html.md.erb ---
@@ -4,6 +4,8 @@ title: hawq activate
 
 Activates a standby master host and makes it the active master for the 
HAWQ system.
 
+**Note:** If HAWQ was installed using Ambari, do not use `hawq activate` 
to activate a standby master host. The system catalogs could become 
unsynchronized if you mix Ambari and command line functions. For Ambari-managed 
HAWQ clusters, always use the Ambari administration interface to activate a 
standby master.
--- End diff --

Let's include a link to the Ambari section here.




[GitHub] incubator-hawq-docs pull request #36: Note consistent use of Ambari vs. hawq...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/36#discussion_r85016338
  
--- Diff: bestpractices/general_bestpractices.html.md.erb ---
@@ -9,6 +9,8 @@ When using HAWQ, adhere to the following guidelines for 
best results:
 -   **Use a consistent `hawq-site.xml` file to configure your entire 
cluster**:
 
 Configuration guc/parameters are located in 
`$GPHOME/etc/hawq-site.xml`. This configuration file resides on all HAWQ 
instances and can be modified by using the `hawq config` utility. You can use 
the same configuration file cluster-wide across both master and segments.
+
+If you install and manage HAWQ using Ambari, be aware that any 
property changes to `hawq-site.xml` made using the command line could be 
overwritten by Ambari. For Ambari-managed HAWQ clusters, always use the Ambari 
administration interface, not `hawq config`, to set or change HAWQ 
configuration properties. 
--- End diff --

Because we know that Ambari will overwrite local changes, I don't think we 
should say this "could" happen.  Maybe edit this note (globally) to be more 
succinct:

If you install and manage HAWQ using Ambari, do not use `hawq config` to 
set or change HAWQ configuration properties. Instead, use the Ambari interface 
for all configuration changes.  Any property changes that are made to 
`hawq-site.xml` outside of the Ambari interface will be overwritten when 
you restart or reconfigure HAWQ with Ambari.





[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999127
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
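
For a quick sense of how a profile is used, the sketch below creates a readable 
external table over the sample `/data/pxf_examples/pxf_hdfs_simple.txt` file 
created later in this topic; the PXF host and port (`namenode:51200`) are 
placeholders:

``` sql
-- Readable external table using the HdfsTextSimple profile over a
-- comma-delimited HDFS text file.
CREATE EXTERNAL TABLE pxf_hdfs_textsimple
   (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

SELECT * FROM pxf_hdfs_textsimple;
```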
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
--- End diff --

I think this procedure needs a bit more explanation about what its trying 
to accomplish. It seems like this should be optional in the context of the 
larger topic, as readers might already have files in HDFS that they want to 
reference.  Just add some notes to say that you can optionally follow the steps 
to create some sample files in HDFS for use in later examples.




[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84998515
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
--- End diff --

Edit:

The `hdfs dfs` options used in this section are:




[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997631
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
--- End diff --

Change "etc." to "and so forth."




[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997781
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
--- End diff --

command -> command syntax




[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85002565
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
+```
+
+3. Copy and paste the following data into `pxf_hdfs_simple.txt`:
+
+``` pre
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+```
+
+Notice the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt 
/data/pxf_examples/
+```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+```
+
+6. Create a second delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_multi.txt
+```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API

[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003214
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
+```
+
+3. Copy and paste the following data into `pxf_hdfs_simple.txt`:
+
+``` pre
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+```
+
+Notice the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt 
/data/pxf_examples/
+```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+```
+
+6. Create a second delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_multi.txt
+```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API

[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85000415
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
+```
+
+3. Copy and paste the following data into `pxf_hdfs_simple.txt`:
+
+``` pre
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+```
+
+Notice the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt 
/data/pxf_examples/
+```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+```
+
+6. Create a second delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_multi.txt
+```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API

[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003579
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -415,93 +312,101 @@ The following example uses the Avro schema shown in 
[Sample Avro Schema](#topic_
 {"name":"street", "type":"string"},
 {"name":"city", "type":"string"}]
 }
-  }, {
-   "name": "relationship",
-"type": {
-"type": "enum",
-"name": "relationshipEnum",
-"symbols": 
["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
-}
-  }, {
-"name" : "md5",
-"type": {
-"type" : "fixed",
-"name" : "md5Fixed",
-"size" : 4
-}
   } ],
   "doc:" : "A basic schema for storing messages"
 }
 ```
 
- Sample Avro Data 
(JSON)
+### Sample Avro Data 
(JSON)
+
+Create a text file named `pxf_hdfs_avro.txt`:
+
+``` shell
+$ vi /tmp/pxf_hdfs_avro.txt
+```
+
+Enter the following data into `pxf_hdfs_avro.txt`:
 
 ``` pre
-{"id":1, "username":"john","followers":["kate", "santosh"], "rank":null, 
"relationship": "FRIEND", "fmap": {"kate":10,"santosh":4},
-"address":{"street":"renaissance drive", "number":1,"city":"san jose"}, 
"md5":\u3F00\u007A\u0073\u0074}
+{"id":1, "username":"john","followers":["kate", "santosh"], 
"relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, 
"address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
+
+{"id":2, "username":"jim","followers":["john", "pam"], "relationship": 
"COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer 
creek", "city":"palo alto"}}
+```
+
+The sample data uses a comma `,` to separate top level records and a colon 
`:` to separate map/key values and record field name/values.
 
-{"id":2, "username":"jim","followers":["john", "pam"], "rank":3, 
"relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, 
-"address":{"street":"deer creek", "number":9,"city":"palo alto"}, 
"md5":\u0010\u0021\u0003\u0004}
+Convert the text file to Avro format. There are various ways to perform 
the conversion programmatically and via the command line. In this example, we 
use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar 
file resides in the current directory:
+
+``` shell
+$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file 
/tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
 ```
 
-To map this Avro file to an external table, the top-level primitive fields 
("id" of type long and "username" of type string) are mapped to their 
equivalent HAWQ types (bigint and text). The remaining complex fields are 
mapped to text columns:
+The generated Avro binary data file is written to 
`/tmp/pxf_hdfs_avro.avro`. Copy this file to HDFS:
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE avro_complex 
-  (id bigint, 
-  username text, 
-  followers text, 
-  rank int, 
-  fmap text, 
-  address text, 
-  relationship text,
-  md5 bytea) 
-LOCATION ('pxf://namehost:51200/tmp/avro_complex?PROFILE=Avro')
-FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
 ```
+### Querying Avro Data
+
+Create a queryable external table from this Avro file:
 
-The above command uses default delimiters for separating components of the 
complex types. This command is equivalent to the one above, but it explicitly 
sets the delimiters using the Avro profile parameters:
+-  Map the top-level primitive fields, `id` (type long) and `username` 
(type string), to their equivalent HAWQ types (bigint and text). 
+-  M

[GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...

2016-10-25 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999604
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class=accessor_class=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
--- End diff --

Does it make sense to change these into `echo` commands so they can just be 
cut/pasted?  Like:

$ echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' >> pxf_hdfs_simple.txt




[GitHub] incubator-hawq-docs pull request #22: Feature/start init

2016-10-19 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/22#discussion_r84120777
  
--- Diff: reference/cli/admin_utilities/hawqinit.html.md.erb ---
@@ -4,7 +4,7 @@ title: hawq init
 
 The `hawq init cluster` command initializes a HAWQ system and starts it.
 
-The master or segment nodes can be individually initialized by using `hawq 
init master` and `hawq init segment` commands, respectively. The `hawq init 
standby` command initializes a standby master host for a HAWQ system.
+The master or segment nodes can be individually initialized by using `hawq 
init master` and `hawq init segment` commands, respectively.  Format options 
can also be specified at this time. The `hawq init standby` command initializes 
a standby master host for a HAWQ system.
--- End diff --

Small edit: Change "initialized by using" to "initialized by using the"

Also edit for passive voice here.





[GitHub] incubator-hawq-docs pull request #23: HAWQ-1095 - enhance database api docs

2016-10-19 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r84117499
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres libpq, ODBC, and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
--- End diff --

Ok - thanks.  I think in other cases PDFs of the actual docs are included.  
This might only be in the Windows downloads.
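
FWIW, if the final page ends up adding a JDBC example alongside the ODBC one, a connection URL for a standard PostgreSQL-compatible JDBC driver would look roughly like the following (host, port, and database name are placeholders borrowed from the ODBC example later in the diff):

``` shell
jdbc:postgresql://hdm1:5432/getstartdb
```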


---


[GitHub] incubator-hawq-docs pull request #25: HAWQ-1096 - add content for hawq built...

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978854
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL 
statements in the body of an SQL function must be separated by semicolons. The 
final statement in a non-void-returning SQL function must be a 
[SELECT](../reference/sql/SELECT.html) that returns data of the type specified 
by the function's return type. The function will return a single or set of rows 
corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the 
number of rows of the database named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
+
+Many HAWQ internal functions are written in C. These functions are 
declared during initialization of the database cluster and statically linked to 
the HAWQ server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
on HAWQ internal functions.
+
+While users cannot define new internal functions, they can create aliases 
for existing internal functions.
+
+The following example creates a new function named `all_caps` that will be 
defined as an alias for the `upper` HAWQ internal function:
+
+
+``` sql
+gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper'
+LANGUAGE internal STRICT;
+CREATE FUNCTION
+gpadmin=# SELECT all_caps('change me');
+ all_caps  
+---
+ CHANGE ME
+(1 row)
+
+```
+
+For more information on aliasing internal functions, refer to [Internal 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in 
the PostgreSQL documentation.
+
+## C
+
+User-defined functions written in C must be compiled into shared libraries 
to be loaded by the HAWQ server on demand. This dynamic loading distinguishes C 
language functions from internal functions that are written in C.
--- End diff --

Avoid passive voice here:  "You must compile user-defined functions written 
in C into shared libraries so that the HAWQ server can load them on demand."
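
FWIW, it might also help readers to see the shape of such a declaration. A minimal sketch (the shared library path and symbol name here are hypothetical):

``` sql
-- assumes add_one() was compiled into the shared library named below
CREATE FUNCTION add_one(integer) RETURNS integer
AS '/usr/local/hawq/lib/myfuncs.so', 'add_one'
LANGUAGE C STRICT;
```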


---


[GitHub] incubator-hawq-docs pull request #25: HAWQ-1096 - add content for hawq built...

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83976414
  
--- Diff: plext/UsingProceduralLanguages.html.md.erb ---
@@ -1,13 +1,16 @@
 ---
-title: Using Procedural Languages and Extensions in HAWQ
+title: Using Languages and Extensions in HAWQ
 ---
 
-HAWQ allows user-defined functions to be written in other languages 
besides SQL and C. These other languages are generically called *procedural 
languages* (PLs).
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages, including supporting user-defined aliases for internal functions.
--- End diff --

This needs a bit of an edit:  HAWQ supports user-defined functions **that 
are** created with the SQL and C built-in languages, **and also supports** 
user-defined aliases for internal functions.



---


[GitHub] incubator-hawq-docs pull request #25: HAWQ-1096 - add content for hawq built...

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978465
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL 
statements in the body of an SQL function must be separated by semicolons. The 
final statement in a non-void-returning SQL function must be a 
[SELECT](../reference/sql/SELECT.html) that returns data of the type specified 
by the function's return type. The function will return a single or set of rows 
corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the 
number of rows of the database named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
+
+Many HAWQ internal functions are written in C. These functions are 
declared during initialization of the database cluster and statically linked to 
the HAWQ server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
on HAWQ internal functions.
+
+While users cannot define new internal functions, they can create aliases 
for existing internal functions.
--- End diff --

Reword:  **You** cannot define new internal functions, **but you** can 
create...


---


[GitHub] incubator-hawq-docs pull request #25: HAWQ-1096 - add content for hawq built...

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978174
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL 
statements in the body of an SQL function must be separated by semicolons. The 
final statement in a non-void-returning SQL function must be a 
[SELECT](../reference/sql/SELECT.html) that returns data of the type specified 
by the function's return type. The function will return a single or set of rows 
corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the 
number of rows of the database named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
--- End diff --

Change title to "Internal Functions"?


---


[GitHub] incubator-hawq-docs pull request #25: HAWQ-1096 - add content for hawq built...

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978153
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL 
statements in the body of an SQL function must be separated by semicolons. The 
final statement in a non-void-returning SQL function must be a 
[SELECT](../reference/sql/SELECT.html) that returns data of the type specified 
by the function's return type. The function will return a single or set of rows 
corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the 
number of rows of the database named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
+
+Many HAWQ internal functions are written in C. These functions are 
declared during initialization of the database cluster and statically linked to 
the HAWQ server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
on HAWQ internal functions.
+
+While users cannot define new internal functions, they can create aliases 
for existing internal functions.
+
+The following example creates a new function named `all_caps` that will be 
defined as an alias for the `upper` HAWQ internal function:
+
+
+``` sql
+gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper'
+LANGUAGE internal STRICT;
+CREATE FUNCTION
+gpadmin=# SELECT all_caps('change me');
+ all_caps  
+---
+ CHANGE ME
+(1 row)
+
+```
+
+For more information on aliasing internal functions, refer to [Internal 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in 
the PostgreSQL documentation.
+
+## C
--- End diff --

This id value is the same as the previous one - should be unique.  Also 
change header to "C Functions"?


---


[GitHub] incubator-hawq-docs pull request #23: HAWQ-1095 - enhance database api docs

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974367
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
--- End diff --

Are you sure the datadirect link contains the same info available in the 
HAWQ ODBC download?


---


[GitHub] incubator-hawq-docs pull request #23: HAWQ-1095 - enhance database api docs

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974918
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|-|
+| Database | name of the database to which you want to connect |
+| Driver   | full path to the ODBC driver library file |
+| HostName  | HAWQ master host name |
+| MaxLongVarcharSize  | maximum size of columns of type long varchar |
+| Password  | password used to connect to the specified database |
+| PortNumber  | HAWQ master database port number |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` shell
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
+
+ODBC connection properties may also be specified in a connection string 
identifying either a data source name, the name of a file data source, or the 
name of a driver.  A HAWQ ODBC connection string has the following format:
+
+``` shell

+([DSN=]|[FILEDSN=]|[DRIVER=])[;

[GitHub] incubator-hawq-docs pull request #23: HAWQ-1095 - enhance database api docs

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974424
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|-|
+| Database | name of the database to which you want to connect |
+| Driver   | full path to the ODBC driver library file |
+| HostName  | HAWQ master host name |
+| MaxLongVarcharSize  | maximum size of columns of type long varchar |
+| Password  | password used to connect to the specified database |
+| PortNumber  | HAWQ master database port number |
--- End diff --

Let's initial-capitalize the second column.
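
FWIW, it might also be worth showing how these properties surface in a connection string; a minimal sketch that reuses the `HAWQ-201` data source defined above (credentials are placeholders, and `UID`/`PWD` are the usual ODBC attribute names) would be:

``` shell
DSN=HAWQ-201;UID=gpadmin;PWD=changeme
```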


---


[GitHub] incubator-hawq-docs pull request #23: HAWQ-1095 - enhance database api docs

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974977
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|-|
+| Database | name of the database to which you want to connect |
+| Driver   | full path to the ODBC driver library file |
+| HostName  | HAWQ master host name |
+| MaxLongVarcharSize  | maximum size of columns of type long varchar |
+| Password  | password used to connect to the specified database |
+| PortNumber  | HAWQ master database port number |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` shell
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
+
+ODBC connection properties may also be specified in a connection string 
identifying either a data source name, the name of a file data source, or the 
name of a driver.  A HAWQ ODBC connection string has the following format:
+
+``` shell

+([DSN=]|[FILEDSN=]|[DRIVER=])[;

[GitHub] incubator-hawq-docs pull request #23: HAWQ-1095 - enhance database api docs

2016-10-18 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974668
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|-|
+| Database | name of the database to which you want to connect |
+| Driver   | full path to the ODBC driver library file |
+| HostName  | HAWQ master host name |
+| MaxLongVarcharSize  | maximum size of columns of type long varchar |
+| Password  | password used to connect to the specified database |
+| PortNumber  | HAWQ master database port number |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` shell
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
+
+ODBC connection properties may also be specified in a connection string 
identifying either a data source name, the name of a file data source, or the 
name of a driver.  A HAWQ ODBC connection string has the following format:
+
+``` shell

+([DSN=]|[FILEDSN=]|[DRIVER=])[;

[GitHub] incubator-hawq-docs pull request #18: Feature/subnav register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/18#discussion_r81435562
  
--- Diff: master_middleman/source/subnavs/apache-hawq-nav.erb ---
@@ -263,6 +263,7 @@
   URL-based Web 
External Tables
 
   
+  Registering 
Files into HAWQ Internal Tables
--- End diff --

Jane - this entry doesn't look like a valid href (no .html) and it seems 
like a duplicate of the later entry.


---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81397690
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -187,5 +168,84 @@ group {
 | varchar  | varchar  |
 
 
+## Options
+
+**General Options**
+
+-? (show help)   
+Show help, then exit.
+
+-\\\-version   
+Show the version of this utility, then exit.
+
+
+**Connection Options**
+
+-h , -\\\-host \<hostname\> 
+Specifies the host name of the machine on which the HAWQ master 
database server is running. If not specified, reads from the environment 
variable `$PGHOST` or defaults to `localhost`.
+
+ -p , -\\\-port \<port\>  
+Specifies the TCP port on which the HAWQ master database server is 
listening for connections. If not specified, reads from the environment 
variable `$PGPORT` or defaults to 5432.
+
+-U , -\\\-user \<username\>  
+The database role name to connect as. If not specified, reads from the 
environment variable `$PGUSER` or defaults to the current system user name.
+
+-d  , -\\\-database \<databasename\>  
+The database to register the Parquet HDFS data into. The default is 
`postgres`
+
+-f , -\\\-filepath \<hdfspath\>
+The path of the file or directory in HDFS containing the files to be 
registered.
+ 
+\<tablename\> 
+The HAWQ table that will store the data to be registered. If the 
--config option is not supplied, the table cannot use hash distribution. Random 
table distribution is strongly preferred. If hash distribution must be used, 
make sure that the distribution policy for the data files described in the YAML 
file is consistent with the table being registered into.
+
+Miscellaneous Options
+
+The following options are used with specific use models.
+
+-e , -\\\-eof \<eof\>
+Specify the end of the file to be registered. \<eof\> represents the 
valid content length of the file, in bytes to be used, a value between 0 the 
actual size of the file. If this option is not included, the actual file size, 
or size of files within a folder, is used. Used with Use Model 1.
+
+-F , -\\\-force
+Used for disaster recovery of a cluster. Clears all HDFS-related 
catalog contents in `pg_aoseg.pg_paqseg_$relid `and re-registers files to a 
specified table. The HDFS files are not removed or modified. To use this option 
for recovery, data is assumed to be periodically imported to the cluster to be 
recovered. Used with Use Model 2.
+
+-c , -\\\-config \<yml_config\>  
+Registers files specified by YAML-format configuration files into 
HAWQ. Used with Use Model 2.
--- End diff --

Change Use -> Usage
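
FWIW, a hypothetical Usage Model 1 invocation that combines the connection and miscellaneous options described above (host, port, and paths are illustrative only) would read:

``` shell
$ hawq register -h localhost -p 5432 -U gpadmin -d postgres \
    -e 4096 -f hdfs://localhost:8020/temp/hive.paq parquet_table
```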


---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386819
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommand creating a copy of the Parquet file to be registered before 
running ```hawq register```
+You can then then run ```hawq register``` on the copy,  leaving the 
original file available for additional Hive queries or if a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81385547
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommand creating a copy of the Parquet file to be registered before 
running ```hawq register```
+You can then then run ```hawq register``` on the copy,  leaving the 
original file available for additional Hive queries or if a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81391492
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## Synopsis
 
 ``` pre
-hawq register
+Usage 1:
+hawq register [<connection_options>] [-f <filepath>] [-e <eof>] <tablename>
+
+Usage 2:
+hawq register [<connection_options>] [-c <yml_config>] [--force] <tablename>
+
+Connection Options:
  [-h <hostname>] 
  [-p <port>] 
  [-U <username>] 
  [-d <databasename>]
- [-t <tablename>] 
+ 
+Misc. Options:
  [-f <filepath>] 
+[-e <eof>]
+[--force] 
  [-c <yml_config>]  
 hawq register help | -? 
 hawq register --version
 ```
 
 ## Prerequisites
 
-The client machine where `hawq register` is executed must have the 
following:
+The client machine where `hawq register` is executed must meet the 
following conditions:
 
 -   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
--- End diff --

See previous comments about this list.


---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81397353
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## Synopsis
 
 ``` pre
-hawq register
+Usage 1:
+hawq register [<connection_options>] [-f <filepath>] [-e <eof>] <tablename>
+
+Usage 2:
+hawq register [<connection_options>] [-c <yml_config>] [--force] <tablename>
+
+Connection Options:
  [-h <hostname>] 
  [-p <port>] 
  [-U <username>] 
  [-d <databasename>]
- [-t <tablename>] 
+ 
+Misc. Options:
  [-f <filepath>] 
+[-e <eof>]
+[--force] 
  [-c <yml_config>]  
 hawq register help | -? 
 hawq register --version
 ```
 
 ## Prerequisites
 
-The client machine where `hawq register` is executed must have the 
following:
+The client machine where `hawq register` is executed must meet the 
following conditions:
 
 -   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client must be configured and the hdfs filepath specified.
 -   The files to be registered and the HAWQ table located in the same HDFS 
cluster.
 -   The target table DDL is configured with the correct data type mapping.
 
 ## Description
 
-`hawq register` is a utility that loads and registers existing or external 
parquet data in HDFS into HAWQ, so that it can be directly ingested and 
accessed through HAWQ. Parquet data from the file or directory in the specified 
path is loaded into the appropriate HAWQ table directory in HDFS and the 
utility updates the corresponding HAWQ metadata for the files. 
+`hawq register` is a utility that loads and registers existing data files 
or folders in HDFS into HAWQ internal tables, allowing HAWQ to directly read 
the data and use internal table processing for operations such as transactions 
and high perforance, without needing to load or copy it. Data from the file or 
directory specified by \<hdfsfilepath\> is loaded into the appropriate HAWQ 
table directory in HDFS and the utility updates the corresponding HAWQ metadata 
for the files. 
 
-Only parquet tables can be loaded using the `hawq register` command. 
Metadata for the parquet file(s) and the destination table must be consistent. 
Different  data types are used by HAWQ tables and parquet tables, so the data 
is mapped. You must verify that the structure of the parquet files and the HAWQ 
table are compatible before running `hawq register`. 
+You can use `hawq register` to:
 
-Note: only HAWQ or HIVE-generated parquet tables are currently supported.
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster.
 
-###Limitations for Registering Hive Tables to HAWQ
-The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+Two usage models are available.
 
-The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+###Usage Model 1: register file data to an existing table.
 
+`hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f 
filepath] [-e eof]`
 
-## Options
-
-**General Options**
-
--? (show help)   
-Show help, then exit.
-
--\\\-version   
-Show the version of this utility, then exit.
-
-
-**Connection Options**
-
--h \<hostname\> 
-Specifies the host name of the machine on which the HAWQ master 
database server is running. If not specified, reads from the environment 
variable `$PGHOST` or defaults to `localhost`.
-
- -p \<port\>  
-Specifies the TCP port on which the HAWQ master database server is 
listening for connections. If not specified, reads from the environment 
variable `$PGPORT` or defaults to 5432.
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
the data is mapped. Refer to the section [Data Type 
Mapping](hawqregister.html#topic1__section7) below. You must verify that the 
structure of the Parquet files and the HAWQ table are compatible before running 
`hawq register`. 
 
--U \<username\>  
-The database role name to connect as. If not specified, reads from the 
environment variable `$PGUSER` or defaults to the current system user name.
+Limitations
+Only HAWQ o

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81384844
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommand creating a copy of the Parquet file to be registered before 
running ```hawq register```
+You can then then run ```hawq register``` on the copy,  leaving the 
original file available for additional Hive queries or if a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81391277
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
--- End diff --

I still think this needs to say something other than AO formatted data.


---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81384280
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running `hawq register`.
+You can then run `hawq register` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
--- End diff --

Change "The pg\_aoseg is" to "The pg\_aoseg table is"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386257
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running `hawq register`.
+You can then run `hawq register` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81390790
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running `hawq register`.
+You can then run `hawq register` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386025
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running `hawq register`.
+You can then run `hawq register` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386403
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running `hawq register`.
+You can then run `hawq register` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81383227
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
--- End diff --

Global: ID entries need to appear before the title text, like ## Registering...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81383986
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running `hawq register`.
+You can then run `hawq register` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
--- End diff --

For future work, it would be nice to provide commands for determining what 
these ID values will be before executing the register command.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81386491
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
+##Registering Externally Generated HDFS File Data to an Existing Table
+
+Files or folders in HDFS can be registered into an existing table, 
allowing them to be managed as a HAWQ internal table. When registering files, 
you can optionally specify the maximum amount of data to be loaded, in bytes, 
using the `--eof` option. If registering a folder, the actual file sizes are 
used. 
+
+Only HAWQ or Hive-generated Parquet tables are supported. Partitioned 
tables are not supported. Attempting to register these tables will result in an 
error.
+
+Metadata for the Parquet file(s) and the destination table must be 
consistent. Different  data types are used by HAWQ tables and Parquet files, so 
data must be mapped. You must verify that the structure of the parquet files 
and the HAWQ table are compatible before running `hawq register`. 
+
+We recommend creating a copy of the Parquet file to be registered before 
running `hawq register`.
+You can then run `hawq register` on the copy, leaving the 
original file available for additional Hive queries or in case a data mapping error 
is encountered.
+
+###Limitations for Registering Hive Tables to HAWQ
+The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+
+The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+
+###Example: Registering a Hive-Generated Parquet File
+
+This example shows how to register a HIVE-generated parquet file in HDFS 
into the table `parquet_table` in HAWQ, which is in the database named 
`postgres`. The file path of the HIVE-generated file is 
`hdfs://localhost:8020/temp/hive.paq`.
+
+In this example, the location of the database is 
`hdfs://localhost:8020/hawq_default`, the tablespace id is 16385, the database 
id is 16387, the table filenode id is 77160, and the last file under the 
filenode is numbered 7.
+
+Enter:
+
+``` pre
+$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq 
parquet_table
+```
+
+After running the `hawq register` command for the file location  
`hdfs://localhost:8020/temp/hive.paq`, the corresponding new location of the 
file in HDFS is:  `hdfs://localhost:8020/hawq_default/16385/16387/77160/8`. 
+
+The command then updates the metadata of the table `parquet_table` in 
HAWQ, which is contained in the table `pg_aoseg.pg_paqseg_77160`. The pg\_aoseg 
is a fixed schema for row-oriented and Parquet AO tables. For row-oriented 
tables, the table name prefix is pg\_aoseg. The table name prefix for parquet 
tables is pg\_paqseg. 77160 is the relation id of the table.
+
+To locate the table, either find the relation ID by looking up the catalog 
table pg\_class in SQL by running 
+
+```
+select oid from pg_class where relname=$relname
+```
+or find the table name by using the SQL command 
+```
+select segrelid from pg_appendonly where relid = $relid
+```
+then running 
+```
+select relname from pg_class where oid = segrelid
+```
+
+##Registering Data Using Information from a YAML Configuration File
+ 
+The `hawq register` command can register HDFS

[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81391750
  
--- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
@@ -2,102 +2,83 @@
 title: hawq register
 ---
 
-Loads and registers external parquet-formatted data in HDFS into a 
corresponding table in HAWQ.
+Loads and registers 
+AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
 
 ## Synopsis
 
 ``` pre
-hawq register
+Usage 1:
+hawq register [<connection_options>] [-f <hdfsfilepath>] [-e <eof>] 
<tablename>
+
+Usage 2:
+hawq register [<connection_options>] [-c <yaml_config>] [--force] 
<tablename>
+
+Connection Options:
 [-h <hostname>] 
 [-p <port>] 
 [-U <username>] 
 [-d <database>]
- [-t <tablename>] 
+ 
+Misc. Options:
 [-f <hdfsfilepath>] 
+[-e <eof>]
+[--force] 
 [-c <yaml_config>]  
 hawq register help | -? 
 hawq register --version
 ```
 
 ## Prerequisites
 
-The client machine where `hawq register` is executed must have the 
following:
+The client machine where `hawq register` is executed must meet the 
following conditions:
 
 -   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client must be configured and the hdfs filepath specified.
 -   The files to be registered and the HAWQ table located in the same HDFS 
cluster.
 -   The target table DDL is configured with the correct data type mapping.
 
 ## Description
 
-`hawq register` is a utility that loads and registers existing or external 
parquet data in HDFS into HAWQ, so that it can be directly ingested and 
accessed through HAWQ. Parquet data from the file or directory in the specified 
path is loaded into the appropriate HAWQ table directory in HDFS and the 
utility updates the corresponding HAWQ metadata for the files. 
+`hawq register` is a utility that loads and registers existing data files 
or folders in HDFS into HAWQ internal tables, allowing HAWQ to read the data 
directly and to use internal table processing, such as transaction support, for 
higher performance, without needing to load or copy the data. Data from the file 
or directory specified by \<hdfsfilepath\> is loaded into the appropriate HAWQ 
table directory in HDFS and the utility updates the corresponding HAWQ metadata 
for the files. 
 
-Only parquet tables can be loaded using the `hawq register` command. 
Metadata for the parquet file(s) and the destination table must be consistent. 
Different  data types are used by HAWQ tables and parquet tables, so the data 
is mapped. You must verify that the structure of the parquet files and the HAWQ 
table are compatible before running `hawq register`. 
+You can use `hawq register` to:
 
-Note: only HAWQ or HIVE-generated parquet tables are currently supported.
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster.
 
-###Limitations for Registering Hive Tables to HAWQ
-The currently-supported data types for generating Hive tables into HAWQ 
tables are: boolean, int, smallint, tinyint, bigint, float, double, string, 
binary, char, and varchar.  
+Two usage models are available.
 
-The following HIVE data types cannot be converted to HAWQ equivalents: 
timestamp, decimal, array, struct, map, and union.   
+###Usage Model 1: register file data to an existing table.
--- End diff --

Capitalize "Register"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81382521
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
--- End diff --

Need to change "client server" to one or the other, I think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81382033
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
--- End diff --

This sentence has problems.  Maybe change it to:  Either AO **or** 
Parquet-formatted **files** in HDFS can be loaded into a corresponding table in 
HAWQ.  

But I'm not sure that a *file* can be AO.  Doesn't append-only apply to the 
associated tables?  Maybe a different term is required here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-docs pull request #17: Updates for hawq register

2016-09-30 Thread dyozie
Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81383086
  
--- Diff: datamgmt/load/g-register_files.html.md.erb ---
@@ -0,0 +1,213 @@
+---
+title: Registering Files into HAWQ Internal Tables
+---
+
+The `hawq register` utility loads and registers HDFS data files or folders 
into HAWQ internal tables. Files can be read directly, rather than having to be 
copied or loaded, resulting in higher performance and more efficient 
transaction processing.
+
+Data from the file or directory specified by \<hdfsfilepath\> is loaded 
into the appropriate HAWQ table directory in HDFS and the utility updates the 
corresponding HAWQ metadata for the files. Either AO for Parquet-formatted in 
HDFS can be loaded into a corresponding table in HAWQ.
+
+You can use `hawq register` either to:
+
+-  Load and register external Parquet-formatted file data generated by an 
external system such as Hive or Spark.
+-  Recover cluster data from a backup cluster for disaster recovery. 
+
+Requirements for running `hawq register` on the client server are:
+
+-   Network access to and from all hosts in your HAWQ cluster (master and 
segments) and the hosts where the data to be loaded is located.
+-   The Hadoop client configured and the hdfs filepath specified.
+-   The files to be registered and the HAWQ table must be located in the 
same HDFS cluster.
+-   The target table DDL is configured with the correct data type mapping.
+
--- End diff --

Need to make the above list items parallel.  And each should be a 
stand-alone sentence if you keep the punctuation after them.  Ie) change to 
"Network access is available between all hosts in your HAWQ cluster (master and 
segments) and the hosts from which the data to load will be loaded."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hawq-site pull request: Updating doc links to point to h...

2016-04-21 Thread dyozie
GitHub user dyozie opened a pull request:

https://github.com/apache/incubator-hawq-site/pull/5

Updating doc links to point to hdb.docs.pivotal.io

The current link to hawq.docs.pivotal.io describes the 1.3 (pre Apache 
HAWQ) code, which differs significantly from the current code. 
hdb.docs.pivotal.io is a better match until we open the docs source directly on 
git.apache.org.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dyozie/incubator-hawq-site patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-site/pull/5.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5


commit a1bc79231a2b0f7dd255f47b74aae568ee82e657
Author: David Yozie <dyo...@pivotal.io>
Date:   2016-04-21T18:25:50Z

Updating doc links to point to hdb.docs.pivotal.io (hawq.docs.pivotal.io is 
based on older code that differs significantly from Apache HAWQ)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

