[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.

2017-11-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266481#comment-16266481
 ] 

ASF GitHub Bot commented on HAWQ-1554:
--

GitHub user edespino opened a pull request:

https://github.com/apache/incubator-hawq-site/pull/14

HAWQ-1554. Add registered trademark symbol (®) to website.

This task tracks the update of the website's home page and download
page (or section of the website) to add the ® character after the first
and most prominent mentions of HAWQ in any text (i.e. not inside graphics).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edespino/incubator-hawq-site HAWQ-1554

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-site/pull/14.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14


commit bfcc79a5d5e1fc7ab945f57bfb77a368f8a3dd1e
Author: Ed Espino 
Date:   2017-11-27T08:00:08Z

HAWQ-1554. Add registered trademark symbol (®) to website.




> Add registered trademark symbol (®) to website.
> ---
>
> Key: HAWQ-1554
> URL: https://issues.apache.org/jira/browse/HAWQ-1554
> Project: Apache HAWQ
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ed Espino
>Assignee: Ed Espino
>
> Since HAWQ® is a registered trademark, we need to mark it appropriately to 
> preserve our rights.
> This task tracks the update of the website's home page and download
> page (or section of the website) to add the ® character after the first
> and most prominent mentions of HAWQ in any text (i.e. not inside graphics).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.

2017-11-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266486#comment-16266486
 ] 

ASF GitHub Bot commented on HAWQ-1554:
--

Github user huor commented on the issue:

https://github.com/apache/incubator-hawq-site/pull/14
  
Looks good, +1


> Add registered trademark symbol (®) to website.
> ---
>
> Key: HAWQ-1554
> URL: https://issues.apache.org/jira/browse/HAWQ-1554
> Project: Apache HAWQ
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ed Espino
>Assignee: Ed Espino
>
> Since HAWQ® is a registered trademark, we need to mark it appropriately to 
> preserve our rights.
> This task tracks the update of the website's home page and download
> page (or section of the website) to add the ® character after the first
> and most prominent mentions of HAWQ in any text (i.e. not inside graphics).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.

2017-11-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266497#comment-16266497
 ] 

ASF GitHub Bot commented on HAWQ-1554:
--

Github user radarwave commented on the issue:

https://github.com/apache/incubator-hawq-site/pull/14
  
LGTM +1


> Add registered trademark symbol (®) to website.
> ---
>
> Key: HAWQ-1554
> URL: https://issues.apache.org/jira/browse/HAWQ-1554
> Project: Apache HAWQ
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ed Espino
>Assignee: Ed Espino
>
> Since HAWQ® is a registered trademark, we need to mark it appropriately to 
> preserve our rights.
> This task tracks the update of the website's home page and download
> page (or section of the website) to add the ® character after the first
> and most prominent mentions of HAWQ in any text (i.e. not inside graphics).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.

2017-11-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266501#comment-16266501
 ] 

ASF GitHub Bot commented on HAWQ-1554:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-site/pull/14


> Add registered trademark symbol (®) to website.
> ---
>
> Key: HAWQ-1554
> URL: https://issues.apache.org/jira/browse/HAWQ-1554
> Project: Apache HAWQ
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ed Espino
>Assignee: Ed Espino
>
> Since HAWQ® is a registered trademark, we need to mark it appropriately to 
> preserve our rights.
> This task tracks the update of the website's home page and download
> page (or section of the website) to add the ® character after the first
> and most prominent mentions of HAWQ in any text (i.e. not inside graphics).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1553) User who doesn't have home directory can not run hawq extract command

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276972#comment-16276972
 ] 

ASF GitHub Bot commented on HAWQ-1553:
--

GitHub user outofmem0ry opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/134

HAWQ-1553 Add option to hawq extract to specify log directory



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/outofmem0ry/incubator-hawq-docs 
document/HAWQ-1553

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #134


commit 7f89466967bfe3d4b3f8f348f83996f8048d5a33
Author: Shubham Sharma 
Date:   2017-12-04T15:31:33Z

HAWQ-1553 Add option to hawq extract to specify log directory




> User who doesn't have home directory can not run hawq extract command
> -
>
> Key: HAWQ-1553
> URL: https://issues.apache.org/jira/browse/HAWQ-1553
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Shubham Sharma
>Assignee: Radar Lei
>
> HAWQ extract stores information in hawqextract_MMDD.log under the directory 
> ~/hawqAdminLogs, and a user who doesn't have a home directory encounters a 
> failure when running hawq extract.
> We can add a -l option to set the target log directory for hawq extract.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1368) normal user who doesn't have home directory may have problem when running hawq register

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276975#comment-16276975
 ] 

ASF GitHub Bot commented on HAWQ-1368:
--

GitHub user outofmem0ry opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/135

HAWQ-1368 Add option to hawq register to specify log directory



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/outofmem0ry/incubator-hawq-docs 
document/HAWQ-1368

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/135.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #135


commit 1ac0fb26d0c358876c3a17c678b761132ccc6941
Author: Shubham Sharma 
Date:   2017-12-04T15:42:15Z

HAWQ-1368 Add option to hawq register to specify log directory




> normal user who doesn't have home directory may have problem when running 
> hawq register
> ---
>
> Key: HAWQ-1368
> URL: https://issues.apache.org/jira/browse/HAWQ-1368
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Radar Lei
> Fix For: backlog
>
>
> HAWQ register stores information in hawqregister_MMDD.log under the directory 
> ~/hawqAdminLogs, and a normal user who doesn't have a home directory may 
> encounter a failure when running hawq register.
> We can add a -l option to set the target log directory and file name for 
> hawq register.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1562) Incorrect path to default log directory in documentation

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277017#comment-16277017
 ] 

ASF GitHub Bot commented on HAWQ-1562:
--

GitHub user outofmem0ry opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/136

HAWQ-1562 Fixed incorrect path to default log directory



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/outofmem0ry/incubator-hawq-docs 
document/HAWQ-1562

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #136


commit 9c05af6f19c755526679fcd16792d3f272745e85
Author: Shubham Sharma 
Date:   2017-12-04T16:07:51Z

HAWQ-1562 Fixed incorrect path to default log directory




> Incorrect path to default log directory in documentation
> 
>
> Key: HAWQ-1562
> URL: https://issues.apache.org/jira/browse/HAWQ-1562
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Shubham Sharma
>Assignee: David Yozie
>
> In the current documentation, six files point to the wrong location of the 
> default log directory. The default log directory of the management utilities 
> is ~/hawqAdminLogs, but the documentation specifies ~/hawq/Adminlogs/. The 
> list can be seen 
> [here|https://github.com/apache/incubator-hawq-docs/search?utf8=%E2%9C%93&q=Adminlogs&type=].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1553) User who doesn't have home directory can not run hawq extract command

2018-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341820#comment-16341820
 ] 

ASF GitHub Bot commented on HAWQ-1553:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/134#discussion_r164250622
  
--- Diff: markdown/reference/cli/admin_utilities/hawqextract.html.md.erb ---
@@ -73,6 +73,9 @@ where:
 -\\\-version  
 Displays the version of this utility.
 
+-l, -\\\-logdir \  
+Specifies the log directory that `hawq extract` uses for logs. The 
default is `~/hawqAdminLogs/`.
--- End diff --

how about "Specifies the directory that `hawq extract` uses for log files."?
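
For readers following the change, a minimal usage sketch (assuming the 
-l/--logdir option lands as documented in the diff; the database, table, and 
log directory names below are placeholders):

# write metadata for table my_table to my_table.yaml, logging under /tmp/hawq_logs
$ hawq extract -d mydb -o my_table.yaml -l /tmp/hawq_logs my_table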


> User who doesn't have home directory can not run hawq extract command
> -
>
> Key: HAWQ-1553
> URL: https://issues.apache.org/jira/browse/HAWQ-1553
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Shubham Sharma
>Assignee: Radar Lei
>Priority: Major
> Fix For: 2.3.0.0-incubating
>
>
> HAWQ extract stores information in hawqextract_MMDD.log under the directory 
> ~/hawqAdminLogs, and a user who doesn't have a home directory encounters a 
> failure when running hawq extract.
> We can add a -l option to set the target log directory for hawq extract.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1368) normal user who doesn't have home directory may have problem when running hawq register

2018-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341824#comment-16341824
 ] 

ASF GitHub Bot commented on HAWQ-1368:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/135#discussion_r164250784
  
--- Diff: markdown/reference/cli/admin_utilities/hawqregister.html.md.erb 
---
@@ -200,6 +200,8 @@ group {
 -\\\-version   
 Show the version of this utility, then exit.
 
+-l, -\\\-logdir \  
+Specifies the log directory that `hawq register` uses for logs. The 
default is `~/hawqAdminLogs/`.
--- End diff --

same comment as hawq extract ...  how about "Specifies the directory that 
`hawq register` uses for log files."?
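
A matching sketch for hawq register (again with placeholder names; the HDFS 
file path and table name are examples only, not taken from the PR):

# register an existing HDFS Parquet file into my_table, logging under /tmp/hawq_logs
$ hawq register -d mydb -f hdfs://namenode:8020/data/part-00000.parquet -l /tmp/hawq_logs my_table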


> normal user who doesn't have home directory may have problem when running 
> hawq register
> ---
>
> Key: HAWQ-1368
> URL: https://issues.apache.org/jira/browse/HAWQ-1368
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Lili Ma
>Assignee: Radar Lei
>Priority: Major
> Fix For: backlog
>
>
> HAWQ register stores information in hawqregister_MMDD.log under the directory 
> ~/hawqAdminLogs, and a normal user who doesn't have a home directory may 
> encounter a failure when running hawq register.
> We can add a -l option to set the target log directory and file name for 
> hawq register.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1562) Incorrect path to default log directory in documentation

2018-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354164#comment-16354164
 ] 

ASF GitHub Bot commented on HAWQ-1562:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/136


> Incorrect path to default log directory in documentation
> 
>
> Key: HAWQ-1562
> URL: https://issues.apache.org/jira/browse/HAWQ-1562
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Shubham Sharma
>Assignee: David Yozie
>Priority: Major
>
> In the current documentation, six files point to the wrong location of the 
> default log directory. The default log directory of the management utilities 
> is ~/hawqAdminLogs, but the documentation specifies ~/hawq/Adminlogs/. The 
> list can be seen 
> [here|https://github.com/apache/incubator-hawq-docs/search?utf8=%E2%9C%93&q=Adminlogs&type=].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536742#comment-16536742
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

GitHub user radarwave opened a pull request:

https://github.com/apache/incubator-hawq-site/pull/16

HAWQ-1638. Correct typo and use full name for ASF products 

Correct typo for MADlib and use the full name for ASF products in the 
first/main references.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/radarwave/incubator-hawq-site HAWQ-1638

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-site/pull/16.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16


commit c6a20e535d9a36d39977269b00b2e6cea05687f8
Author: rlei 
Date:   2018-07-09T09:52:04Z

HAWQ-1638. Correct typo and use full name for ASF products with the 
first/main references.




> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536746#comment-16536746
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

Github user radarwave commented on the issue:

https://github.com/apache/incubator-hawq-site/pull/16
  
The download links fix will be updated later.

@changleicn  @edespino @jiny2 
Would you help to review? Thanks.


> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537839#comment-16537839
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

Github user changleicn commented on the issue:

https://github.com/apache/incubator-hawq-site/pull/16
  
looks good!


> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537890#comment-16537890
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-site/pull/16


> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538353#comment-16538353
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

GitHub user radarwave opened a pull request:

https://github.com/apache/incubator-hawq-site/pull/17

HAWQ-1638. Add how to verify downloaded files section, removed md5 keys.

Added a section about how to verify downloaded files.

Removed MD5 references.
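
For context, a typical ASF verification flow that such a section documents 
(the artifact names below are placeholders, not actual release file names; 
the KEYS file is the one linked from the download page):

# import the release signing keys, then check the detached signature
$ gpg --import KEYS
$ gpg --verify apache-hawq-src-X.Y.Z-incubating.tar.gz.asc apache-hawq-src-X.Y.Z-incubating.tar.gz
# compare the published SHA-512 checksum against the downloaded tarball
$ sha512sum -c apache-hawq-src-X.Y.Z-incubating.tar.gz.sha512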

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/radarwave/incubator-hawq-site HAWQ-1638

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-site/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17


commit fff3da1168b91ceec758dbfe6b1e0f8e6ae6fb4b
Author: rlei 
Date:   2018-07-10T10:05:01Z

HAWQ-1638. Add how to verify downloaded files section, removed md5 keys.




> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538356#comment-16538356
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

Github user radarwave commented on the issue:

https://github.com/apache/incubator-hawq-site/pull/17
  
@changleicn @edespino @jiny2
Please help to review, thanks.


> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538359#comment-16538359
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

Github user radarwave commented on the issue:

https://github.com/apache/incubator-hawq-site/pull/17
  
Add @dyozie 


> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1638) Issues with website

2018-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539647#comment-16539647
 ] 

ASF GitHub Bot commented on HAWQ-1638:
--

Github user jiny2 commented on the issue:

https://github.com/apache/incubator-hawq-site/pull/17
  
+1, LGTM. Thank you.


> Issues with website
> ---
>
> Key: HAWQ-1638
> URL: https://issues.apache.org/jira/browse/HAWQ-1638
> Project: Apache HAWQ
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Radar Lei
>Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase
> "Plus, HAWQ® works Apache MADlib) machine learning libraries"
> does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN, etc. must use 
> the full name, i.e. Apache Hadoop, etc.
> The download section does not have any link to the KEYS file, nor any 
> instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes.
> These are deprecated and can be removed for older releases that have other 
> hashes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HAWQ-1031) updates docs to reflect MASTER_DATA_DIRECTORY code changes

2016-09-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456493#comment-15456493
 ] 

ASF GitHub Bot commented on HAWQ-1031:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/4

MASTER_DATA_DIRECTORY clarifications - HAWQ-1031

doc updates to remove references to MASTER_DATA_DIRECTORY.  clarify when 
the hawq-site.xml hawq_master_directory config value should be used.

should resolve HAWQ-1031
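
For reference, one way to look up the value that replaces the old environment 
variable (assuming the usual $GPHOME/etc location of hawq-site.xml and that 
the property's <name> and <value> elements sit on adjacent lines):

# show the configured master data directory from hawq-site.xml
$ grep -A 1 hawq_master_directory $GPHOME/etc/hawq-site.xml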

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/mdatadir_chgs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/4.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4


commit 1a7efeffc1e1a5e010f7aa4cba4622f9da1357bb
Author: Lisa Owen 
Date:   2016-08-29T20:23:01Z

MASTER_DATA_DIRECTORY clarifications - HAWQ-1031




> updates docs to reflect MASTER_DATA_DIRECTORY code changes
> --
>
> Key: HAWQ-1031
> URL: https://issues.apache.org/jira/browse/HAWQ-1031
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.0.1.0-incubating
>Reporter: Lisa Owen
>Assignee: Lei Chang
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> MASTER_DATA_DIRECTORY is no longer a required environment variable for any 
> HAWQ command.  remove references in the docs, and point users to the 
> hawq-site.xml hawq_master_directory property value when required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1031) updates docs to reflect MASTER_DATA_DIRECTORY code changes

2016-09-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456599#comment-15456599
 ] 

ASF GitHub Bot commented on HAWQ-1031:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/4


> updates docs to reflect MASTER_DATA_DIRECTORY code changes
> --
>
> Key: HAWQ-1031
> URL: https://issues.apache.org/jira/browse/HAWQ-1031
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.0.1.0-incubating
>Reporter: Lisa Owen
>Assignee: Lei Chang
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> MASTER_DATA_DIRECTORY is no longer a required environment variable for any 
> HAWQ command.  remove references in the docs, and point users to the 
> hawq-site.xml hawq_master_directory property value when required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1056) "hawq check" help output and documentation updates needed

2016-09-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494197#comment-15494197
 ] 

ASF GitHub Bot commented on HAWQ-1056:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/9

Feature/hawqcheck hadoopopt

some cleanup to documentation for "hawq check" command.  fixes the 
documentation part of HAWQ-1056.

- add -h, --host  option
- clarify the value of the --hadoop, --hadoop-home option: the value 
should be the full install path to hadoop
- modify the examples to use relevant values for hadoop_home (a quick usage 
sketch follows below)
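
A quick usage sketch for the clarified option (the host file and hadoop 
install path are examples only; -f is the utility's existing host-file flag):

# check cluster settings on the hosts in hostfile, pointing --hadoop at the full install path
$ hawq check -f hostfile --hadoop /usr/local/hadoop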

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/hawqcheck-hadoopopt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/9.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9


commit 0f642f1ce67bd570d2043174bbdaed990c7840bb
Author: Lisa Owen 
Date:   2016-09-14T21:11:38Z

clarify use of hawq check --hadoop option

commit 6704cc0b7a358fafba08b3fd66a5a12b5bb97f85
Author: Lisa Owen 
Date:   2016-09-14T21:21:07Z

hawq check --hadoop option - misc cleanup

commit 016630163015e782ef630338998ef1696f5f005e
Author: Lisa Owen 
Date:   2016-09-15T17:52:32Z

hawq check - add h/host option, cleanup

commit 4a617974cf04d1b1758bdfbc490116b60bdefb79
Author: Lisa Owen 
Date:   2016-09-15T18:38:36Z

hawq check - hadoop home is optional




> "hawq check" help output and documentation updates needed
> -
>
> Key: HAWQ-1056
> URL: https://issues.apache.org/jira/browse/HAWQ-1056
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools, Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
> Fix For: 2.0.1.0-incubating
>
>
> help output and reference documentation for "hawq check" --hadoop option is 
> not clear.  specifically, this option should identify the full path to the 
> hadoop installation.
> additionally, the [-h | --host ] option appears to be missing in 
> both areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1056) "hawq check" help output and documentation updates needed

2016-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508041#comment-15508041
 ] 

ASF GitHub Bot commented on HAWQ-1056:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/9


> "hawq check" help output and documentation updates needed
> -
>
> Key: HAWQ-1056
> URL: https://issues.apache.org/jira/browse/HAWQ-1056
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools, Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> help output and reference documentation for "hawq check" --hadoop option is 
> not clear.  specifically, this option should identify the full path to the 
> hadoop installation.
> additionally, the [-h | --host ] option appears to be missing in 
> both areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586004#comment-15586004
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/23

HAWQ-1095 - enhance database api docs

add content for jdbc, odbc, libpq
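
A small connectivity check that complements the new content (psql links 
against libpq; the host and database names reuse the hdm1/getstartdb values 
from the ODBC example in this PR's diff):

# connect to the HAWQ master with the libpq-based psql client
$ psql -h hdm1 -p 5432 -d getstartdb -c "SELECT version();"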


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/dbapiinfo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/23.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23


commit 2c0f4b19bb2baef545467c9d39f097344c6358b2
Author: Lisa Owen 
Date:   2016-10-04T19:25:29Z

restructure db API section; add libpq and links to driver and api docs

commit f066326f0241050a22a8b592fcaae3aab037c504
Author: Lisa Owen 
Date:   2016-10-04T20:36:41Z

clarify some statements

commit fbb0571df9cdb1ba05a2ba970b560cb6388b72eb
Author: Lisa Owen 
Date:   2016-10-04T23:11:07Z

hawq supports datadirect drivers

commit df2aaed3aab20b9d0fffa0c62df8a23c33864065
Author: Lisa Owen 
Date:   2016-10-04T23:26:02Z

update driver names

commit 245633e69bd0017f43a5cc20e82c9a5fc23b4079
Author: Lisa Owen 
Date:   2016-10-05T21:56:36Z

provide locations of libpq lib and include file

commit 57d76d2b86014f772754ca70cab95e4c337a71a2
Author: Lisa Owen 
Date:   2016-10-07T16:02:22Z

add jdbc connection string and example

commit 70e45af7d24a6699840eec176603b4b835121bef
Author: Lisa Owen 
Date:   2016-10-07T23:48:39Z

flesh out jdbc section; add connection URL specs

commit 3288da3e8ce51482e1d6e6913a237cbf5fc0bc8e
Author: Lisa Owen 
Date:   2016-10-10T19:08:48Z

db drivers and apis - flesh out odbc section




> enhance database driver and API documentation
> -
>
> Key: HAWQ-1095
> URL: https://issues.apache.org/jira/browse/HAWQ-1095
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> docs contain very brief references to JDBC/ODBC and none at all to libpq.  
> add more content in these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586041#comment-15586041
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/25

HAWQ-1096 - add content for hawq built-in languages

add content for sql, c, and internal hawq built in languages
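
A minimal example of the kind of content being added, here a function written 
in the built-in SQL language (the database name is a placeholder; parameters 
are referenced positionally, per HAWQ's PostgreSQL lineage):

$ psql -d mydb <<'EOF'
-- built-in SQL language: define and call a simple function
CREATE FUNCTION add_ints(int, int) RETURNS int AS $$
  SELECT $1 + $2;
$$ LANGUAGE SQL;
SELECT add_ints(1, 2);
EOF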

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/builtin-langs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/25.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #25


commit 504c662be21dc344a161b81a9c627a8f6d7861cd
Author: Lisa Owen 
Date:   2016-10-05T21:33:36Z

add file discussing hawq built-in languages

commit 8e27e9093f1d27277d676386144ee895ad004f86
Author: Lisa Owen 
Date:   2016-10-05T21:34:36Z

include built-in languages in PL lang landing page

commit bd85fdbc31cb463855c2606fde48d803dccb3de2
Author: Lisa Owen 
Date:   2016-10-05T21:47:11Z

c user-defined function example - add _c to function name to avoid confusion

commit 1332870d01d2f8da2f8284ac167253d7005c6dfd
Author: Lisa Owen 
Date:   2016-10-10T22:24:20Z

builtin langs -  clarify and add some links




> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586400#comment-15586400
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/27

HAWQ-1096 - add subnav entry for built-in languages

add subnav for new topic

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/subnav-builtin-langs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/27.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #27






> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587080#comment-15587080
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974977
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|---|
+| Database | name of the database to which you want to connect |
+| Driver | full path to the ODBC driver library file |
+| HostName | HAWQ master host name |
+| MaxLongVarcharSize | maximum size of columns of type long varchar |
+| Password | password used to connect to the specified database |
+| PortNumber | HAWQ master database port number |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` shell
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
+
+ODBC connection properties may also be specified in a connection string 
identifying ei

[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587079#comment-15587079
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974918
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|---|
+| Database | name of the database to which you want to connect |
+| Driver | full path to the ODBC driver library file |
+| HostName | HAWQ master host name |
+| MaxLongVarcharSize | maximum size of columns of type long varchar |
+| Password | password used to connect to the specified database |
+| PortNumber | HAWQ master database port number |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` shell
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
+
+ODBC connection properties may also be specified in a connection string 
identifying ei

[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587077#comment-15587077
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974424
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|---|
+| Database | name of the database to which you want to connect |
+| Driver | full path to the ODBC driver library file |
+| HostName | HAWQ master host name |
+| MaxLongVarcharSize | maximum size of columns of type long varchar |
+| Password | password used to connect to the specified database |
+| PortNumber | HAWQ master database port number |
--- End diff --

Let's initial-capitalize the second column.


> enhance database driver and API documentation
> -
>
> Key: HAWQ-1095
> URL: https://issues.apache.org/jira/browse/HAWQ-1095
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> docs contain very brief references to JDBC/ODBC and none at all to libpq.  
> add more content in these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587081#comment-15587081
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974367
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
--- End diff --

Are you sure the datadirect link contains the same info available in the 
HAWQ ODBC download?


> enhance database driver and API documentation
> -
>
> Key: HAWQ-1095
> URL: https://issues.apache.org/jira/browse/HAWQ-1095
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> docs contain very brief references to JDBC/ODBC and none at all to libpq.  
> add more content in these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587078#comment-15587078
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974668
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The database application programming 
interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from 
Pivotal Network [Pivotal 
Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+ Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|---|
+| Database | name of the database to which you want to connect |
+| Driver | full path to the ODBC driver library file |
+| HostName | HAWQ master host name |
+| MaxLongVarcharSize | maximum size of columns of type long varchar |
+| Password | password used to connect to the specified database |
+| PortNumber | HAWQ master database port number |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` shell
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
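
If your system includes the unixODBC driver manager and its `isql` utility (an assumption; your driver manager and test tools may differ), you can run a quick connectivity check against the data source defined above, substituting your own database user and password:

``` shell
$ isql -v HAWQ-201 gpadmin changeme
```

A successful connection leaves you at an interactive `SQL>` prompt from which you can issue ad hoc statements.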
+
+ODBC connection properties may also be specified in a connection string 
identifying ei

[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587103#comment-15587103
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/26


> enhance database driver and API documentation
> -
>
> Key: HAWQ-1095
> URL: https://issues.apache.org/jira/browse/HAWQ-1095
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> docs contain very brief references to JDBC/ODBC and none at all to libpq.  
> add more content in these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587108#comment-15587108
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83976521
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres, ODBC, and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
+
+#### Connection Data Source
+The information required by the HAWQ ODBC driver to connect to a database 
is typically stored in a named data source. Depending on your platform, you may 
use 
[GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23)
 or [command 
line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23)
 tools to create your data source definition. On Linux, ODBC data sources are 
typically defined in a file named `odbc.ini`. 
+
+Commonly-specified HAWQ ODBC data source connection properties include:
+
+| Property Name | Value Description |
+|---|---|
+| Database | name of the database to which you want to connect |
+| Driver | full path to the ODBC driver library file |
+| HostName | HAWQ master host name |
+| MaxLongVarcharSize | maximum size of columns of type long varchar |
+| Password | password used to connect to the specified database |
+| PortNumber | HAWQ master database port number |
+
+Refer to [Connection Option 
Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23)
 for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC 
driver.
+
+Example HAWQ DataDirect ODBC driver data source definition:
+
+``` shell
+[HAWQ-201]
+Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
+Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
+Database=getstartdb
+HostName=hdm1
+PortNumber=5432
+Password=changeme
+MaxLongVarcharSize=8192
+```
+
+The first line, `[HAWQ-201]`, identifies the name of the data source.
+
+ODBC connection properties may also be specified in a connection string 
identifying

[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587132#comment-15587132
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83977371
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres, ODBC, and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
--- End diff --

users will download the readme from pivnet.  the link at the end of the 
readme points to a datadirect page from which one could navigate to the links i 
have included.

i don't see any other docs when i untar the download package.


> enhance database driver and API documentation
> -
>
> Key: HAWQ-1095
> URL: https://issues.apache.org/jira/browse/HAWQ-1095
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> docs contain very brief references to JDBC/ODBC and none at all to libpq.  
> add more content in these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587173#comment-15587173
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978549
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
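As a quick, illustrative check (the exact rows returned depend on which optional languages are registered in your database), the `pg_language` system catalog lists the languages that `CREATE FUNCTION` can use:

``` sql
gpadmin=# SELECT lanname FROM pg_language;
```
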
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single row or a set of rows corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the number of rows in the table named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
+
+Many HAWQ internal functions are written in C. These functions are 
declared during initialization of the database cluster and statically linked to 
the HAWQ server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
on HAWQ internal functions.
+
+While users cannot define new internal functions, they can create aliases 
for existing internal functions.
+
+The following example creates a new function named `all_caps` that will be 
defined as an alias for the `upper` HAWQ internal function:
--- End diff --

Edit:  change "that will be defined as an" to "that is an"


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587171#comment-15587171
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978465
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single row or a set of rows corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the number of rows in the table named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
+
+Many HAWQ internal functions are written in C. These functions are 
declared during initialization of the database cluster and statically linked to 
the HAWQ server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
on HAWQ internal functions.
+
+While users cannot define new internal functions, they can create aliases 
for existing internal functions.
--- End diff --

Reword:  **You** cannot define new internal functions, **but you** can 
create...


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587175#comment-15587175
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978153
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single row or a set of rows corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the number of rows in the table named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
+
+Many HAWQ internal functions are written in C. These functions are 
declared during initialization of the database cluster and statically linked to 
the HAWQ server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
on HAWQ internal functions.
+
+While users cannot define new internal functions, they can create aliases 
for existing internal functions.
+
+The following example creates a new function named `all_caps` that will be 
defined as an alias for the `upper` HAWQ internal function:
+
+
+``` sql
+gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper'
+LANGUAGE internal STRICT;
+CREATE FUNCTION
+gpadmin=# SELECT all_caps('change me');
+ all_caps  
+---
+ CHANGE ME
+(1 row)
+
+```
+
+For more information on aliasing internal functions, refer to [Internal 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in 
the PostgreSQL documentation.
+
+## C
--- End diff --

This id value is the same as the previous one - should be unique.  Also 
change header to "C Functions"?


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587172#comment-15587172
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83976414
  
--- Diff: plext/UsingProceduralLanguages.html.md.erb ---
@@ -1,13 +1,16 @@
 ---
-title: Using Procedural Languages and Extensions in HAWQ
+title: Using Languages and Extensions in HAWQ
 ---
 
-HAWQ allows user-defined functions to be written in other languages 
besides SQL and C. These other languages are generically called *procedural 
languages* (PLs).
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages, including supporting user-defined aliases for internal functions.
--- End diff --

This needs a bit of an edit:  HAWQ supports user-defined functions **that 
are** created with the SQL and C built-in languages, **and also supports** 
user-defined aliases for internal functions.



> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587170#comment-15587170
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978854
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single row or a set of rows corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the number of rows in the table named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
+
+Many HAWQ internal functions are written in C. These functions are 
declared during initialization of the database cluster and statically linked to 
the HAWQ server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
on HAWQ internal functions.
+
+While users cannot define new internal functions, they can create aliases 
for existing internal functions.
+
+The following example creates a new function named `all_caps` that will be 
defined as an alias for the `upper` HAWQ internal function:
+
+
+``` sql
+gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper'
+LANGUAGE internal STRICT;
+CREATE FUNCTION
+gpadmin=# SELECT all_caps('change me');
+ all_caps  
+---
+ CHANGE ME
+(1 row)
+
+```
+
+For more information on aliasing internal functions, refer to [Internal 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in 
the PostgreSQL documentation.
+
+## C
+
+User-defined functions written in C must be compiled into shared libraries 
to be loaded by the HAWQ server on demand. This dynamic loading distinguishes C 
language functions from internal functions that are written in C.
--- End diff --

Avoid passive voice here:  "You must compile user-defined functions written 
in C into shared libraries so that the HAWQ server can load them on demand."
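
For reference, a minimal sketch of registering such a function follows; the shared library path `/usr/local/lib/my_udfs.so` and the exported symbol `add_one` are hypothetical placeholders, and the library is assumed to have been compiled against the HAWQ/PostgreSQL headers using the version-1 calling convention:

``` sql
-- hypothetical library and symbol names; adjust to your own build
gpadmin=# CREATE FUNCTION add_one(integer) RETURNS integer
AS '/usr/local/lib/my_udfs.so', 'add_one'
LANGUAGE C STRICT;
CREATE FUNCTION
```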


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587168#comment-15587168
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83977628
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single row or a set of rows corresponding to this last SQL query.
--- End diff --

Global:  change "an SQL" to "a SQL" (pronounced 'sequel')


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587174#comment-15587174
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978174
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single row or a set of rows corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the number of rows in the table named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## Internal
--- End diff --

Change title to "Internal Functions"?


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587169#comment-15587169
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83979056
  
--- Diff: plext/builtin_langs.html.md.erb ---
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## Enabling Built-in Language Support
+
+Support for SQL, internal, and C language user-defined functions is 
enabled by default for all HAWQ databases.
+
+## SQL
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single row or a set of rows corresponding to this last SQL query.
+
+The following example creates and calls an SQL function to count the number of rows in the table named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# select count_orders();
+ my_count 
+--
+   830513
+(1 row)
+```
+
+For additional information on creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
--- End diff --

Global edit:  Change "For additional information on" to "For additional 
information about"


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589261#comment-15589261
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r84117499
  
--- Diff: clientaccess/g-database-application-interfaces.html.md.erb ---
@@ -1,8 +1,96 @@
 ---
-title: ODBC/JDBC Application Interfaces
+title: HAWQ Database Drivers and APIs
 ---
 
+You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres, ODBC, and JDBC APIs.
 
-You may want to deploy your existing Business Intelligence (BI) or 
Analytics applications with HAWQ. The most commonly used database application 
programming interfaces with HAWQ are the ODBC and JDBC APIs. 
+HAWQ provides the following connectivity tools for connecting to the 
database:
+
+  - ODBC driver
+  - JDBC driver
+  - `libpq` - PostgreSQL C API
+
+## HAWQ Drivers
+
+ODBC and JDBC drivers for HAWQ are available as a separate download from [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb).
+
+### ODBC Driver
+
+The ODBC API specifies a standard set of C interfaces for accessing 
database management systems.  For additional information on using the ODBC API, 
refer to the [ODBC Programmer's 
Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) 
documentation.
+
+HAWQ supports the DataDirect ODBC Driver. Installation instructions for 
this driver are provided on the Pivotal Network driver download page. Refer to 
[HAWQ ODBC 
Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23)
 for HAWQ-specific ODBC driver information.
--- End diff --

Ok - thanks.  I think in other cases PDFs of the actual docs are included.  
This might only be in the Windows downloads.


> enhance database driver and API documentation
> -
>
> Key: HAWQ-1095
> URL: https://issues.apache.org/jira/browse/HAWQ-1095
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> docs contain very brief references to JDBC/ODBC and none at all to libpq.  
> add more content in these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589308#comment-15589308
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/27


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation

2016-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589318#comment-15589318
 ] 

ASF GitHub Bot commented on HAWQ-1095:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/23


> enhance database driver and API documentation
> -
>
> Key: HAWQ-1095
> URL: https://issues.apache.org/jira/browse/HAWQ-1095
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> docs contain very brief references to JDBC/ODBC and none at all to libpq.  
> add more content in these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)

2016-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589313#comment-15589313
 ] 

ASF GitHub Bot commented on HAWQ-1096:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/25


> document the HAWQ built-in languages (SQL, C, internal)
> ---
>
> Key: HAWQ-1096
> URL: https://issues.apache.org/jira/browse/HAWQ-1096
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
>
> the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, 
> C and internal.  add content to introduce these languages with relevant 
> examples and links. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602949#comment-15602949
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/33

HAWQ-1107 - enhance PXF HDFS plugin documentation

added more examples, restructured the content, removed SequenceWritable 
references.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/pxfhdfs-enhance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/33.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #33


commit 9ca277927bebd9c8d79bdf4619dfaf94a695c838
Author: Lisa Owen 
Date:   2016-10-14T22:29:22Z

start restructuring HDFS plug-in page

commit 2da7a92a3e8431335a48005d55a70c9eba333e16
Author: Lisa Owen 
Date:   2016-10-17T23:27:23Z

more content and rearranging of pxf hdfs plugin page

commit 5a941a70bda0e8466b5aa5dd2885840fce14c522
Author: Lisa Owen 
Date:   2016-10-18T16:57:09Z

more rework of hdfs plug in page

commit fd029d568589f5a4e2461d92437963d97f7d3198
Author: Lisa Owen 
Date:   2016-10-20T19:20:21Z

remove SerialWritable, use namenode for host

commit 6ba64f94d5b11397c98f46eb14d5c6e48d17a6cc
Author: Lisa Owen 
Date:   2016-10-20T21:12:43Z

use more descriptive file names

commit 86d13b312ea8591949b8a811973937ab60f74df9
Author: Lisa Owen 
Date:   2016-10-20T22:36:01Z

more mods to HDFS plugin docs




> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602963#comment-15602963
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/34

HAWQ-1107 - subnav chgs for pxf hdfs plugin content restructure

subnav changes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/subnav-pxfhdfs-enhance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/34.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #34


commit f350e41fa419e9fb661f4ccb6e8793b7d9e9a40b
Author: Lisa Owen 
Date:   2016-10-24T19:30:37Z

subna chgs for pxf hdfs plugin content restructure




> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606510#comment-15606510
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85000415
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and write permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
+```
+
+3. Copy and paste the following data into `pxf_hdfs_simple.txt`:
+
+``` pre
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+```
+
+Notice the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt 
/data/pxf_examples/
+```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+```
+
+6. Create a second delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_multi.txt
+```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `P

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606508#comment-15606508
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997631
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and write permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
--- End diff --

Change "etc." to "and so forth."


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606515#comment-15606515
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003579
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -415,93 +312,101 @@ The following example uses the Avro schema shown in 
[Sample Avro Schema](#topic_
 {"name":"street", "type":"string"},
 {"name":"city", "type":"string"}]
 }
-  }, {
-   "name": "relationship",
-"type": {
-"type": "enum",
-"name": "relationshipEnum",
-"symbols": 
["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
-}
-  }, {
-"name" : "md5",
-"type": {
-"type" : "fixed",
-"name" : "md5Fixed",
-"size" : 4
-}
   } ],
   "doc:" : "A basic schema for storing messages"
 }
 ```
 
- Sample Avro Data 
(JSON)
+### Sample Avro Data 
(JSON)
+
+Create a text file named `pxf_hdfs_avro.txt`:
+
+``` shell
+$ vi /tmp/pxf_hdfs_avro.txt
+```
+
+Enter the following data into `pxf_hdfs_avro.txt`:
 
 ``` pre
-{"id":1, "username":"john","followers":["kate", "santosh"], "rank":null, 
"relationship": "FRIEND", "fmap": {"kate":10,"santosh":4},
-"address":{"street":"renaissance drive", "number":1,"city":"san jose"}, 
"md5":\u3F00\u007A\u0073\u0074}
+{"id":1, "username":"john","followers":["kate", "santosh"], 
"relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, 
"address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
+
+{"id":2, "username":"jim","followers":["john", "pam"], "relationship": 
"COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer 
creek", "city":"palo alto"}}
+```
+
+The sample data uses a comma `,` to separate top level records and a colon 
`:` to separate map/key values and record field name/values.
 
-{"id":2, "username":"jim","followers":["john", "pam"], "rank":3, 
"relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, 
-"address":{"street":"deer creek", "number":9,"city":"palo alto"}, 
"md5":\u0010\u0021\u0003\u0004}
+Convert the text file to Avro format. There are various ways to perform 
the conversion programmatically and via the command line. In this example, we 
use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar 
file resides in the current directory:
+
+``` shell
+$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file 
/tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
 ```
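
As an optional sanity check (assuming the same `avro-tools` jar), you can dump the generated binary file back to JSON:

``` shell
$ java -jar ./avro-tools-1.8.1.jar tojson /tmp/pxf_hdfs_avro.avro
```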
 
-To map this Avro file to an external table, the top-level primitive fields 
("id" of type long and "username" of type string) are mapped to their 
equivalent HAWQ types (bigint and text). The remaining complex fields are 
mapped to text columns:
+The generated Avro binary data file is written to 
`/tmp/pxf_hdfs_avro.avro`. Copy this file to HDFS:
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE avro_complex 
-  (id bigint, 
-  username text, 
-  followers text, 
-  rank int, 
-  fmap text, 
-  address text, 
-  relationship text,
-  md5 bytea) 
-LOCATION ('pxf://namehost:51200/tmp/avro_complex?PROFILE=Avro')
-FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
 ```
+### Querying Avro Data
+
+Create a queryable external table from this Avro file:
 
-The above command uses default delimiters for separating components of the 
complex types. This command is equivalent to the one above, but it explicitly 
sets the delimiters using the Avro profile parameters:
+-  Map the top-level primitive fields, `id` (type long) and `username` 
(type string), to their equivalent HAWQ types (bigint and text). 
+-  Map the remaining complex fields to type text.
+-  Explicitly set the record, map, and collection delimiters using the 
Avro profile custom options:
 
 ``` sql
-gpadmin=# CREATE EXTERNAL TABLE avro_complex 
-  (id bigint, 
-  username text, 
-  followers text, 
-  rank int, 
-  fmap text, 
-  address text, 
-  relationship text,
-  md5 bytea) 
-LOCATION 
('pxf://localhost:51200/tmp/avro_complex?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
-FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, 
followers text, fmap text, relationship text, address text)
+LOCATION 
('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&CO

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606514#comment-15606514
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003214
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and write permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
+```
+
+3. Copy and paste the following data into `pxf_hdfs_simple.txt`:
+
+``` pre
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+```
+
+Notice the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt 
/data/pxf_examples/
+```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+```
+
+6. Create a second delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_multi.txt
+```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `P
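
As a point of reference, a minimal readable external table over the `pxf_hdfs_simple.txt` file staged above might look like the following sketch. The table name, column names, and the `namenode:51200` host/port are illustrative assumptions (51200 is a common default PXF port); substitute the NameNode host and PXF port for your own cluster.

``` sql
-- Hypothetical example: expose the four comma-separated fields of
-- /data/pxf_examples/pxf_hdfs_simple.txt through the HdfsTextSimple profile.
CREATE EXTERNAL TABLE pxf_hdfs_textsimple (
    location    text,
    month       text,
    num_orders  int,
    total_sales float8
)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- Query the external table as an ordinary HAWQ table.
SELECT * FROM pxf_hdfs_textsimple;
```

An Avro file would typically be read the same way, swapping in `?PROFILE=Avro` and `FORMAT 'CUSTOM' (formatter='pxfwritable_import')` in place of the text profile and format shown here.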

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606509#comment-15606509
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84998515
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
--- End diff --

Edit:

The `hdfs dfs` options used in this section are:


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606511#comment-15606511
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84996425
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
--- End diff --

Add an XREF here.


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606513#comment-15606513
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85002565
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
+```
+
+3. Copy and paste the following data into `pxf_hdfs_simple.txt`:
+
+``` pre
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+```
+
+Notice the use of the comma `,` to separate the four data fields.
+
+4. Add the data file to HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt 
/data/pxf_examples/
+```
+
+5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
+```
+
+6. Create a second delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_multi.txt
+```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `P

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606506#comment-15606506
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999127
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
--- End diff --

I think this procedure needs a bit more explanation about what it's trying 
to accomplish. It seems like this should be optional in the context of the 
larger topic, as readers might already have files in HDFS that they want to 
reference.  Just add some notes to say that you can optionally follow the steps 
to create some sample files in HDFS for use in later examples.


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606512#comment-15606512
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999604
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this section are identified in the table below:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+### Create Data Files
+
+Perform the following steps to create data files used in subsequent 
exercises:
+
+1. Create an HDFS directory for PXF example data files:
+
+``` shell
+ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+```
+
+2. Create a delimited plain text file:
+
+``` shell
+$ vi /tmp/pxf_hdfs_simple.txt
--- End diff --

Does it make sense to change these into `echo` commands so they can just be 
cut/pasted?  Like:

$ echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' >> pxf_hdfs_simple.txt
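
A cut-and-paste variant along those lines, assuming the `/tmp` staging path and HDFS target directory used in the surrounding steps, could be:

``` shell
# Write the sample rows to a local staging file (overwrite rather than append).
$ echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt

# Copy the staging file into the HDFS examples directory.
$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
```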


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritabl

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606507#comment-15606507
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997781
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,388 +2,282 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
-```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, etc. 
 
-``` sql
-INSERT INTO table_name ...;
-```
+The HDFS file system command is `hdfs dfs  []`. Invoked 
with no options, `hdfs dfs` lists the file system options supported by the tool.
--- End diff --

command -> command syntax


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606649#comment-15606649
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/34


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608979#comment-15608979
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/38

HAWQ-1107 - more subnav changes for HDFS plugin

remove all submenus

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/subnav-pxfhdfs-enhance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/38.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #38






> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609383#comment-15609383
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/38


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609382#comment-15609382
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/33


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609892#comment-15609892
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/39

HAWQ-1071 - add examples for HiveText and HiveRC plugins

added examples, restructured content, added hive command line section.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/pxfhive-enhance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/39.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #39


commit 0398a62fefd3627273927f938b4d082a25bf3003
Author: Lisa Owen 
Date:   2016-09-26T21:37:04Z

restructure PXF Hive plug-in page; add more relevant examples

commit 457d703a3f5c057e241acf985fbc35da34f6a075
Author: Lisa Owen 
Date:   2016-09-26T22:40:10Z

PXF Hive plug-in mods

commit 822d7545e746490e55507866c62dca5ea2d5349a
Author: Lisa Owen 
Date:   2016-10-03T22:19:03Z

clean up some extra whitespace

commit 8c986b60b8db3edd77c10f23704cc9174c52a803
Author: Lisa Owen 
Date:   2016-10-11T18:37:34Z

include list of hive profile names in file format section

commit 150fa67857871d58ea05eb14c023215c932ab7b1
Author: Lisa Owen 
Date:   2016-10-11T19:03:39Z

link to CREATE EXTERNAL TABLE ref page

commit 5cdd8f8c35a51360fe3bfdedeff796bf1e0f31f3
Author: Lisa Owen 
Date:   2016-10-11T20:27:17Z

sql commands all caps

commit 67e8b9699c9eec64d04ce9e6048ffb385f7f3573
Author: Lisa Owen 
Date:   2016-10-11T20:33:35Z

use <> for optional args

commit 54b2c01a80d477cc093d7eb1ed2aa8c0bf762d36
Author: Lisa Owen 
Date:   2016-10-22T00:16:24Z

fix some duplicate ids

commit 284c3ec2db38e8d9020826e3bf292efad76c1819
Author: Lisa Owen 
Date:   2016-10-26T15:38:37Z

restructure to use numbered steps

commit 2a38a0322abda804cfd4fc8aa39f142f0d83ea11
Author: Lisa Owen 
Date:   2016-10-26T17:20:28Z

note/notice




> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609898#comment-15609898
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/40

HAWQ-1071 - subnav changes for pxf enhancement work

removed all submenus from the subnav

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/subnav-pxfhive-enhance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/40.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #40


commit c3f381265b2c48b89f98863888ccd00b2926880c
Author: Lisa Owen 
Date:   2016-10-24T19:43:04Z

subnav chgs for hive plugin content restructure

commit 54445c6815a166e4e275455ea64221322087
Author: Lisa Owen 
Date:   2016-10-26T16:36:31Z

remove submenu from pxf hive plugin subnav




> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612282#comment-15612282
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85362806
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs  []`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612280#comment-15612280
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85361807
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs  []`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
--- End diff --

You don't necessarily have to run hdfs commands as `sudo -u hdfs` if the 
current user has the hdfs client and permissions.
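
For instance, assuming the logged-in user already has the HDFS client on its `PATH` and write access to `/data`, the same directory could be created without the `sudo -u hdfs` prefix:

``` shell
$ hdfs dfs -mkdir -p /data/exampledir
```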


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#633

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612281#comment-15612281
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85362384
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs  []`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612279#comment-15612279
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85358483
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs  []`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612342#comment-15612342
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85371514
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs  []`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612340#comment-15612340
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85371358
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?<pxf parameters>[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' (<formatting-properties>);
-```
+## HDFS File Formats
 
-where `<pxf parameters>` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   [FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs <options> [<file>]`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612360#comment-15612360
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, 
perform the following steps:
 postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
 ``` 
 
-3.  To query a Hive table with HCatalog integration, simply query HCatalog 
directly from HAWQ. The query syntax is:
 
-``` sql
-postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
-```
+To query a Hive table with HCatalog integration, query HCatalog directly 
from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
 
-For example:
+For example:
 
-``` sql
-postgres=# SELECT * FROM hcatalog.default.sales;
-```
-
-4.  To obtain a description of a Hive table with HCatalog integration, you 
can use the `psql` client interface.
--   Within HAWQ, use either the `\d
 hcatalog.hive-db-name.hive-table-name` or `\d+ 
hcatalog.hive-db-name.hive-table-name` commands to describe a 
single table. For example, from the `psql` client interface:
-
-``` shell
-$ psql -d postgres
-postgres=# \d hcatalog.default.test
-
-PXF Hive Table "default.test"
-Column|  Type  
---+
- name | text
- type | text
- supplier_key | int4
- full_price   | float8 
-```
--   Use `\d hcatalog.hive-db-name.*` to describe the whole database 
schema. For example:
-
-``` shell
-postgres=# \d hcatalog.default.*
-
-PXF Hive Table "default.test"
-Column|  Type  
---+
- type | text
- name | text
- supplier_key | int4
- full_price   | float8
-
-PXF Hive Table "default.testabc"
- Column | Type 
-+--
- type   | text
- name   | text
-```
--   Use `\d hcatalog.*.*` to describe the whole schema:
-
-``` shell
-postgres=# \d hcatalog.*.*
-
-PXF Hive Table "default.test"
-Column|  Type  
---+
- type | text
- name | text
- supplier_key | int4
- full_price   | float8
-
-PXF Hive Table "default.testabc"
- Column | Type 
-+--
- type   | text
- name   | text
-
-PXF Hive Table "userdb.test"
-  Column  | Type 
---+--
- address  | text
- username | text
- 
-```
-
-**Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, 
`hcatalog` will not be listed as a database. If you use other `psql` compatible 
clients, `hcatalog` will be listed as a database with a size value of `-1` 
since `hcatalog` is not a real database in HAWQ.
-
-5.  Alternatively, you can use the **pxf\_get\_item\_fields** user-defined 
function (UDF) to obtain Hive table descriptions from other client interfaces 
or third-party applications. The UDF takes a PXF profile and a table pattern 
string as its input parameters.
-
-**Note:** Currently the only supported input profile is `'Hive'`.
-
-For example, the following statement returns a description of a 
specific table. The description includes path, itemname (table), fieldname, and 
fieldtype.
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
+
+To obtain a description of a Hive table with HCatalog integration, you can 
use the `psql` client interface.
+
+-   Within HAWQ, use either the `\d
 hcatalog.hive-db-name.hive-table-name` or `\d+ 
hcatalog.hive-db-name.hive-table-name` commands to describe a single 
table. For example, from the `psql` client interface:
+
+``` shell
+$ psql -d postgres
+```
 
 ``` sql
-postgres=# select * from pxf_get_item_fields('Hive','default.test');
+postgres=# \d hcatalog.default.sales_info_rcfile;
 ```
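
For comparison with the `\d` approach above, here is a sketch of the `pxf_get_item_fields` UDF call referenced earlier in this diff; the table name is illustrative:

``` sql
-- Returns path, itemname (table), fieldname, and fieldtype for the matched table.
postgres=# SELECT * FROM pxf_get_item_fields('Hive', 'default.sales_info');
```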
-
-``` pre
-  path 

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612359#comment-15612359
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365959
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
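Since this JIRA tracks adding `HiveText` and `HiveRC` examples, a minimal sketch of an external table using the `HiveText` profile follows; the host, port, URL-encoded delimiter, and table name are assumptions for illustration:

``` sql
-- Hypothetical HiveText example; %2C is the URL-encoded comma delimiter.
CREATE EXTERNAL TABLE pxf_hivetext_example
  (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=%2C')
FORMAT 'TEXT' (delimiter=E',');
```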
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
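As a concrete reading of the mapping table, a Hive table declared with columns `(name string, qty int, price decimal)` would surface in HAWQ as `text`, `int4`, and `numeric`; a hypothetical definition using the `Hive` profile might look like the following (host, port, and table name are assumptions):

``` sql
-- Hypothetical mapping example for the Hive profile.
CREATE EXTERNAL TABLE pxf_hive_mapping_example (name text, qty int4, price numeric)
LOCATION ('pxf://namenode:51200/default.products?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import');
```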
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
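As a sketch of the "functions or application code" suggestion above: if a Hive collection arrives in HAWQ as a delimited `text` value, built-in string functions can extract its elements. The `%` delimiter and the column and table names below are assumptions for illustration:

``` sql
-- Assumes a text column "colors" holding values such as 'blue%red%green'.
SELECT string_to_array(colors, '%') AS color_array,
       split_part(colors, '%', 2)   AS second_color
FROM   complex_type_example;
```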
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
--- End diff --

Also consider a term/definition table here.


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apa

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612356#comment-15612356
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85368752
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612364#comment-15612364
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85366470
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612363#comment-15612363
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367290
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612367#comment-15612367
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367943
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612357#comment-15612357
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367789
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612366#comment-15612366
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85369947
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612358#comment-15612358
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85370681
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, 
perform the following steps:
 postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
 ``` 
 
-3.  To query a Hive table with HCatalog integration, simply query HCatalog 
directly from HAWQ. The query syntax is:
 
-``` sql
-postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
-```
+To query a Hive table with HCatalog integration, query HCatalog directly 
from HAWQ. The query syntax is:
--- End diff --

It's a bit awkward to drop out of the procedure and into free-form 
discussion of the various operations. I think it might be better to put the 
previous 3-step procedure into a new subsection like "Enabling HCatalog 
Integration" and then put the remaining non-procedural content into "Usage"?


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612365#comment-15612365
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365540
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
--- End diff --

Just a suggestion, but I think this would read better as a 2-column 
term/definition table.  You could even make it a 3-column table to describe 
which PXF plug-ins are used with each format.


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612362#comment-15612362
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85368842
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612370#comment-15612370
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85372290
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?<pxf parameters>[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' (<formatting-properties>);
-```
+## HDFS File Formats
 
-where `<pxf parameters>` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   [FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs <options> [<file>]`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612361#comment-15612361
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85372086
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
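As a sketch of how these mappings surface in a table definition (the table name and HDFS path below are illustrative; the `LOCATION` and `FORMAT` clauses follow the `Hive` profile example shown later in this topic), Hive `string`, `int`, and `double` columns are declared with their HAWQ equivalents:

``` sql
CREATE EXTERNAL TABLE salesinfo_hiveprofile (
    location         TEXT,    -- Hive string  -> HAWQ text
    month            TEXT,    -- Hive string  -> HAWQ text
    number_of_orders INT4,    -- Hive int     -> HAWQ int4
    total_sales      FLOAT8   -- Hive double  -> HAWQ float8
)
LOCATION ('pxf://namenode:51200/sales_info?Profile=Hive')
FORMAT 'custom' (FORMATTER='pxfwritable_import');
```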
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
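In the meantime, a minimal sketch of that extraction approach (the table and column names here are hypothetical, and the bracketed text rendering is an assumption about how the array value is serialized):

``` sql
-- Assumes a PXF external table hive_complex_example whose text column
-- favorites holds a Hive array rendered as [value1,value2,...]
SELECT split_part(trim(both '[]' from favorites), ',', 1) AS first_favorite
FROM   hive_complex_example;
```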
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,487

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612373#comment-15612373
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/41

HAWQ-1107 - incorporate kavinder's comments

incorporated kavinder's comments on HDFS plug in doc restructure.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/pxfhdfs-enhance

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #41


commit e16a4a46b6ab2a180e99f5fc793bbabb4f4cbfec
Author: Lisa Owen 
Date:   2016-10-27T16:10:29Z

incorporate kavinder's comments




> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612406#comment-15612406
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85375160
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
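For example (a sketch only; the host and port follow the other examples in these docs, the file path matches the `hdfs dfs` examples below, and the column list is illustrative), a readable external table over a comma-delimited text file could use the `HdfsTextSimple` profile:

``` sql
CREATE EXTERNAL TABLE pxf_hdfs_textsimple (
    location    TEXT,
    month       TEXT,
    num_orders  INT,
    total_sales FLOAT8
)
LOCATION ('pxf://namenode:51200/data/exampledir/example.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
```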
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs <options> [<file>]`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612400#comment-15612400
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85374612
  
--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@
 title: Accessing HDFS File Data
 ---
 
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop 
applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in 
supports plain delimited and comma-separated-value format text files.  The HDFS 
plug-in also supports the Avro binary format.
 
-Before working with HDFS file data using HAWQ and PXF, you should perform 
the following operations:
+This section describes how to use PXF to access HDFS data, including how 
to create and query an external table from files in the HDFS data store.
 
--   Test PXF on HDFS before connecting to Hive or HBase.
--   Ensure that all HDFS users have read permissions to HDFS services and 
that write permissions have been limited to specific users.
+## Prerequisites
 
-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:
 
-The syntax for creating an external HDFS file is as follows: 
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
 
-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
-  FORMAT '[TEXT | CSV | CUSTOM]' ();
-```
+## HDFS File Formats
 
-where `` is:
+The PXF HDFS plug-in supports reading the following file formats:
 
-``` pre
-   
FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text 
file
+- Avro - JSON-defined, schema-based data serialization format
 
-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file 
formats listed above:
 
-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files
 
-``` sql
-SELECT ... FROM table_name;
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, 
you may choose to create a custom HDFS profile from the existing HDFS 
serialization and deserialization classes. Refer to [Adding and Updating 
Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on 
creating a custom profile.
+
+## HDFS Shell Commands
+Hadoop includes command-line tools that interact directly with HDFS.  
These tools support typical file system operations including copying and 
listing files, changing file permissions, and so forth. 
+
+The HDFS file system command syntax is `hdfs dfs <options> [<file>]`. 
Invoked with no options, `hdfs dfs` lists the file system options supported by 
the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option  | Description |
+|---|-|
+| `-cat`| Display file contents. |
+| `-mkdir`| Create directory in HDFS. |
+| `-put`| Copy file from local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
 ```
 
-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
+Copy a text file to HDFS:
 
-``` sql
-INSERT INTO table_name ...;
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
 ```
 
-To read the data in the files or to write based on the existing format, 
use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
--   FORMAT clause
--   Profile
--   Accessor
--   Resolver
--   Avro
-
-**Note:** For more details about the API and classes, see [PXF External 
Tables and 
API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
--   `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
--   `FORMAT 'CSV'`: Us

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612454#comment-15612454
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85376355
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612455#comment-15612455
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85376289
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612453#comment-15612453
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user kavinderd commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85378363
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -339,21 +601,21 @@ postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
   delivery_state TEXT, 
   delivery_city TEXT
 )
-LOCATION ('pxf://namenode_host:51200/sales_part?Profile=Hive')
+LOCATION ('pxf://namenode:51200/sales_part?Profile=Hive')
 FORMAT 'custom' (FORMATTER='pxfwritable_import');
 
 postgres=# SELECT * FROM pxf_sales_part;
 ```
 
-### Example
+### Query Without Pushdown
 
 In the following example, the HAWQ query filters the `delivery_city` 
partition `Sacramento`. The filter on  `item_name` is not pushed down, since it 
is not a partition column. It is performed on the HAWQ side after all the data 
on `Sacramento` is transferred for processing.
 
 ``` sql
-postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' 
AND item_name = 'shirt';
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' 
AND item_name = 'cube';
 ```
 
-### Example
+### Query With Pushdown
--- End diff --

Somewhere it should be stated that the HAWQ GUC 
`pxf_enable_filter_pushdown` needs to be turned on. If this is off no filter 
pushdown will occur regardless of the nature of the query.
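
For reference, a minimal sketch of working with that GUC in a session (standard `SHOW`/`SET` commands; the setting name is taken from the comment above):

``` sql
SHOW pxf_enable_filter_pushdown;
SET pxf_enable_filter_pushdown = on;
```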


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612800#comment-15612800
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85403929
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612864#comment-15612864
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85407620
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -339,21 +601,21 @@ postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
   delivery_state TEXT, 
   delivery_city TEXT
 )
-LOCATION ('pxf://namenode_host:51200/sales_part?Profile=Hive')
+LOCATION ('pxf://namenode:51200/sales_part?Profile=Hive')
 FORMAT 'custom' (FORMATTER='pxfwritable_import');
 
 postgres=# SELECT * FROM pxf_sales_part;
 ```
 
-### Example
+### Query Without Pushdown
 
 In the following example, the HAWQ query filters the `delivery_city` 
partition `Sacramento`. The filter on  `item_name` is not pushed down, since it 
is not a partition column. It is performed on the HAWQ side after all the data 
on `Sacramento` is transferred for processing.
 
 ``` sql
-postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' 
AND item_name = 'shirt';
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' 
AND item_name = 'cube';
 ```
 
-### Example
+### Query With Pushdown
--- End diff --

yes, this is good info to share with the user!  i checked out the code, and 
it looks like this GUC is on by default.  i will add some text to that effect 
in the appropriate section.


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612866#comment-15612866
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85407700
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612876#comment-15612876
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85408122
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure.  Hive 
facilitates managing large data sets supporting multiple data formats, 
including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive 
plug-in reads data stored in Hive, as well as HDFS or HBase.
+
+This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your 
cluster. 
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
 
+## Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the 
Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive 
types.
+
+| Hive Data Type  | HAWQ Data Type |
+|---|---|
+| boolean| bool |
+| int   | int4 |
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
+| decimal  |  numeric  |
+| float   | float4 |
+| double   | float8 |
+| string   | text |
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+
+## Sample Data Set
+
+Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of 
the comma `,` to separate the four field values:
+
+```
+Prague,Jan,101,

[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613170#comment-15613170
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user lisakowen commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85424776
  
--- Diff: pxf/HivePXF.html.md.erb ---
@@ -339,21 +601,21 @@ postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
   delivery_state TEXT, 
   delivery_city TEXT
 )
-LOCATION ('pxf://namenode_host:51200/sales_part?Profile=Hive')
+LOCATION ('pxf://namenode:51200/sales_part?Profile=Hive')
 FORMAT 'custom' (FORMATTER='pxfwritable_import');
 
 postgres=# SELECT * FROM pxf_sales_part;
 ```
 
-### Example
+### Query Without Pushdown
 
 In the following example, the HAWQ query filters the `delivery_city` 
partition `Sacramento`. The filter on  `item_name` is not pushed down, since it 
is not a partition column. It is performed on the HAWQ side after all the data 
on `Sacramento` is transferred for processing.
 
 ``` sql
-postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' 
AND item_name = 'shirt';
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' 
AND item_name = 'cube';
 ```
 
-### Example
+### Query With Pushdown
--- End diff --

will also need to add this GUC to the documentation. 


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613291#comment-15613291
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/40


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613661#comment-15613661
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/39


> add PXF HiveText and HiveRC profile examples to the documentation
> -
>
> Key: HAWQ-1071
> URL: https://issues.apache.org/jira/browse/HAWQ-1071
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613688#comment-15613688
 ] 

ASF GitHub Bot commented on HAWQ-1107:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq-docs/pull/41


> PXF HDFS documentation - restructure content and include more examples
> --
>
> Key: HAWQ-1107
> URL: https://issues.apache.org/jira/browse/HAWQ-1107
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
>Priority: Minor
> Fix For: 2.0.1.0-incubating
>
>
> the current PXF HDFS documentation does not include any runnable examples.  
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, 
> Avro) profiles.  restructure the content as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616785#comment-15616785
 ] 

ASF GitHub Bot commented on HAWQ-1119:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/46

HAWQ-1119 - create doc content for PXF writable profiles

created a new section for PXF writable profiles (HDFS plug in 
HdfsTextSimple and SequenceWritable).  included examples and discussions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/pxfhdfs-writable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/46.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #46


commit f2304ce06b0529177efee1912c6c3c3b9aaf5b1f
Author: Lisa Owen 
Date:   2016-10-25T20:14:42Z

add file for PXF HDFS writable profile topic

commit 80dc3dbe33397ef872265afe50551957cd773bef
Author: Lisa Owen 
Date:   2016-10-26T19:32:04Z

adding more content

commit fddb5b8817fdf3a800eb7099d9417bd05735abab
Author: Lisa Owen 
Date:   2016-10-28T21:58:01Z

flesh out sequencewritable profile section

commit a88c167ead7135a6f43d11cb6c4921fe680e60b9
Author: Lisa Owen 
Date:   2016-10-28T22:00:17Z

change section title

commit a3cbdcf804d936c512f99e1b15905328f4f835f1
Author: Lisa Owen 
Date:   2016-10-28T22:13:54Z

add link to writing to HDFS in pxf overview page




> create new documentation topic for PXF writable profiles
> 
>
> Key: HAWQ-1119
> URL: https://issues.apache.org/jira/browse/HAWQ-1119
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
> Fix For: 2.0.1.0-incubating
>
>
> certain profiles supported by the existing PXF plug-ins support writable 
> tables.  create some documentation content for these profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616789#comment-15616789
 ] 

ASF GitHub Bot commented on HAWQ-1119:
--

GitHub user lisakowen opened a pull request:

https://github.com/apache/incubator-hawq-docs/pull/47

HAWQ-1119 - subnav changes for new pxf section

subnav changes for new "writing to HDFS" pxf section.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisakowen/incubator-hawq-docs 
feature/subnav-pxfhdfs-writable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq-docs/pull/47.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #47


commit 42490f083ae5b84d1d5066a994fefe1d56bdc20a
Author: Lisa Owen 
Date:   2016-10-28T22:20:35Z

add pxf subtopic for writing data to hdfs




> create new documentation topic for PXF writable profiles
> 
>
> Key: HAWQ-1119
> URL: https://issues.apache.org/jira/browse/HAWQ-1119
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lisa Owen
>Assignee: David Yozie
> Fix For: 2.0.1.0-incubating
>
>
> certain profiles supported by the existing PXF plug-ins support writable 
> tables.  create some documentation content for these profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles

2016-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623612#comment-15623612
 ] 

ASF GitHub Bot commented on HAWQ-1119:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85814125
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE WRITABLE EXTERNAL TABLE <table_name> 
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
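As a concrete sketch of this syntax (the table name, HDFS path, and source table below are assumptions for illustration; the profile, format, and delimiter values come from the tables in this section), a writable table and an export from an internal HAWQ table might look like:

``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl (
    location    TEXT,
    month       TEXT,
    num_orders  INT,
    total_sales FLOAT8
)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- Export rows from an internal HAWQ table (name assumed) into the HDFS file
INSERT INTO pxf_hdfs_writabletbl SELECT location, month, num_orders, total_sales FROM sales_internal;
```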
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\:
+
+| Keyword  | Value Description |
+|---|-|
+| COMPRESSION_CODEC| The compression codec Java class name. If this 
option is not provided, no data compression is performed. Supported compression 
codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, 
`org.apache.hadoop.io.compress.BZip2Codec`, and 
`org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
+| COMPRESSION_TYPE| The compression type to employ; supported values 
are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA| (`SequenceWritable` profile only) The name of the 
writer serialization/deserialization class. The jar file in which this class 
resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining whether a table query can run in multi-threaded mode. The default value is `TRUE`; requests run in multi-threaded mode. When set to `FALSE`, requests are handled in a single thread. Set `THREAD-SAFE` appropriately when operations that are not thread-safe are performed (for example, compression). |
+
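To illustrate how these options attach to the `LOCATION` URI (the path and table name are hypothetical; the codec class name is taken from the table above), a gzip-compressed `HdfsTextSimple` writable table might be declared as:

``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writable_gzip (
    location    TEXT,
    total_sales FLOAT8
)
LOCATION ('pxf://namenode:51200/data/pxf_examples/compressed?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
FORMAT 'TEXT' (delimiter=E',');
```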
+## HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when writing delimited data to a plain 
text file where each row is a single record.
+
+Writable tables created using the `HdfsTextSimple` profile can use no, 
record, or block compression. When compression is used, the default, gzip, and 
bzip2 Hadoop compression codecs are supported:
+
+- org.apache.hadoop.io.compress.DefaultCodec
+- org.apache.hadoop.io.compres

[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles

2016-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623628#comment-15623628
 ] 

ASF GitHub Bot commented on HAWQ-1119:
--

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85813812
  
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the 
`HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable 
table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable 
external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table.  After 
creating the external writable table, you must create a HAWQ readable external 
table accessing the HDFS file, then query that table. ??You can also create a 
Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+-   The HDFS plug-in is installed on all cluster nodes. See [Installing 
PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+-   All HDFS users have read permissions to HDFS services and that write 
permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and 
`SequenceWritable`.
+
+Use the following syntax to create a HAWQ external writable table 
representing HDFS data: 
+
+``` sql
+CREATE WRITABLE EXTERNAL TABLE <table_name> 
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL 
TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the 
table below.
+
+| Keyword  | Value |
+|---|-|
+| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` 
option in your `FORMAT` specification.
+
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following 
\:
+
+| Keyword  | Value Description |
+|---|-|
+| COMPRESSION_CODEC| The compression codec Java class name. If this 
option is not provided, no data compression is performed. Supported compression 
codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, 
`org.apache.hadoop.io.compress.BZip2Codec`, and 
`org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
+| COMPRESSION_TYPE| The compression type to employ; supported values 
are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA| (`SequenceWritable` profile only) The name of the 
writer serialization/deserialization class. The jar file in which this class 
resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining whether a table query can run in multi-threaded mode. The default value is `TRUE`; requests run in multi-threaded mode. When set to `FALSE`, requests are handled in a single thread. Set `THREAD-SAFE` appropriately when operations that are not thread-safe are performed (for example, compression). |
+
+## HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when writing delimited data to a plain 
text file where each row is a single record.
+
+Writable tables created using the `HdfsTextSimple` profile can use no, 
record, or block compression. When compression is used, the default, gzip, and 
bzip2 Hadoop compression codecs are supported:
+
+- org.apache.hadoop.io.compress.DefaultCodec
+- org.apache.hadoop.io.compres
