[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.
[ https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266481#comment-16266481 ]

ASF GitHub Bot commented on HAWQ-1554:
--------------------------------------

GitHub user edespino opened a pull request:

    https://github.com/apache/incubator-hawq-site/pull/14

    HAWQ-1554. Add registered trademark symbol (®) to website.

    This task tracks the update of the website's home page and download page (or section of the website) to add the ® character after the first and most prominent mentions of HAWQ in any text (i.e., not inside graphics).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/edespino/incubator-hawq-site HAWQ-1554

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-site/pull/14.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14

commit bfcc79a5d5e1fc7ab945f57bfb77a368f8a3dd1e
Author: Ed Espino
Date:   2017-11-27T08:00:08Z

    HAWQ-1554. Add registered trademark symbol (®) to website.

> Add registered trademark symbol (®) to website.
> -----------------------------------------------
>
>          Key: HAWQ-1554
>          URL: https://issues.apache.org/jira/browse/HAWQ-1554
>      Project: Apache HAWQ
>   Issue Type: Task
>   Components: Documentation
>     Reporter: Ed Espino
>     Assignee: Ed Espino
>
> Since HAWQ® is a registered trademark, we need to mark it appropriately to preserve our rights.
> This task tracks the update of the website's home page and download page (or section of the website) to add the ® character after the first and most prominent mentions of HAWQ in any text (i.e., not inside graphics).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.
[ https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266486#comment-16266486 ]

ASF GitHub Bot commented on HAWQ-1554:
--------------------------------------

Github user huor commented on the issue:

    https://github.com/apache/incubator-hawq-site/pull/14

    Looks good, +1
[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.
[ https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266497#comment-16266497 ]

ASF GitHub Bot commented on HAWQ-1554:
--------------------------------------

Github user radarwave commented on the issue:

    https://github.com/apache/incubator-hawq-site/pull/14

    LGTM +1
[jira] [Commented] (HAWQ-1554) Add registered trademark symbol (®) to website.
[ https://issues.apache.org/jira/browse/HAWQ-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266501#comment-16266501 ]

ASF GitHub Bot commented on HAWQ-1554:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq-site/pull/14
[jira] [Commented] (HAWQ-1553) User who doesn't have home directory can not run hawq extract command
[ https://issues.apache.org/jira/browse/HAWQ-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276972#comment-16276972 ]

ASF GitHub Bot commented on HAWQ-1553:
--------------------------------------

GitHub user outofmem0ry opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/134

    HAWQ-1553 Add option to hawq extract to specify log directory

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/outofmem0ry/incubator-hawq-docs document/HAWQ-1553

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/134.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #134

commit 7f89466967bfe3d4b3f8f348f83996f8048d5a33
Author: Shubham Sharma
Date:   2017-12-04T15:31:33Z

    HAWQ-1553 Add option to hawq extract to specify log directory

> User who doesn't have home directory can not run hawq extract command
> ---------------------------------------------------------------------
>
>          Key: HAWQ-1553
>          URL: https://issues.apache.org/jira/browse/HAWQ-1553
>      Project: Apache HAWQ
>   Issue Type: Bug
>   Components: Command Line Tools
>     Reporter: Shubham Sharma
>     Assignee: Radar Lei
>
> hawq extract stores information in hawqextract_MMDD.log under the directory ~/hawqAdminLogs, so a user who doesn't have their own home directory encounters a failure when running hawq extract.
> We can add a -l option to set the target log directory for hawq extract.
[jira] [Commented] (HAWQ-1368) normal user who doesn't have home directory may have problem when running hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276975#comment-16276975 ]

ASF GitHub Bot commented on HAWQ-1368:
--------------------------------------

GitHub user outofmem0ry opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/135

    HAWQ-1368 Add option to hawq register to specify log directory

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/outofmem0ry/incubator-hawq-docs document/HAWQ-1368

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/135.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #135

commit 1ac0fb26d0c358876c3a17c678b761132ccc6941
Author: Shubham Sharma
Date:   2017-12-04T15:42:15Z

    HAWQ-1368 Add option to hawq register to specify log directory

> normal user who doesn't have home directory may have problem when running hawq register
> ---------------------------------------------------------------------------------------
>
>          Key: HAWQ-1368
>          URL: https://issues.apache.org/jira/browse/HAWQ-1368
>      Project: Apache HAWQ
>   Issue Type: Bug
>   Components: Command Line Tools
>     Reporter: Lili Ma
>     Assignee: Radar Lei
>      Fix For: backlog
>
> hawq register stores information in hawqregister_MMDD.log under the directory ~/hawqAdminLogs, so a normal user who doesn't have their own home directory may encounter a failure when running hawq register.
> We can add a -l option to set the target log directory and file name of hawq register.
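The `-l` option proposed in these two PRs follows a common fallback pattern: use an explicit log directory if one is given, otherwise fall back to the documented default under the user's home. A minimal sketch of that logic in shell (the `HAWQ_LOGDIR` variable is illustrative; the actual utilities take a `-l`/`--logdir` flag):

```shell
# Pick a log directory: honor an explicit override if set, otherwise fall
# back to the documented default ~/hawqAdminLogs. A user whose home
# directory does not exist can point this at any writable location.
logdir="${HAWQ_LOGDIR:-${HOME:-/tmp}/hawqAdminLogs}"

# Create it if missing so log writes do not fail later.
mkdir -p "$logdir"
echo "logs will be written under: $logdir"
```

Run with `HAWQ_LOGDIR=/tmp/hawq-logs` to see the override take effect.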
[jira] [Commented] (HAWQ-1562) Incorrect path to default log directory in documentation
[ https://issues.apache.org/jira/browse/HAWQ-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277017#comment-16277017 ]

ASF GitHub Bot commented on HAWQ-1562:
--------------------------------------

GitHub user outofmem0ry opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/136

    HAWQ-1562 Fixed incorrect path to default log directory

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/outofmem0ry/incubator-hawq-docs document/HAWQ-1562

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/136.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #136

commit 9c05af6f19c755526679fcd16792d3f272745e85
Author: Shubham Sharma
Date:   2017-12-04T16:07:51Z

    HAWQ-1562 Fixed incorrect path to default log directory

> Incorrect path to default log directory in documentation
> --------------------------------------------------------
>
>          Key: HAWQ-1562
>          URL: https://issues.apache.org/jira/browse/HAWQ-1562
>      Project: Apache HAWQ
>   Issue Type: Bug
>   Components: Documentation
>     Reporter: Shubham Sharma
>     Assignee: David Yozie
>
> In the current documentation, six files point to the wrong default log directory. The default log directory of the management utilities is ~/hawqAdminLogs, but the documentation specifies ~/hawq/Adminlogs/. The list can be seen [here|https://github.com/apache/incubator-hawq-docs/search?utf8=%E2%9C%93&q=Adminlogs&type=].
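The fix itself is mechanical: substitute the correct path for the wrong one in each affected file. A sketch of the substitution, demonstrated on a scratch file standing in for one of the six docs (the sample sentence is illustrative, not quoted from the repo):

```shell
# Scratch copy showing the documented-but-wrong path.
doc=$(mktemp)
echo 'Logs are written to ~/hawq/Adminlogs/ by default.' > "$doc"

# In-place replacement; on the real repo this would run over the six
# files found by the GitHub search linked in the issue. (GNU sed -i;
# BSD/macOS sed needs -i '' instead.)
sed -i 's|~/hawq/Adminlogs/|~/hawqAdminLogs/|g' "$doc"
cat "$doc"
```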
[jira] [Commented] (HAWQ-1553) User who doesn't have home directory can not run hawq extract command
[ https://issues.apache.org/jira/browse/HAWQ-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341820#comment-16341820 ]

ASF GitHub Bot commented on HAWQ-1553:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/134#discussion_r164250622

    --- Diff: markdown/reference/cli/admin_utilities/hawqextract.html.md.erb ---
    @@ -73,6 +73,9 @@ where:
     -\\\-version
     Displays the version of this utility.
    +-l, -\\\-logdir \
    +Specifies the log directory that `hawq extract` uses for logs. The default is `~/hawqAdminLogs/`.
    --- End diff --

    how about "Specifies the directory that `hawq extract` uses for log files."?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HAWQ-1368) normal user who doesn't have home directory may have problem when running hawq register
[ https://issues.apache.org/jira/browse/HAWQ-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341824#comment-16341824 ]

ASF GitHub Bot commented on HAWQ-1368:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/135#discussion_r164250784

    --- Diff: markdown/reference/cli/admin_utilities/hawqregister.html.md.erb ---
    @@ -200,6 +200,8 @@ group {
     -\\\-version
     Show the version of this utility, then exit.
    +-l, -\\\-logdir \
    +Specifies the log directory that `hawq register` uses for logs. The default is `~/hawqAdminLogs/`.
    --- End diff --

    same comment as hawq extract ... how about "Specifies the directory that `hawq register` uses for log files."?
[jira] [Commented] (HAWQ-1562) Incorrect path to default log directory in documentation
[ https://issues.apache.org/jira/browse/HAWQ-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354164#comment-16354164 ]

ASF GitHub Bot commented on HAWQ-1562:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq-docs/pull/136
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536742#comment-16536742 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

GitHub user radarwave opened a pull request:

    https://github.com/apache/incubator-hawq-site/pull/16

    HAWQ-1638. Correct typo and use full name for ASF products

    Correct typo for MADlib and use the full name for ASF products in the first/main references.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/radarwave/incubator-hawq-site HAWQ-1638

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-site/pull/16.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16

commit c6a20e535d9a36d39977269b00b2e6cea05687f8
Author: rlei
Date:   2018-07-09T09:52:04Z

    HAWQ-1638. Correct typo and use full name for ASF products with the first/main references.

> Issues with website
> -------------------
>
>          Key: HAWQ-1638
>          URL: https://issues.apache.org/jira/browse/HAWQ-1638
>      Project: Apache HAWQ
>   Issue Type: Bug
>     Reporter: Sebb
>     Assignee: Radar Lei
>     Priority: Major
>
> The HAWQ page looks nice; however, there are a few problems with it.
> The phrase "Plus, HAWQ® works Apache MADlib) machine learning libraries" does not read well. Something missing?
> The first/main references to ASF products such as Hadoop, YARN etc. must use the full name, i.e. Apache Hadoop etc.
> The download section does not have any link to the KEYS file, nor any instructions on how to use the KEYS+sig or hashes to validate downloads.
> The download section still includes references to MD5 hashes. These are deprecated and can be removed for older releases that have other hashes.
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536746#comment-16536746 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

Github user radarwave commented on the issue:

    https://github.com/apache/incubator-hawq-site/pull/16

    The download links fix will be updated later.

    @changleicn @edespino @jiny2 Would you help to review? Thanks.
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537839#comment-16537839 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

Github user changleicn commented on the issue:

    https://github.com/apache/incubator-hawq-site/pull/16

    looks good!
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537890#comment-16537890 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq-site/pull/16
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538353#comment-16538353 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

GitHub user radarwave opened a pull request:

    https://github.com/apache/incubator-hawq-site/pull/17

    HAWQ-1638. Add how to verify downloaded files section, removed md5 keys.

    Added a section about how to verify downloaded files. Removed MD5 references.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/radarwave/incubator-hawq-site HAWQ-1638

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-site/pull/17.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17

commit fff3da1168b91ceec758dbfe6b1e0f8e6ae6fb4b
Author: rlei
Date:   2018-07-10T10:05:01Z

    HAWQ-1638. Add how to verify downloaded files section, removed md5 keys.
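The verification section this PR adds presumably follows the standard ASF flow: import the project KEYS file, check the detached `.asc` signature, and compare checksums. The checksum half can be demonstrated self-contained, using a local scratch file as a stand-in for a release tarball (file names are illustrative; the signature steps are shown as comments because they need the real KEYS file and artifacts):

```shell
# Stand-in for a downloaded release artifact.
tarball=$(mktemp)
echo 'release contents' > "$tarball"

# Publisher side: generate the detached SHA-512 checksum file.
sha512sum "$tarball" > "$tarball.sha512"

# Downloader side: verify the artifact against the checksum file.
sha512sum -c "$tarball.sha512"

# Signature check against the project KEYS file would look like:
#   gpg --import KEYS
#   gpg --verify apache-hawq-src.tar.gz.asc apache-hawq-src.tar.gz
```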
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538356#comment-16538356 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

Github user radarwave commented on the issue:

    https://github.com/apache/incubator-hawq-site/pull/17

    @changleicn @edespino @jiny2 Please help to review, thanks.
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538359#comment-16538359 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

Github user radarwave commented on the issue:

    https://github.com/apache/incubator-hawq-site/pull/17

    Add @dyozie
[jira] [Commented] (HAWQ-1638) Issues with website
[ https://issues.apache.org/jira/browse/HAWQ-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539647#comment-16539647 ]

ASF GitHub Bot commented on HAWQ-1638:
--------------------------------------

Github user jiny2 commented on the issue:

    https://github.com/apache/incubator-hawq-site/pull/17

    +1, LGTM. Thank you.
[jira] [Commented] (HAWQ-1031) updates docs to reflect MASTER_DATA_DIRECTORY code changes
[ https://issues.apache.org/jira/browse/HAWQ-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456493#comment-15456493 ]

ASF GitHub Bot commented on HAWQ-1031:
--------------------------------------

GitHub user lisakowen opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/4

    MASTER_DATA_DIRECTORY clarifications - HAWQ-1031

    Doc updates to remove references to MASTER_DATA_DIRECTORY and clarify when the hawq-site.xml hawq_master_directory config value should be used. Should resolve HAWQ-1031.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/mdatadir_chgs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/4.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4

commit 1a7efeffc1e1a5e010f7aa4cba4622f9da1357bb
Author: Lisa Owen
Date:   2016-08-29T20:23:01Z

    MASTER_DATA_DIRECTORY clarifications - HAWQ-1031

> updates docs to reflect MASTER_DATA_DIRECTORY code changes
> ----------------------------------------------------------
>
>              Key: HAWQ-1031
>              URL: https://issues.apache.org/jira/browse/HAWQ-1031
>          Project: Apache HAWQ
>       Issue Type: Bug
>       Components: Documentation
> Affects Versions: 2.0.1.0-incubating
>         Reporter: Lisa Owen
>         Assignee: Lei Chang
>         Priority: Minor
>          Fix For: 2.0.1.0-incubating
>
> MASTER_DATA_DIRECTORY is no longer a required environment variable for any HAWQ command. Remove references in the docs, and point users to the hawq-site.xml hawq_master_directory property value when required.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HAWQ-1031) updates docs to reflect MASTER_DATA_DIRECTORY code changes
[ https://issues.apache.org/jira/browse/HAWQ-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456599#comment-15456599 ]

ASF GitHub Bot commented on HAWQ-1031:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq-docs/pull/4
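With the MASTER_DATA_DIRECTORY environment variable gone, the master directory is read from the hawq_master_directory property in hawq-site.xml. A hedged sketch of pulling that value out with standard text tools; the XML fragment mirrors the usual Hadoop-style configuration layout, and the /data/hawq/master path is purely illustrative:

```shell
# Minimal hawq-site.xml fragment in the Hadoop configuration style.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>hawq_master_directory</name>
    <value>/data/hawq/master</value>
  </property>
</configuration>
EOF

# Grab the <value> line that follows the hawq_master_directory <name>
# and strip the surrounding tags.
master_dir=$(grep -A1 '<name>hawq_master_directory</name>' "$conf" \
  | sed -n 's|.*<value>\(.*\)</value>.*|\1|p')
echo "$master_dir"
```

A real deployment would use an XML-aware tool rather than grep/sed if the file layout cannot be relied on.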
[jira] [Commented] (HAWQ-1056) "hawq check" help output and documentation updates needed
[ https://issues.apache.org/jira/browse/HAWQ-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494197#comment-15494197 ]

ASF GitHub Bot commented on HAWQ-1056:
--------------------------------------

GitHub user lisakowen opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/9

    Feature/hawqcheck hadoopopt

    Some cleanup to the documentation for the "hawq check" command. Fixes the documentation part of HAWQ-1056.
    - add the -h, --host option
    - clarify that the --hadoop, --hadoop-home option value should be the full install path to Hadoop
    - modify the examples to use relevant values for hadoop_home

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/hawqcheck-hadoopopt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/9.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9

commit 0f642f1ce67bd570d2043174bbdaed990c7840bb
Author: Lisa Owen
Date:   2016-09-14T21:11:38Z

    clarify use of hawq check --hadoop option

commit 6704cc0b7a358fafba08b3fd66a5a12b5bb97f85
Author: Lisa Owen
Date:   2016-09-14T21:21:07Z

    hawq check --hadoop option - misc cleanup

commit 016630163015e782ef630338998ef1696f5f005e
Author: Lisa Owen
Date:   2016-09-15T17:52:32Z

    hawq check - add h/host option, cleanup

commit 4a617974cf04d1b1758bdfbc490116b60bdefb79
Author: Lisa Owen
Date:   2016-09-15T18:38:36Z

    hawq check - hadoop home is optional

> "hawq check" help output and documentation updates needed
> ---------------------------------------------------------
>
>          Key: HAWQ-1056
>          URL: https://issues.apache.org/jira/browse/HAWQ-1056
>      Project: Apache HAWQ
>   Issue Type: Bug
>   Components: Command Line Tools, Documentation
>     Reporter: Lisa Owen
>     Assignee: David Yozie
>      Fix For: 2.0.1.0-incubating
>
> The help output and reference documentation for the "hawq check" --hadoop option are not clear. Specifically, this option should identify the full path to the hadoop installation.
> Additionally, the [-h | --host ] option appears to be missing in both areas.
[jira] [Commented] (HAWQ-1056) "hawq check" help output and documentation updates needed
[ https://issues.apache.org/jira/browse/HAWQ-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508041#comment-15508041 ]

ASF GitHub Bot commented on HAWQ-1056:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq-docs/pull/9
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586004#comment-15586004 ]

ASF GitHub Bot commented on HAWQ-1095:
--------------------------------------

GitHub user lisakowen opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/23

    HAWQ-1095 - enhance database api docs

    Add content for JDBC, ODBC, and libpq.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/dbapiinfo

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/23.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23

commit 2c0f4b19bb2baef545467c9d39f097344c6358b2
Author: Lisa Owen
Date:   2016-10-04T19:25:29Z

    restructure db API section; add libpq and links to driver and api docs

commit f066326f0241050a22a8b592fcaae3aab037c504
Author: Lisa Owen
Date:   2016-10-04T20:36:41Z

    clarify some statements

commit fbb0571df9cdb1ba05a2ba970b560cb6388b72eb
Author: Lisa Owen
Date:   2016-10-04T23:11:07Z

    hawq supports datadirect drivers

commit df2aaed3aab20b9d0fffa0c62df8a23c33864065
Author: Lisa Owen
Date:   2016-10-04T23:26:02Z

    update driver names

commit 245633e69bd0017f43a5cc20e82c9a5fc23b4079
Author: Lisa Owen
Date:   2016-10-05T21:56:36Z

    provide locations of libpq lib and include file

commit 57d76d2b86014f772754ca70cab95e4c337a71a2
Author: Lisa Owen
Date:   2016-10-07T16:02:22Z

    add jdbc connection string and example

commit 70e45af7d24a6699840eec176603b4b835121bef
Author: Lisa Owen
Date:   2016-10-07T23:48:39Z

    flesh out jdbc section; add connection URL specs

commit 3288da3e8ce51482e1d6e6913a237cbf5fc0bc8e
Author: Lisa Owen
Date:   2016-10-10T19:08:48Z

    db drivers and apis - flesh out odbc section

> enhance database driver and API documentation
> ---------------------------------------------
>
>          Key: HAWQ-1095
>          URL: https://issues.apache.org/jira/browse/HAWQ-1095
>      Project: Apache HAWQ
>   Issue Type: Improvement
>   Components: Documentation
>     Reporter: Lisa Owen
>     Assignee: David Yozie
>     Priority: Minor
>      Fix For: 2.0.1.0-incubating
>
> The docs contain very brief references to JDBC/ODBC and none at all to libpq. Add more content in these areas.
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586041#comment-15586041 ] ASF GitHub Bot commented on HAWQ-1096: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/25 HAWQ-1096 - add content for hawq built-in languages add content for sql, c, and internal hawq built in languages You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/builtin-langs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/25.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #25 commit 504c662be21dc344a161b81a9c627a8f6d7861cd Author: Lisa Owen Date: 2016-10-05T21:33:36Z add file discussing hawq built-in languages commit 8e27e9093f1d27277d676386144ee895ad004f86 Author: Lisa Owen Date: 2016-10-05T21:34:36Z include built-in languages in PL lang landing page commit bd85fdbc31cb463855c2606fde48d803dccb3de2 Author: Lisa Owen Date: 2016-10-05T21:47:11Z c user-defined function example - add _c to function name to avoid confusion commit 1332870d01d2f8da2f8284ac167253d7005c6dfd Author: Lisa Owen Date: 2016-10-10T22:24:20Z builtin langs - clarify and add some links > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586400#comment-15586400 ] ASF GitHub Bot commented on HAWQ-1096: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/27 HAWQ-1096 - add subnav entry for built-in languages add subnav for new topic You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-builtin-langs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/27.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #27 > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587080#comment-15587080 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974977 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. 
+ + Connection Data Source +The information required by the HAWQ ODBC driver to connect to a database is typically stored in a named data source. Depending on your platform, you may use [GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23) or [command line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23) tools to create your data source definition. On Linux, ODBC data sources are typically defined in a file named `odbc.ini`. + +Commonly-specified HAWQ ODBC data source connection properties include: + +| Property Name| Value Description | +|---|-| +| Database | name of the database to which you want to connect | +| Driver | full path to the ODBC driver library file | +| HostName | HAWQ master host name | +| MaxLongVarcharSize | maximum size of columns of type long varchar | +| Password | password used to connect to the specified database | +| PortNumber | HAWQ master database port number | + +Refer to [Connection Option Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23) for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC driver. + +Example HAWQ DataDirect ODBC driver data source definition: + +``` shell +[HAWQ-201] +Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so +Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ +Database=getstartdb +HostName=hdm1 +PortNumber=5432 +Password=changeme +MaxLongVarcharSize=8192 +``` + +The first line, `[HAWQ-201]`, identifies the name of the data source. + +ODBC connection properties may also be specified in a connection string identifying ei
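The data source definition quoted in the diff above is ordinary INI syntax, so it can be read with stock tooling. The sketch below is illustrative only (the section name `HAWQ-201` and all values are the hypothetical ones from the quoted example, not output of any HAWQ or DataDirect tool); it uses Python's standard `configparser` to pull connection properties out of an `odbc.ini`-style definition:

```python
import configparser

# The example data source definition from the quoted diff (hypothetical values).
ODBC_INI = """
[HAWQ-201]
Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so
Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ
Database=getstartdb
HostName=hdm1
PortNumber=5432
Password=changeme
MaxLongVarcharSize=8192
"""

parser = configparser.ConfigParser()
parser.read_string(ODBC_INI)

# The bracketed section name identifies the data source;
# the keys underneath it are the connection properties from the table above.
dsn = parser["HAWQ-201"]
print(dsn["HostName"], dsn["PortNumber"])  # hdm1 5432
```

Note that `configparser` returns every value as a string, so numeric properties such as `PortNumber` would need an explicit conversion before use.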
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587079#comment-15587079 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974918
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587077#comment-15587077 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974424 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- +| Property Name| Value Description | +|---|-| +| Database | name of the database to which you want to connect | +| Driver | full path to the ODBC driver library file | +| HostName | HAWQ master host name | +| MaxLongVarcharSize | maximum size of columns of type long varchar | +| Password | password used to connect to the specified database | +| PortNumber | HAWQ master database port number | --- End diff -- Let's initial-capitalize the second column.
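For context on the connection-string alternative the quoted doc text mentions: ODBC connection strings are semicolon-separated `KEY=value` pairs built from these same property names. A minimal, hypothetical sketch (property values reused from the example data source definition, not from a real deployment):

```python
# Hypothetical property values reused from the example data source definition.
props = {
    "Driver": "/usr/local/hawq_drivers/odbc/lib/ddgplm27.so",
    "Database": "getstartdb",
    "HostName": "hdm1",
    "PortNumber": "5432",
    "MaxLongVarcharSize": "8192",
}

# ODBC connection strings are KEY=value pairs joined by semicolons.
conn_str = ";".join(f"{key}={value}" for key, value in props.items())
print(conn_str)
```

A real application would pass such a string to an ODBC connection call supplied by its driver manager rather than printing it.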
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587081#comment-15587081 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974367 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information.
--- End diff -- Are you sure the datadirect link contains the same info available in the HAWQ ODBC download?
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587078#comment-15587078 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974668
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587103#comment-15587103 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/26
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587108#comment-15587108 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user lisakowen commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83976521
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587132#comment-15587132 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user lisakowen commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83977371 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. --- End diff -- users will download the readme from pivnet.
the link at the end of the readme points to a datadirect page from which one could navigate to the links i have included. i don't see any other docs when i untar the download package.
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587173#comment-15587173 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978549 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. + +The following example creates a new function named `all_caps` that will be defined as an alias for the `upper` HAWQ internal function: --- End diff -- Edit: change "that will be defined as an" to "that is an"
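The alias mechanism discussed above binds a new SQL name to an existing function's implementation. As a runnable illustration of that idea only (this is not HAWQ syntax; the name `all_caps` and the `upper` behavior are borrowed from the example), Python's standard `sqlite3` module can register an existing implementation under a new SQL name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bind the SQL name all_caps to an existing implementation (str.upper),
# analogous to the doc's alias for HAWQ's internal upper() function.
conn.create_function("all_caps", 1, lambda s: s.upper())

result = conn.execute("SELECT all_caps('hawq')").fetchone()[0]
print(result)  # HAWQ
```

In HAWQ itself the alias would instead be created with `CREATE FUNCTION ... LANGUAGE internal`, as the quoted doc section goes on to show.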
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587171#comment-15587171 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978465 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. --- End diff -- Reword: **You** cannot define new internal functions, **but you** can create... > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
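The quoted SQL-functions section above notes that a function returns "a single or set of rows" matching its declared return type. As a purely illustrative sketch (not part of the reviewed diff), a set-returning variant under PostgreSQL 8.2 `SETOF` semantics, which HAWQ inherits, and assuming the same hypothetical `orders` table with a `city` text column, might look like:

``` sql
-- Sketch only: returns one row per distinct city in the hypothetical
-- orders table. SETOF marks the function as set-returning, so the
-- final SELECT's result set becomes the function's output.
CREATE FUNCTION get_order_cities() RETURNS SETOF text AS $$
    SELECT DISTINCT city FROM orders;
$$ LANGUAGE SQL;

SELECT get_order_cities();
```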
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587175#comment-15587175 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978153 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. + +The following example creates a new function named `all_caps` that will be defined as an alias for the `upper` HAWQ internal function: + + +``` sql +gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper' +LANGUAGE internal STRICT; +CREATE FUNCTION +gpadmin=# SELECT all_caps('change me'); + all_caps +--- + CHANGE ME +(1 row) + +``` + +For more information on aliasing internal functions, refer to [Internal Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in the PostgreSQL documentation. + +## C --- End diff -- This id value is the same as the previous one - should be unique. Also change header to "C Functions"? > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587172#comment-15587172 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83976414 --- Diff: plext/UsingProceduralLanguages.html.md.erb --- @@ -1,13 +1,16 @@ --- -title: Using Procedural Languages and Extensions in HAWQ +title: Using Languages and Extensions in HAWQ --- -HAWQ allows user-defined functions to be written in other languages besides SQL and C. These other languages are generically called *procedural languages* (PLs). +HAWQ supports user-defined functions created with the SQL and C built-in languages, including supporting user-defined aliases for internal functions. --- End diff -- This needs a bit of an edit: HAWQ supports user-defined functions **that are** created with the SQL and C built-in languages, **and also supports** user-defined aliases for internal functions. > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587170#comment-15587170 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978854 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. + +The following example creates a new function named `all_caps` that will be defined as an alias for the `upper` HAWQ internal function: + + +``` sql +gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper' +LANGUAGE internal STRICT; +CREATE FUNCTION +gpadmin=# SELECT all_caps('change me'); + all_caps +--- + CHANGE ME +(1 row) + +``` + +For more information on aliasing internal functions, refer to [Internal Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in the PostgreSQL documentation. + +## C + +User-defined functions written in C must be compiled into shared libraries to be loaded by the HAWQ server on demand. This dynamic loading distinguishes C language functions from internal functions that are written in C. --- End diff -- Avoid passive voice here: "You must compile user-defined functions written in C into shared libraries so that the HAWQ server can load them on demand." > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
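The comment above concerns C functions that must be compiled into shared libraries before the HAWQ server can load them. For context, the registration step itself is plain SQL; the following is a hedged sketch based on PostgreSQL 8.2 conventions, in which the library path (`/usr/local/lib/funcs.so`) and symbol name (`add_one`) are assumptions for illustration only:

``` sql
-- Sketch only: register a C function that was previously compiled
-- into a shared library. The library path and symbol name below are
-- hypothetical; the server loads the library on first use.
CREATE FUNCTION add_one(integer) RETURNS integer
    AS '/usr/local/lib/funcs.so', 'add_one'
    LANGUAGE C STRICT;
```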
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587168#comment-15587168 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83977628 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. --- End diff -- Global: change "an SQL" to "a SQL" (pronounced 'sequel') > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587174#comment-15587174 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978174 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal --- End diff -- Change title to "Internal Functions"? 
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587169#comment-15587169 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83979056 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. 
--- End diff -- Global edit: Change "For additional information on" to "For additional information about" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589261#comment-15589261 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r84117499 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. --- End diff -- Ok - thanks. I think in other cases PDFs of the actual docs are included. 
This might only be in the Windows downloads. > enhance database driver and API documentation > - > > Key: HAWQ-1095 > URL: https://issues.apache.org/jira/browse/HAWQ-1095 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > docs contain very brief references to JDBC/ODBC and none at all to libpq. > add more content in these areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589308#comment-15589308 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/27 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589318#comment-15589318 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/23 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589313#comment-15589313 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/25 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602949#comment-15602949 ] ASF GitHub Bot commented on HAWQ-1107: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/33 HAWQ-1107 - enhance PXF HDFS plugin documentation added more examples, restructured the content, removed SequenceWritable references. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhdfs-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/33.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #33 commit 9ca277927bebd9c8d79bdf4619dfaf94a695c838 Author: Lisa Owen Date: 2016-10-14T22:29:22Z start restructuring HDFS plug-in page commit 2da7a92a3e8431335a48005d55a70c9eba333e16 Author: Lisa Owen Date: 2016-10-17T23:27:23Z more content and rearranging of pxf hdfs plugin page commit 5a941a70bda0e8466b5aa5dd2885840fce14c522 Author: Lisa Owen Date: 2016-10-18T16:57:09Z more rework of hdfs plug in page commit fd029d568589f5a4e2461d92437963d97f7d3198 Author: Lisa Owen Date: 2016-10-20T19:20:21Z remove SerialWritable, use namenode for host commit 6ba64f94d5b11397c98f46eb14d5c6e48d17a6cc Author: Lisa Owen Date: 2016-10-20T21:12:43Z use more descriptive file names commit 86d13b312ea8591949b8a811973937ab60f74df9 Author: Lisa Owen Date: 2016-10-20T22:36:01Z more mods to HDFS plugin docs > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS 
documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602963#comment-15602963 ] ASF GitHub Bot commented on HAWQ-1107: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/34 HAWQ-1107 - subnav chgs for pxf hdfs plugin content restructure subnav changes You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-pxfhdfs-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/34.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #34 commit f350e41fa419e9fb661f4ccb6e8793b7d9e9a40b Author: Lisa Owen Date: 2016-10-24T19:30:37Z subna chgs for pxf hdfs plugin content restructure > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606510#comment-15606510 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85000415 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: + +1. Create an HDFS directory for PXF example data files: + +``` shell + $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples +``` + +2. Create a delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_simple.txt +``` + +3. 
Copy and paste the following data into `pxf_hdfs_simple.txt`: + +``` pre +Prague,Jan,101,4875.33 +Rome,Mar,87,1557.39 +Bangalore,May,317,8936.99 +Beijing,Jul,411,11600.67 +``` + +Notice the use of the comma `,` to separate the four data fields. + +4. Add the data file to HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/ +``` + +5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt +``` + +6. Create a second delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_multi.txt +``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `P
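The diff above stages `/data/pxf_examples/pxf_hdfs_simple.txt` in HDFS and lists the `HdfsTextSimple` profile for plain delimited text. As an illustrative sketch (not part of the reviewed diff), a readable external table over that file, in which the PXF host name (`namenode`) and port (`51200`) are assumptions, could look like:

``` sql
-- Sketch only: map the four comma-separated fields of
-- pxf_hdfs_simple.txt (city, month, order count, sales total) to
-- HAWQ columns via the HdfsTextSimple profile. Host and port below
-- are hypothetical placeholders for a real PXF endpoint.
CREATE EXTERNAL TABLE pxf_hdfs_textsimple
    (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

SELECT * FROM pxf_hdfs_textsimple;
```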
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606508#comment-15606508 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997631 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. --- End diff -- Change "etc." to "and so forth." > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606515#comment-15606515 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003579 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -415,93 +312,101 @@ The following example uses the Avro schema shown in [Sample Avro Schema](#topic_ {"name":"street", "type":"string"}, {"name":"city", "type":"string"}] } - }, { - "name": "relationship", -"type": { -"type": "enum", -"name": "relationshipEnum", -"symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"] -} - }, { -"name" : "md5", -"type": { -"type" : "fixed", -"name" : "md5Fixed", -"size" : 4 -} } ], "doc:" : "A basic schema for storing messages" } ``` - Sample Avro Data (JSON) +### Sample Avro Data (JSON) + +Create a text file named `pxf_hdfs_avro.txt`: + +``` shell +$ vi /tmp/pxf_hdfs_avro.txt +``` + +Enter the following data into `pxf_hdfs_avro.txt`: ``` pre -{"id":1, "username":"john","followers":["kate", "santosh"], "rank":null, "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, -"address":{"street":"renaissance drive", "number":1,"city":"san jose"}, "md5":\u3F00\u007A\u0073\u0074} +{"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}} + +{"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}} +``` + +The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values. 
-{"id":2, "username":"jim","followers":["john", "pam"], "rank":3, "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, -"address":{"street":"deer creek", "number":9,"city":"palo alto"}, "md5":\u0010\u0021\u0003\u0004} +Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory: + +``` shell +$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro ``` -To map this Avro file to an external table, the top-level primitive fields ("id" of type long and "username" of type string) are mapped to their equivalent HAWQ types (bigint and text). The remaining complex fields are mapped to text columns: +The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. Copy this file to HDFS: -``` sql -gpadmin=# CREATE EXTERNAL TABLE avro_complex - (id bigint, - username text, - followers text, - rank int, - fmap text, - address text, - relationship text, - md5 bytea) -LOCATION ('pxf://namehost:51200/tmp/avro_complex?PROFILE=Avro') -FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/ ``` +### Querying Avro Data + +Create a queryable external table from this Avro file: -The above command uses default delimiters for separating components of the complex types. This command is equivalent to the one above, but it explicitly sets the delimiters using the Avro profile parameters: +- Map the top-level primitive fields, `id` (type long) and `username` (type string), to their equivalent HAWQ types (bigint and text). +- Map the remaining complex fields to type text. 
+- Explicitly set the record, map, and collection delimiters using the Avro profile custom options: ``` sql -gpadmin=# CREATE EXTERNAL TABLE avro_complex - (id bigint, - username text, - followers text, - rank int, - fmap text, - address text, - relationship text, - md5 bytea) -LOCATION ('pxf://localhost:51200/tmp/avro_complex?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:') -FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); +gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text) +LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&CO
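The `LOCATION` clause in the `CREATE EXTERNAL TABLE` statement above is cut off mid-URI in this excerpt. As a rough sketch of how the Avro profile custom options fit together, the URI can be assembled in the shell; the `namenode` host, port `51200`, and the delimiter choices are taken from the surrounding examples in this diff and are assumptions, not a definitive reference:

``` shell
# Assemble the pxf:// LOCATION URI for the Avro profile.
# Host, port, and delimiter values mirror the examples in the diff
# above; substitute your own NameNode host and PXF port.
PXF_HOST=namenode
PXF_PORT=51200
DATA_PATH=data/pxf_examples/pxf_hdfs_avro.avro
OPTS='PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:'
LOCATION="pxf://${PXF_HOST}:${PXF_PORT}/${DATA_PATH}?${OPTS}"
echo "$LOCATION"
```

The assembled string is what goes inside `LOCATION ('...')` in the SQL statement; building it separately makes it easy to spot a mistyped custom option before creating the table.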
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606514#comment-15606514 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003214 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: + +1. Create an HDFS directory for PXF example data files: + +``` shell + $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples +``` + +2. Create a delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_simple.txt +``` + +3. 
Copy and paste the following data into `pxf_hdfs_simple.txt`: + +``` pre +Prague,Jan,101,4875.33 +Rome,Mar,87,1557.39 +Bangalore,May,317,8936.99 +Beijing,Jul,411,11600.67 +``` + +Notice the use of the comma `,` to separate the four data fields. + +4. Add the data file to HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/ +``` + +5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt +``` + +6. Create a second delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_multi.txt +``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `P
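The four-field sample file from the steps above can be created non-interactively and sanity-checked before it is pushed to HDFS. A minimal sketch, using only local commands:

``` shell
# Create the sample delimited file from the steps above without vi,
# then confirm every row has exactly four comma-separated fields.
printf '%s\n' \
  'Prague,Jan,101,4875.33' \
  'Rome,Mar,87,1557.39' \
  'Bangalore,May,317,8936.99' \
  'Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt

awk -F',' 'NF != 4 { bad++ } END { print (bad ? "malformed" : "ok") }' /tmp/pxf_hdfs_simple.txt
```

Checking the field count locally catches a stray delimiter before the file lands in `/data/pxf_examples/` and surfaces as a query-time error instead.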
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606509#comment-15606509 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84998515 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: --- End diff -- Edit: The `hdfs dfs` options used in this section are: > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. 
> add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well.
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606511#comment-15606511 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84996425 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. --- End diff -- Add an XREF here. > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. 
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606513#comment-15606513 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85002565 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: + +1. Create an HDFS directory for PXF example data files: + +``` shell + $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples +``` + +2. Create a delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_simple.txt +``` + +3. 
Copy and paste the following data into `pxf_hdfs_simple.txt`: + +``` pre +Prague,Jan,101,4875.33 +Rome,Mar,87,1557.39 +Bangalore,May,317,8936.99 +Beijing,Jul,411,11600.67 +``` + +Notice the use of the comma `,` to separate the four data fields. + +4. Add the data file to HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/ +``` + +5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt +``` + +6. Create a second delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_multi.txt +``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `P
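The contents of `pxf_hdfs_multi.txt` are not shown in this excerpt. As a hypothetical illustration of the kind of data the `HdfsTextMulti` profile targets (the sample values below are invented, not from the original doc):

``` shell
# Hypothetical sample for pxf_hdfs_multi.txt: the first field is quoted
# and contains an embedded line feed -- the case HdfsTextMulti handles.
# Each logical record here spans two physical lines.
printf '%s\n' \
  '"4627 Star Rd.' \
  'San Francisco, CA 94107":Sept:2017' \
  '"113 Moon St.' \
  'San Diego, CA 92093":Jan:2018' > /tmp/pxf_hdfs_multi.txt

wc -l < /tmp/pxf_hdfs_multi.txt
```

Because the embedded line feeds split each record across lines, the plain `HdfsTextSimple` profile would misparse this file; that is the distinction the two profiles in the diff above are drawing.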
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606506#comment-15606506 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999127 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: --- End diff -- I think this procedure needs a bit more explanation about what its trying to accomplish. It seems like this should be optional in the context of the larger topic, as readers might already have files in HDFS that they want to reference. 
Just add some notes to say that you can optionally follow the steps to create some sample files in HDFS for use in later examples. > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606512#comment-15606512 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999604 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: + +1. Create an HDFS directory for PXF example data files: + +``` shell + $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples +``` + +2. 
Create a delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_simple.txt --- End diff -- Does it make sense to change these into `echo` commands so they can just be cut/pasted? Like: $ echo 'Prague,Jan,101,4875.33 Rome,Mar,87,1557.39 Bangalore,May,317,8936.99 Beijing,Jul,411,11600.67' >> pxf_hdfs_simple.txt > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritabl
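The reviewer's suggestion above (replace the interactive `vi` step with a paste-able `echo`) can be sketched as follows; note that `>` is used here instead of the `>>` in the comment so that re-running the command does not duplicate rows, which is an adjustment to the suggestion, not part of it:

``` shell
# Create the sample file in one shot instead of editing it in vi.
echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt

cat /tmp/pxf_hdfs_simple.txt
```

A single cut-and-paste command also keeps the doc's examples reproducible end to end, which fits the ticket's goal of runnable examples.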
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606507#comment-15606507 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997781 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. --- End diff -- command -> command syntax > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606649#comment-15606649 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/34 > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608979#comment-15608979 ] ASF GitHub Bot commented on HAWQ-1107: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/38 HAWQ-1107 - more subnav changes for HDFS plugin remove all submenus You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-pxfhdfs-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/38.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #38
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609383#comment-15609383 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/38
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609382#comment-15609382 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/33
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609892#comment-15609892 ] ASF GitHub Bot commented on HAWQ-1071: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/39 HAWQ-1071 - add examples for HiveText and HiveRC plugins added examples, restructured content, added hive command line section. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhive-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/39.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #39 commit 0398a62fefd3627273927f938b4d082a25bf3003 Author: Lisa Owen Date: 2016-09-26T21:37:04Z restructure PXF Hive plug-in page; add more relevant examples commit 457d703a3f5c057e241acf985fbc35da34f6a075 Author: Lisa Owen Date: 2016-09-26T22:40:10Z PXF Hive plug-in mods commit 822d7545e746490e55507866c62dca5ea2d5349a Author: Lisa Owen Date: 2016-10-03T22:19:03Z clean up some extra whitespace commit 8c986b60b8db3edd77c10f23704cc9174c52a803 Author: Lisa Owen Date: 2016-10-11T18:37:34Z include list of hive profile names in file format section commit 150fa67857871d58ea05eb14c023215c932ab7b1 Author: Lisa Owen Date: 2016-10-11T19:03:39Z link to CREATE EXTERNAL TABLE ref page commit 5cdd8f8c35a51360fe3bfdedeff796bf1e0f31f3 Author: Lisa Owen Date: 2016-10-11T20:27:17Z sql commands all caps commit 67e8b9699c9eec64d04ce9e6048ffb385f7f3573 Author: Lisa Owen Date: 2016-10-11T20:33:35Z use <> for optional args commit 54b2c01a80d477cc093d7eb1ed2aa8c0bf762d36 Author: Lisa Owen Date: 2016-10-22T00:16:24Z fix some duplicate ids commit 284c3ec2db38e8d9020826e3bf292efad76c1819 Author: Lisa Owen Date: 2016-10-26T15:38:37Z restructure to use numbered steps commit 2a38a0322abda804cfd4fc8aa39f142f0d83ea11 Author: Lisa Owen Date: 2016-10-26T17:20:28Z note/notice > add PXF HiveText and HiveRC profile examples to the documentation > - > > Key: HAWQ-1071 > URL: https://issues.apache.org/jira/browse/HAWQ-1071 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF Hive documentation includes an example for only the Hive > profile. add examples for HiveText and HiveRC profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609898#comment-15609898 ] ASF GitHub Bot commented on HAWQ-1071: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/40 HAWQ-1071 - subnav changes for pxf enhancement work removed all submenus from the subnav You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-pxfhive-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/40.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #40 commit c3f381265b2c48b89f98863888ccd00b2926880c Author: Lisa Owen Date: 2016-10-24T19:43:04Z subnav chgs for hive plugin content restructure commit 54445c6815a166e4e275455ea64221322087 Author: Lisa Owen Date: 2016-10-26T16:36:31Z remove submenu from pxf hive plugin subnav
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612282#comment-15612282 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85362806 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,506 +2,449 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; +If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile. + +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, and so forth. + +The HDFS file system command syntax is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this topic are: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. 
| + +Examples: + +Create a directory in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir ``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +Copy a text file to HDFS: -``` sql -INSERT INTO table_name ...; +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/ ``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes. - -This topic describes the following: - -- FORMAT clause -- Profile -- Accessor -- Resolver -- Avro - -**Note:** For more details about the API and classes, see [PXF External Tables and API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference). - -### FORMAT clause - -Use one of the following formats to read data with any PXF connector: - -- `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS. -- `FORMAT 'CSV'`: Us
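The restructured page quoted above pairs each supported file format with a profile name (`HdfsTextSimple`, `HdfsTextMulti`, `Avro`). A small sketch of that selection logic — the `pick_profile` helper is hypothetical, and the embedded-line-feed flag is an assumption; only the three profile names come from the diff:

``` shell
#!/bin/sh
# Sketch: choose a PXF HDFS profile from the file type, following the
# profile list in the diff above. Helper name and flag are illustrative.
pick_profile() {
    # $1 = file name, $2 = "multiline" if text records embed line feeds
    case "$1" in
        *.avro) echo "Avro" ;;                 # Avro binary format
        *)      if [ "$2" = "multiline" ]; then
                    echo "HdfsTextMulti"       # text with embedded line feeds
                else
                    echo "HdfsTextSimple"      # plain delimited or CSV text
                fi ;;
    esac
}

pick_profile sales.csv            # -> HdfsTextSimple
pick_profile notes.txt multiline  # -> HdfsTextMulti
pick_profile events.avro          # -> Avro
```

The chosen name would then appear in the external table's `LOCATION` URI as `?PROFILE=<name>`.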
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612280#comment-15612280 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85361807 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- [quoted diff identical to the excerpt above, through the example `$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir`] --- End diff -- You don't necessarily have to run hdfs commands as `sudo -u hdfs` if the current user has the hdfs client and permissions.
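The review remark above — prefix `hdfs dfs` with `sudo -u hdfs` only when the current user lacks HDFS permissions — can be sketched as below. The `RUN_AS` variable is a hypothetical convention, and the commands are echoed rather than executed so the sketch runs without a cluster:

``` shell
#!/bin/sh
# Sketch: decide whether hdfs commands need the `sudo -u hdfs` prefix.
# RUN_AS is a hypothetical helper variable; paths are the docs examples.
HDFS_SUPERUSER=hdfs

if [ "$(id -un)" = "$HDFS_SUPERUSER" ]; then
    RUN_AS=""                           # already the HDFS superuser
else
    RUN_AS="sudo -u $HDFS_SUPERUSER "   # omit if your user already has rights
fi

# Build the command lines; echo instead of executing them here.
CMD_MKDIR="${RUN_AS}hdfs dfs -mkdir -p /data/exampledir"
CMD_PUT="${RUN_AS}hdfs dfs -put /tmp/example.txt /data/exampledir/"
echo "$CMD_MKDIR"
echo "$CMD_PUT"
```

In practice you would run the printed commands directly on a host with the HDFS client installed.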
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612281#comment-15612281 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85362384 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- [quoted diff identical to the excerpt above; the review comment text was truncated in this digest]
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612279#comment-15612279 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85358483 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- [quoted diff identical to the excerpt above; the review comment text was truncated in this digest]
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612342#comment-15612342 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user lisakowen commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85371514 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- [quoted diff identical to the excerpt above; the review comment text was truncated in this digest]
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612340#comment-15612340 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user lisakowen commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85371358 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,506 +2,449 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?<pxf-parameters>[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (<formatting-properties>); -``` +## HDFS File Formats -where `<pxf-parameters>` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - [FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; +If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile. + +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, and so forth. + +The HDFS file system command syntax is `hdfs dfs [<options>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this topic are: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. 
| + +Examples: + +Create a directory in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir ``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +Copy a text file to HDFS: -``` sql -INSERT INTO table_name ...; +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/ ``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes. - -This topic describes the following: - -- FORMAT clause -- Profile -- Accessor -- Resolver -- Avro - -**Note:** For more details about the API and classes, see [PXF External Tables and API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference). - -### FORMAT clause - -Use one of the following formats to read data with any PXF connector: - -- `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS. -- `FORMAT 'CSV'`: Us
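The pieces quoted in this diff — the `CREATE EXTERNAL TABLE` syntax, the HDFS profiles, and the `hdfs dfs -put` staging example — can be combined into a short readable-table sketch. Note this is not taken from the diff itself: the NameNode host, the PXF port (51200), the delimiter, and the column list are all assumptions.

``` sql
-- Hypothetical sketch: read the delimited text file staged in HDFS above
-- through the HdfsTextSimple profile. Host, port, delimiter, and the
-- column list are assumptions, not content from the diff.
CREATE EXTERNAL TABLE pxf_hdfs_example (
    location     text,
    total_sales  float8
)
LOCATION ('pxf://namenode:51200/data/exampledir/example.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER = ',');

SELECT * FROM pxf_hdfs_example;
```

Dropping such an external table removes only the HAWQ catalog entry; the file in HDFS is left untouched.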
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612360#comment-15612360 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576 --- Diff: pxf/HivePXF.html.md.erb --- @@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps: postgres=# GRANT ALL ON PROTOCOL pxf TO "role"; ``` -3. To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is: -``` sql -postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name; -``` +To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is: + +``` sql +postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name; +``` -For example: +For example: -``` sql -postgres=# SELECT * FROM hcatalog.default.sales; -``` - -4. To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface. -- Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface: - -``` shell -$ psql -d postgres -postgres=# \d hcatalog.default.test - -PXF Hive Table "default.test" -Column| Type ---+ - name | text - type | text - supplier_key | int4 - full_price | float8 -``` -- Use `\d hcatalog.hive-db-name.*` to describe the whole database schema. 
For example: - -``` shell -postgres=# \d hcatalog.default.* - -PXF Hive Table "default.test" -Column| Type ---+ - type | text - name | text - supplier_key | int4 - full_price | float8 - -PXF Hive Table "default.testabc" - Column | Type -+-- - type | text - name | text -``` -- Use `\d hcatalog.*.*` to describe the whole schema: - -``` shell -postgres=# \d hcatalog.*.* - -PXF Hive Table "default.test" -Column| Type ---+ - type | text - name | text - supplier_key | int4 - full_price | float8 - -PXF Hive Table "default.testabc" - Column | Type -+-- - type | text - name | text - -PXF Hive Table "userdb.test" - Column | Type ---+-- - address | text - username | text - -``` - -**Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ. - -5. Alternatively, you can use the **pxf\_get\_item\_fields** user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. - -**Note:** Currently the only supported input profile is `'Hive'`. - -For example, the following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype. +``` sql +postgres=# SELECT * FROM hcatalog.default.sales_info; +``` + +To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface. + +- Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. 
For example, from the `psql` client interface: + +``` shell +$ psql -d postgres +``` ``` sql -postgres=# select * from pxf_get_item_fields('Hive','default.test'); +postgres=# \d hcatalog.default.sales_info_rcfile; ``` - -``` pre - path
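The `pxf_get_item_fields` UDF quoted above also accepts a wildcard table pattern. A hedged sketch — the `default.*` pattern is illustrative, and per the diff `'Hive'` is currently the only supported profile argument:

``` sql
-- Hypothetical sketch: describe every Hive table in the default database.
-- The output columns (path, itemname, fieldname, fieldtype) follow the
-- description quoted in the diff above.
SELECT path, itemname, fieldname, fieldtype
FROM pxf_get_item_fields('Hive', 'default.*');
```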
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612359#comment-15612359 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365959 --- Diff: pxf/HivePXF.html.md.erb --- @@ -2,121 +2,450 @@ title: Accessing Hive Data --- -This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog. +Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase. + +This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include: + +- Creating an external table in PXF and querying that table +- Querying Hive tables via PXF's integration with HCatalog ## Prerequisites -Check the following before using PXF to access Hive: +Before accessing Hive data with HAWQ and PXF, ensure that: -- The PXF HDFS plug-in is installed on all cluster nodes. +- The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. - The PXF Hive plug-in is installed on all cluster nodes. - The Hive JAR files and conf directory are installed on all cluster nodes. -- Test PXF on HDFS before connecting to Hive or HBase. +- You have tested PXF on HDFS. - You are running the Hive Metastore service on a machine in your cluster. - You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode. 
+## Hive File Formats + +Hive supports several file formats: + +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation +- SequenceFile - flat file consisting of binary key/value pairs +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size +- Parquet - compressed columnar data representation +- Avro - JSON-defined, schema-based data serialization format + +Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive. + +The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above: + +- `Hive` +- `HiveText` +- `HiveRC` + +## Data Type Mapping + +### Primitive Data Types + +To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type. + +The following table summarizes external mapping rules for Hive primitive types. + +| Hive Data Type | HAWQ Data Type | +|---|---| +| boolean| bool | +| int | int4 | +| smallint | int2 | +| tinyint | int2 | +| bigint | int8 | +| decimal | numeric | +| float | float4 | +| double | float8 | +| string | text | +| binary | bytea | +| char | bpchar | +| varchar | varchar | +| timestamp | timestamp | +| date | date | + + +### Complex Data Types + +Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types. + +An example using complex data types is provided later in this topic. + + +## Sample Data Set + +Examples used in this topic will operate on a common data set. 
This simple data set models a retail sales operation and includes fields with the following names and data types: + +- location - text +- month - text +- number\_of\_orders - integer +- total\_sales - double --- End diff -- Also consider term/definition table here. > add PXF HiveText and HiveRC profile examples to the documentation > - > > Key: HAWQ-1071 > URL: https://issues.apache.org/jira/browse/HAWQ-1071 > Project: Apa
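As a sketch of the direction HAWQ-1071 is heading, the sample data set described above could be exposed to HAWQ through a PXF external table. The host, port, table name `sales_info`, and the use of the generic `Hive` profile are assumptions, not content from the diff:

``` sql
-- Hypothetical sketch: map the sample sales data set (assumed loaded into
-- a Hive table named sales_info) to HAWQ types per the mapping table:
-- text -> text, integer -> int4, double -> float8.
CREATE EXTERNAL TABLE pxf_hive_sales (
    location          text,
    month             text,
    number_of_orders  int4,
    total_sales       float8
)
LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

SELECT location, SUM(total_sales) FROM pxf_hive_sales GROUP BY location;
```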
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612356#comment-15612356 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85368752 --- Diff: pxf/HivePXF.html.md.erb --- @@ -2,121 +2,450 @@ title: Accessing Hive Data --- -This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog. +Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase. + +This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include: + +- Creating an external table in PXF and querying that table +- Querying Hive tables via PXF's integration with HCatalog ## Prerequisites -Check the following before using PXF to access Hive: +Before accessing Hive data with HAWQ and PXF, ensure that: -- The PXF HDFS plug-in is installed on all cluster nodes. +- The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. - The PXF Hive plug-in is installed on all cluster nodes. - The Hive JAR files and conf directory are installed on all cluster nodes. -- Test PXF on HDFS before connecting to Hive or HBase. +- You have tested PXF on HDFS. - You are running the Hive Metastore service on a machine in your cluster. - You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode. 
+## Hive File Formats + +Hive supports several file formats: + +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation +- SequenceFile - flat file consisting of binary key/value pairs +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size +- Parquet - compressed columnar data representation +- Avro - JSON-defined, schema-based data serialization format + +Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive. + +The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above: + +- `Hive` +- `HiveText` +- `HiveRC` + +## Data Type Mapping + +### Primitive Data Types + +To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type. + +The following table summarizes external mapping rules for Hive primitive types. + +| Hive Data Type | HAWQ Data Type | +|---|---| +| boolean| bool | +| int | int4 | +| smallint | int2 | +| tinyint | int2 | +| bigint | int8 | +| decimal | numeric | +| float | float4 | +| double | float8 | +| string | text | +| binary | bytea | +| char | bpchar | +| varchar | varchar | +| timestamp | timestamp | +| date | date | + + +### Complex Data Types + +Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types. + +An example using complex data types is provided later in this topic. + + +## Sample Data Set + +Examples used in this topic will operate on a common data set. 
This simple data set models a retail sales operation and includes fields with the following names and data types: + +- location - text +- month - text +- number\_of\_orders - integer +- total\_sales - double + +Prepare the sample data set for use: + +1. First, create a text file: + +``` +$ vi /tmp/pxf_hive_datafile.txt +``` + +2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the comma `,` to separate the four field values: + +``` +Prague,Jan,101,487
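The numbered steps above end at the raw data file. A hedged HiveQL sketch of the likely next step — creating a Hive table over that file — where the table name `sales_info` and the field names come from the sample data set description, and everything else is an assumption (run in the `hive` CLI, not in HAWQ):

``` sql
-- Hypothetical sketch: a comma-delimited TextFile table over the sample
-- data, then load of the prepared file. Table name is an assumption.
CREATE TABLE sales_info (
    location          string,
    month             string,
    number_of_orders  int,
    total_sales       double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS textfile;

LOAD DATA LOCAL INPATH '/tmp/pxf_hive_datafile.txt' INTO TABLE sales_info;
```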
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612364#comment-15612364 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85366470 --- Diff: pxf/HivePXF.html.md.erb ---
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612363#comment-15612363 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367290 --- Diff: pxf/HivePXF.html.md.erb ---
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612367#comment-15612367 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367943 --- Diff: pxf/HivePXF.html.md.erb ---
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612357#comment-15612357 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367789 --- Diff: pxf/HivePXF.html.md.erb --- @@ -2,121 +2,450 @@ title: Accessing Hive Data --- -This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog. +Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase. + +This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include: + +- Creating an external table in PXF and querying that table +- Querying Hive tables via PXF's integration with HCatalog ## Prerequisites -Check the following before using PXF to access Hive: +Before accessing Hive data with HAWQ and PXF, ensure that: -- The PXF HDFS plug-in is installed on all cluster nodes. +- The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. - The PXF Hive plug-in is installed on all cluster nodes. - The Hive JAR files and conf directory are installed on all cluster nodes. -- Test PXF on HDFS before connecting to Hive or HBase. +- You have tested PXF on HDFS. - You are running the Hive Metastore service on a machine in your cluster. - You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode. 
+## Hive File Formats + +Hive supports several file formats: + +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation +- SequenceFile - flat file consisting of binary key/value pairs +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size +- Parquet - compressed columnar data representation +- Avro - JSON-defined, schema-based data serialization format + +Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive. + +The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above. These include: + +- `Hive` +- `HiveText` +- `HiveRC` + +## Data Type Mapping + +### Primitive Data Types + +To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type. + +The following table summarizes external mapping rules for Hive primitive types. + +| Hive Data Type | Hawq Data Type | +|---|---| +| boolean| bool | +| int | int4 | +| smallint | int2 | +| tinyint | int2 | +| bigint | int8 | +| decimal | numeric | +| float | float4 | +| double | float8 | +| string | text | +| binary | bytea | +| char | bpchar | +| varchar | varchar | +| timestamp | timestamp | +| date | date | + + +### Complex Data Types + +Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types. + +An example using complex data types is provided later in this topic. + + +## Sample Data Set + +Examples used in this topic will operate on a common data set. 
This simple data set models a retail sales operation and includes fields with the following names and data types: + +- location - text +- month - text +- number\_of\_orders - integer +- total\_sales - double + +Prepare the sample data set for use: + +1. First, create a text file: + +``` +$ vi /tmp/pxf_hive_datafile.txt +``` + +2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the comma `,` to separate the four field values: + +``` +Prague,Jan,101,487
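The sample-data preparation steps quoted in the hunk above (create a file with `vi`, then add comma-separated rows) can be scripted non-interactively. This is a sketch: the thread truncates the actual data rows, so the rows below are invented placeholders that only match the four-field schema (location, month, number_of_orders, total_sales).

``` shell
# Build the sample data file without an editor. The rows are hypothetical
# stand-ins for the truncated data in the thread; each has the four
# comma-separated fields: location,month,number_of_orders,total_sales
DATAFILE=/tmp/pxf_hive_datafile.txt
printf '%s\n' \
  'Rome,Mar,87,1557.39' \
  'Berlin,Jan,95,2321.98' \
  'Prague,Dec,333,9894.77' > "$DATAFILE"

# Sanity-check that every row has exactly four fields
awk -F, 'NF != 4 { exit 1 }' "$DATAFILE" && echo '3 rows, 4 fields each'
```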
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612366#comment-15612366 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85369947 --- Diff: pxf/HivePXF.html.md.erb --- [diff context identical to the hunk quoted in the first comment above]
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612358#comment-15612358 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85370681 --- Diff: pxf/HivePXF.html.md.erb --- @@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps: postgres=# GRANT ALL ON PROTOCOL pxf TO "role"; ``` -3. To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is: -``` sql -postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name; -``` +To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is: --- End diff -- It's a bit awkward to drop out of the procedure and into free-form discussion of the various operations. I think it might be better to put the previous 3-step procedure into a new subsection like "Enabling HCatalog Integration" and then putting the remaining non-procedural content into "Usage" ? > add PXF HiveText and HiveRC profile examples to the documentation > - > > Key: HAWQ-1071 > URL: https://issues.apache.org/jira/browse/HAWQ-1071 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF Hive documentation includes an example for only the Hive > profile. add examples for HiveText and HiveRC profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
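The HCatalog query syntax discussed in this review comment (`SELECT * FROM hcatalog.hive-db-name.hive-table-name;`) can be sketched from the shell. The database and table names (`default`, `sales_info`) are illustrative assumptions, not taken from the thread, and the `psql` invocation is shown commented out because it requires a running HAWQ cluster.

``` shell
# Build an HCatalog-qualified query string for HAWQ's PXF/HCatalog
# integration. Names below are hypothetical examples.
HIVE_DB='default'
HIVE_TABLE='sales_info'
QUERY="SELECT * FROM hcatalog.${HIVE_DB}.${HIVE_TABLE};"
echo "$QUERY"
# On a live cluster you would run it as, for example:
#   psql -d postgres -c "$QUERY"
```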
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612365#comment-15612365 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365540 --- Diff: pxf/HivePXF.html.md.erb --- [diff context identical to the hunk quoted in the first comment above, through:] +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation +- SequenceFile - flat file consisting of binary key/value pairs +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size +- Parquet - compressed columnar data representation +- Avro - JSON-defined, schema-based data serialization format --- End diff -- Just a suggestion, but I think this would read better as a 2-column term/definition table. You could even make it a 3-column table to describe which PXF plug-ins are used with each format.
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612362#comment-15612362 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85368842 --- Diff: pxf/HivePXF.html.md.erb --- [diff context identical to the hunk quoted in the first comment above]
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612370#comment-15612370 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85372290 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,506 +2,449 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; +If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile. + +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, and so forth. + +The HDFS file system command syntax is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this topic are: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. 
| + +Examples: + +Create a directory in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir ``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +Copy a text file to HDFS: -``` sql -INSERT INTO table_name ...; +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/ ``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes. - -This topic describes the following: - -- FORMAT clause -- Profile -- Accessor -- Resolver -- Avro - -**Note:** For more details about the API and classes, see [PXF External Tables and API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference). - -### FORMAT clause - -Use one of the following formats to read data with any PXF connector: - -- `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS. -- `FORMAT 'CSV'`: Us
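The `hdfs dfs` examples in the hunk above can be combined into a single sketch. Paths and the sample file are illustrative; the HDFS steps are guarded by a check for the `hdfs` CLI so the script degrades gracefully where no Hadoop client is installed, and the `sudo -u hdfs` invocation follows the quoted examples (it assumes the `hdfs` superuser account exists).

``` shell
# Walk through the three hdfs dfs options tabled above: -mkdir, -put, -cat.
set -e
echo 'example line' > /tmp/example.txt          # stage a local file
if command -v hdfs >/dev/null 2>&1; then
  sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
  sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
  sudo -u hdfs hdfs dfs -cat /data/exampledir/example.txt
else
  echo 'hdfs CLI not found; skipping HDFS steps' >&2
fi
```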
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612361#comment-15612361 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85372086 --- Diff: pxf/HivePXF.html.md.erb --- [diff context identical to the hunk quoted in the first comment above]
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612373#comment-15612373 ] ASF GitHub Bot commented on HAWQ-1107: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/41 HAWQ-1107 - incorporate kavinder's comments incorporated kavinder's comments on HDFS plug in doc restructure. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhdfs-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/41.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #41 commit e16a4a46b6ab2a180e99f5fc793bbabb4f4cbfec Author: Lisa Owen Date: 2016-10-27T16:10:29Z incorporate kavinder's comments > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612406#comment-15612406 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85375160 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- [diff context identical to the HDFS hunk quoted in an earlier comment above]
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612400#comment-15612400 ]

ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85374612

    --- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
    @@ -2,506 +2,449 @@
     title: Accessing HDFS File Data
     ---
    -## Prerequisites
    +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format.
    -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations:
    +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.
    -- Test PXF on HDFS before connecting to Hive or HBase.
    -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users.
    +## Prerequisites
    -## Syntax
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    -The syntax for creating an external HDFS file is as follows:
    +- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    -``` sql
    -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
    -( column_name data_type [, ...]
    - | LIKE other_table )
    -LOCATION ('pxf://host[:port]/path-to-data?[&custom-option=value...]')
    - FORMAT '[TEXT | CSV | CUSTOM]' ();
    -```
    +## HDFS File Formats
    -where `` is:
    +The PXF HDFS plug-in supports reading the following file formats:
    -``` pre
    - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
    - | PROFILE=profile-name
    -```
    +- Text File - comma-separated value (.csv) or delimited format plain text file
    +- Avro - JSON-defined, schema-based data serialization format
    -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
    +The PXF HDFS plug-in includes the following profiles to support the file formats listed above:
    -Use an SQL `SELECT` statement to read from an HDFS READABLE table:
    +- `HdfsTextSimple` - text files
    +- `HdfsTextMulti` - text files with embedded line feeds
    +- `Avro` - Avro files
    -``` sql
    -SELECT ... FROM table_name;
    +If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile.
    +
    +## HDFS Shell Commands
    +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, and so forth.
    +
    +The HDFS file system command syntax is `hdfs dfs [<options>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
    +
    +`hdfs dfs` options used in this topic are:
    +
    +| Option | Description |
    +|---|-|
    +| `-cat`| Display file contents. |
    +| `-mkdir`| Create directory in HDFS. |
    +| `-put`| Copy file from local file system to HDFS. |
    +
    +Examples:
    +
    +Create a directory in HDFS:
    +
    +``` shell
    +$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
     ```
    -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
    +Copy a text file to HDFS:
    -``` sql
    -INSERT INTO table_name ...;
    +``` shell
    +$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
     ```
    -To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes.
    -
    -This topic describes the following:
    -
    -- FORMAT clause
    -- Profile
    -- Accessor
    -- Resolver
    -- Avro
    -
    -**Note:** For more details about the API and classes, see [PXF External Tables and API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
    -
    -### FORMAT clause
    -
    -Use one of the following formats to read data with any PXF connector:
    -
    -- `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
    -- `FORMAT 'CSV'`: Us
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612454#comment-15612454 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85376355

    --- Diff: pxf/HivePXF.html.md.erb ---
    @@ -2,121 +2,450 @@
     title: Accessing Hive Data
     ---
    -This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog.
    +Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and Parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase.
    +
    +This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include:
    +
    +- Creating an external table in PXF and querying that table
    +- Querying Hive tables via PXF's integration with HCatalog
    
     ## Prerequisites
    -Check the following before using PXF to access Hive:
    +Before accessing Hive data with HAWQ and PXF, ensure that:
    -- The PXF HDFS plug-in is installed on all cluster nodes.
    +- The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
     - The PXF Hive plug-in is installed on all cluster nodes.
     - The Hive JAR files and conf directory are installed on all cluster nodes.
    -- Test PXF on HDFS before connecting to Hive or HBase.
    +- You have tested PXF on HDFS.
     - You are running the Hive Metastore service on a machine in your cluster.
     - You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode.
    +## Hive File Formats
    +
    +Hive supports several file formats:
    +
    +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation
    +- SequenceFile - flat file consisting of binary key/value pairs
    +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate
    +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size
    +- Parquet - compressed columnar data representation
    +- Avro - JSON-defined, schema-based data serialization format
    +
    +Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive.
    +
    +The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above. These include:
    +
    +- `Hive`
    +- `HiveText`
    +- `HiveRC`
    +
    +## Data Type Mapping
    +
    +### Primitive Data Types
    +
    +To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type.
    +
    +The following table summarizes external mapping rules for Hive primitive types.
    +
    +| Hive Data Type | HAWQ Data Type |
    +|---|---|
    +| boolean | bool |
    +| int | int4 |
    +| smallint | int2 |
    +| tinyint | int2 |
    +| bigint | int8 |
    +| decimal | numeric |
    +| float | float4 |
    +| double | float8 |
    +| string | text |
    +| binary | bytea |
    +| char | bpchar |
    +| varchar | varchar |
    +| timestamp | timestamp |
    +| date | date |
    +
    +### Complex Data Types
    +
    +Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types.
    +
    +An example using complex data types is provided later in this topic.
    +
    +## Sample Data Set
    +
    +Examples used in this topic will operate on a common data set. This simple data set models a retail sales operation and includes fields with the following names and data types:
    +
    +- location - text
    +- month - text
    +- number\_of\_orders - integer
    +- total\_sales - double
    +
    +Prepare the sample data set for use:
    +
    +1. First, create a text file:
    +
    +```
    +$ vi /tmp/pxf_hive_datafile.txt
    +```
    +
    +2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the comma `,` to separate the four field values:
    +
    +```
    +Prague,Jan,101,
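The sample-data preparation steps above can be scripted instead of typed into `vi`. A sketch under stated assumptions: the excerpt truncates after `Prague,Jan,101,`, so the rows below use illustrative values, not the actual data set from the pull request:

``` shell
# Build the comma-separated sample file in one step.
# Row values beyond the field layout (location,month,orders,sales)
# are hypothetical, invented for this sketch.
cat > /tmp/pxf_hive_datafile.txt <<'EOF'
Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
EOF

# Sanity check: each row should have exactly four comma-separated fields.
awk -F',' '{ print NF }' /tmp/pxf_hive_datafile.txt
```

Generating the file this way keeps the example reproducible; the `awk` check confirms the delimiter layout the external-table definitions depend on.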
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612455#comment-15612455 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85376289
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612453#comment-15612453 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85378363

    --- Diff: pxf/HivePXF.html.md.erb ---
    @@ -339,21 +601,21 @@ postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
         delivery_state TEXT,
         delivery_city TEXT
     )
    -LOCATION ('pxf://namenode_host:51200/sales_part?Profile=Hive')
    +LOCATION ('pxf://namenode:51200/sales_part?Profile=Hive')
     FORMAT 'custom' (FORMATTER='pxfwritable_import');
    
     postgres=# SELECT * FROM pxf_sales_part;
     ```
    -### Example
    +### Query Without Pushdown
    
     In the following example, the HAWQ query filters the `delivery_city` partition `Sacramento`. The filter on `item_name` is not pushed down, since it is not a partition column. It is performed on the HAWQ side after all the data on `Sacramento` is transferred for processing.
    
     ``` sql
    -postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND item_name = 'shirt';
    +postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND item_name = 'cube';
     ```
    -### Example
    +### Query With Pushdown

    --- End diff --

    Somewhere it should be stated that the HAWQ GUC `pxf_enable_filter_pushdown` needs to be turned on. If this is off, no filter pushdown will occur regardless of the nature of the query.

> add PXF HiveText and HiveRC profile examples to the documentation
> -----------------------------------------------------------------
>
>                 Key: HAWQ-1071
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1071
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>            Priority: Minor
>             Fix For: 2.0.1.0-incubating
>
> the current PXF Hive documentation includes an example for only the Hive
> profile. add examples for HiveText and HiveRC profiles.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
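The review comment above notes that partition filter pushdown is gated by a HAWQ server configuration parameter. A minimal session-level sketch — the GUC name comes from the comment, it is reported later in this thread to be on by default, and the query assumes the `pxf_sales_part` table from the diff:

``` sql
-- Confirm and enable filter pushdown for the current session before
-- running a partition-filtered query; with it off, no pushdown occurs.
SHOW pxf_enable_filter_pushdown;
SET pxf_enable_filter_pushdown = on;

SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento';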
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612800#comment-15612800 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85403929
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612864#comment-15612864 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85407620

    yes, this is good info to share with the user! i checked out the code, and it looks like this GUC is on by default. i will add some text to that effect in the appropriate section.
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612866#comment-15612866 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85407700
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612876#comment-15612876 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85408122
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613170#comment-15613170 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85424776

    will also need to add this GUC to the documentation.
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613291#comment-15613291 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq-docs/pull/40
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613661#comment-15613661 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq-docs/pull/39
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613688#comment-15613688 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/41 > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles
[ https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616785#comment-15616785 ] ASF GitHub Bot commented on HAWQ-1119: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/46 HAWQ-1119 - create doc content for PXF writable profiles created a new section for PXF writable profiles (HDFS plug in HdfsTextSimple and SequenceWritable). included examples and discussions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhdfs-writable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/46.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #46 commit f2304ce06b0529177efee1912c6c3c3b9aaf5b1f Author: Lisa Owen Date: 2016-10-25T20:14:42Z add file for PXF HDFS writable profile topic commit 80dc3dbe33397ef872265afe50551957cd773bef Author: Lisa Owen Date: 2016-10-26T19:32:04Z adding more content commit fddb5b8817fdf3a800eb7099d9417bd05735abab Author: Lisa Owen Date: 2016-10-28T21:58:01Z flesh out sequencewritable profile section commit a88c167ead7135a6f43d11cb6c4921fe680e60b9 Author: Lisa Owen Date: 2016-10-28T22:00:17Z change section title commit a3cbdcf804d936c512f99e1b15905328f4f835f1 Author: Lisa Owen Date: 2016-10-28T22:13:54Z add link to writing to HDFS in pxf overview page > create new documentation topic for PXF writable profiles > > > Key: HAWQ-1119 > URL: https://issues.apache.org/jira/browse/HAWQ-1119 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie > Fix For: 2.0.1.0-incubating > > > certain profiles supported by the existing PXF plug-ins support writable > tables. create some documentation content for these profiles. 
[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles
[ https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616789#comment-15616789 ] ASF GitHub Bot commented on HAWQ-1119: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/47 HAWQ-1119 - subnav changes for new pxf section subnav changes for new "writing to HDFS" pxf section. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-pxfhdfs-writable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/47.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #47 commit 42490f083ae5b84d1d5066a994fefe1d56bdc20a Author: Lisa Owen Date: 2016-10-28T22:20:35Z add pxf subtopic for writing data to hdfs
[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles
[ https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623612#comment-15623612 ] ASF GitHub Bot commented on HAWQ-1119: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85814125
--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles. You might create a writable table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table. After creating the writable external table, you must create a HAWQ readable external table that accesses the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+- All HDFS users have read permissions to HDFS services, and write permissions have been restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
+
+Use the following syntax to create a HAWQ writable external table representing HDFS data:
+
+``` sql
+CREATE WRITABLE EXTERNAL TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
+
+| Keyword | Value |
+|---|-|
+| \<host\>[:\<port\>] | The HDFS NameNode and port. |
+| \<path-to-hdfs-file\> | The path to the file in the HDFS data store. |
+| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
+| \<custom-option\> | \<custom-option\> is profile-specific. These options are discussed in the next topic. |
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file. |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
+
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
+
+| Keyword | Value Description |
+|---|-|
+| COMPRESSION_CODEC | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only). |
+| COMPRESSION_TYPE | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining whether a table query can run in multi-threaded mode. The default value is `TRUE`; requests run in multi-threaded mode. When set to `FALSE`, requests will be handled in a single thread. `THREAD-SAFE` should be set appropriately when operations that are not thread-safe are performed (e.g. compression). |
+
+## HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
+
+Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
+
+- org.apache.hadoop.io.compress.DefaultCodec
+- org.apache.hadoop.io.compres
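The `HdfsTextSimple` syntax quoted in the diff above can be illustrated with a short worked example. All names here (the tables, the `namenode:51200` PXF endpoint, and the HDFS path) are hypothetical placeholders, not values from the pull request; this is a sketch of the documented pattern, not the PR's own example.

``` sql
-- Hypothetical example: export rows from an internal HAWQ table to HDFS
-- using the HdfsTextSimple profile, gzip-compressed, comma-delimited.
CREATE WRITABLE EXTERNAL TABLE sales_export (location TEXT, month TEXT, total_sales INT)
    LOCATION ('pxf://namenode:51200/data/pxf_examples/sales?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    FORMAT 'TEXT' (delimiter=',');

-- A writable table is write-only: populate it with INSERT.
INSERT INTO sales_export SELECT * FROM sales_internal;

-- Per the note above, the writable table cannot be queried directly;
-- define a readable external table over the same HDFS path to verify.
CREATE EXTERNAL TABLE sales_export_read (location TEXT, month TEXT, total_sales INT)
    LOCATION ('pxf://namenode:51200/data/pxf_examples/sales?PROFILE=HdfsTextSimple')
    FORMAT 'TEXT' (delimiter=',');
SELECT * FROM sales_export_read;
```

The split between a write-only and a read-only table over the same path is the round trip the doc's **Note** describes.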
[jira] [Commented] (HAWQ-1119) create new documentation topic for PXF writable profiles
[ https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623628#comment-15623628 ] ASF GitHub Bot commented on HAWQ-1119: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85813812 --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
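The `SequenceWritable` profile discussed in this pull request pairs the `'CUSTOM'` format with the built-in `pxfwritable_export` formatter and a `DATA-SCHEMA` class. The following sketch shows that shape; the endpoint, HDFS path, and the `com.example.pxfex.PxfExampleWritable` class name are invented for illustration (any real `DATA-SCHEMA` class must exist in a jar on the PXF class path).

``` sql
-- Hypothetical example: write records to HDFS as a SequenceFile using the
-- SequenceWritable profile with block compression.
CREATE WRITABLE EXTERNAL TABLE pxf_seq_export (location TEXT, month TEXT, total_sales INT)
    LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seq?PROFILE=SequenceWritable&DATA-SCHEMA=com.example.pxfex.PxfExampleWritable&COMPRESSION_TYPE=BLOCK')
    FORMAT 'CUSTOM' (formatter='pxfwritable_export');

INSERT INTO pxf_seq_export SELECT * FROM sales_internal;
```

Reading the file back would use a second, readable external table with `formatter='pxfwritable_import'`, mirroring the write/read formatter pair listed in the `FORMAT 'CUSTOM'` table row.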