[jira] [Commented] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2013-08-29 Thread indrajit (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753454#comment-13753454
 ] 

indrajit commented on HIVE-951:
---

External tables make it possible to use different tools on top of the same
table data, which gives you a chance to do data mining. They are also very fast
and easy to create.

 Selectively include EXTERNAL TABLE source files via REGEX
 -

 Key: HIVE-951
 URL: https://issues.apache.org/jira/browse/HIVE-951
 Project: Hive
 Issue Type: Improvement
 Components: Query Processor
 Reporter: Carl Steinbach
 Attachments: HIVE-951.patch


 CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular
 expression.

 CREATE EXTERNAL TABLE was designed to allow users to access data that exists
 outside of Hive, and currently makes the assumption that all of the files
 located under the supplied path should be included in the new table. Users
 frequently encounter directories containing multiple datasets, or directories
 that contain data in heterogeneous schemas, and it's often impractical or
 impossible to adjust the layout of the directory to meet the requirements of
 CREATE EXTERNAL TABLE. A good example of this problem is creating an external
 table based on the contents of an S3 bucket.

 One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
 as follows:

 CREATE EXTERNAL TABLE
 ...
 LOCATION path [file_regex]
 ...

 For example:
 {code:sql}
 CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
 STORED AS TEXTFILE
 LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
 {code}
 Creates mytable1, which includes all files in s3://my.bucket/ with a filename
 matching the regular expression 'folder/2009.*\.bz2$'.
 {code:sql}
 CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
 STORED AS TEXTFILE
 LOCATION 'hdfs://data/' 'xyz.*2009\.bz2$';
 {code}
 Creates mytable2, which includes all files located under hdfs://data/ with a
 filename matching the regular expression 'xyz.*2009\.bz2$'.
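
 To make the intended selection concrete, here is a minimal Python sketch of
 how a file_regex might filter the files under a LOCATION. It is only an
 illustration, not the behaviour of the attached patch. The function name
 select_files and the file listings are hypothetical, and it assumes the regex
 is searched against each path relative to LOCATION (the proposal does not say
 whether a search or a full match is intended).

```python
import re


def select_files(relative_paths, file_regex):
    """Return the paths that the proposed file_regex would include.

    Assumption: the regex is searched against each path relative to
    LOCATION (re.search, not a full match); the proposal does not say.
    """
    pattern = re.compile(file_regex)
    return [p for p in relative_paths if pattern.search(p)]


# Hypothetical listing, relative to 's3://my.bucket/'.
listing = [
    "folder/2009-01-01.bz2",
    "folder/2009-01-02.bz2",
    "folder/2010-01-01.bz2",
    "folder/2009-notes.txt",
    "other/2009-01-01.bz2",
]

# Only the 2009 .bz2 files under folder/ are kept.
selected = select_files(listing, r"folder/2009.*\.bz2$")
print(selected)

# An unescaped '.' matches any character, so it also accepts names
# that do not really end in '.bz2'.
loose = select_files(["xyz_2009xbz2"], r"xyz.*2009.bz2$")
strict = select_files(["xyz_2009xbz2"], r"xyz.*2009\.bz2$")
print(loose, strict)
```

 Under these assumptions the first call keeps only the two 2009 .bz2 files in
 folder/. The last two calls show why the dot before the extension should be
 escaped when a literal '.bz2' suffix is meant.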

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2013-08-28 Thread indrajit (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753163#comment-13753163
 ] 

indrajit commented on HIVE-951:
---

CREATE EXTERNAL TABLE allows users to use a table on top of HDFS. It's a good
feature, and it does not check whether the path has already been created:
after creating the table, you can lazily create the path.

 Selectively include EXTERNAL TABLE source files via REGEX
 -

 Key: HIVE-951
 URL: https://issues.apache.org/jira/browse/HIVE-951
 Project: Hive
 Issue Type: Improvement
 Components: Query Processor
 Reporter: Carl Steinbach
 Assignee: Carl Steinbach
 Attachments: HIVE-951.patch


