[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.

2016-05-10 Thread Roger Dielrton (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277783#comment-15277783 ]

Roger Dielrton commented on DRILL-4659:
---

Thank you for the information, Jason, and sorry for not realizing that Drill can already do what I needed.
I agree with putting some examples of this feature in the "Querying Data" section; it would be very useful.

However, I still have a problem related to "query parametrization enrichment", which I explain below.

The source data file (JSON) contains the following (partial output of {{$ less -N /tmp/foojson1}}):
{noformat}
...
5132 { "city" : "WYNCOTE", "loc" : [ -75.152417, 40.086673 ], "pop" : 6164, "state" : "PA", "_id" : "19095" }
5133 { "city" : "WYNNEWOOD", "loc" : [ -75.275983, 40 ], "pop" : 8285, "state" : "PA", "_id" : "19096" }
5134 { "city" : "PHILADELPHIA", "loc" : [ -75.1661090001, 39.948908 ], "pop" : 3623, "state" : "PA", "_id" : "19102" }
...
{noformat}

The query:
{code:sql}
select
columns
from
table(dfs.`/tmp/foojson1`(type => 'json'))
{code}


The result (error):
{noformat}
UNSUPPORTED_OPERATION ERROR:
In a list of type FLOAT8, encountered a value of type BIGINT.
Drill does not support lists of different types.
File /tmp/foojson1
Record 5133
Line 5133
Column 58
Field loc
Fragment 0:0
{noformat}

I know this problem can be avoided by executing {{alter session set `store.json.all_text_mode` = true;}} before issuing the query, but it would be useful to be able to do something like this:
{code:sql}
select
columns
from
table(dfs.`/tmp/foojson1`(type => 'json', 'store.json.all_text_mode' => true))
{code}

That is: extend the table function parameters to accept any setting that is useful for the issued query, such as, in this case, the {{store.json.all_text_mode}} option.
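Until something like that exists, the session-level workaround mentioned above can be written out as one sequence. This is a sketch that has not been checked against a specific Drill version: it assumes the option should only be enabled around this one query and is then set back to its default ({{false}}). With {{store.json.all_text_mode}} enabled every JSON scalar, including the {{loc}} coordinates and {{pop}}, is read as text, so the sketch uses Drill's array indexing ({{t.loc[0]}}) with {{CAST}} to get numbers back. The alias {{t}} and the output column names are illustrative only.

{code:sql}
-- Read every JSON scalar as VARCHAR so the mixed FLOAT8/BIGINT list in "loc" no longer fails
alter session set `store.json.all_text_mode` = true;

-- Cast the text values back to numeric types where they are needed
select
  t.city,
  cast(t.loc[0] as double) as longitude,
  cast(t.loc[1] as double) as latitude,
  cast(t.pop as int) as population
from
  table(dfs.`/tmp/foojson1`(type => 'json')) t;

-- Restore the default so later queries in the same session are unaffected
alter session set `store.json.all_text_mode` = false;
{code}

The extra {{alter session}} statements before and after the query are exactly what a per-query table function parameter would make unnecessary.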

> Specify, as part of the query, table information: data format (CSV, parquet, 
> JSON. etc.), field delimiter, etc.
> ---
>
> Key: DRILL-4659
> URL: https://issues.apache.org/jira/browse/DRILL-4659
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Roger Dielrton
>Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or more of the following properties:
> * It has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a non-standard character ==> Drill is unable to parse it (without modifying the storage plugin configuration).
> * It is located in an Amazon S3 bucket ==> I can't rename it.
> * It is large ==> It would be expensive to make a copy of it.
> It would be nice if you could specify, as part of the "select" query, relevant table information as metadata, such as:
> * Data format (CSV, Parquet, JSON, etc.)
> * Field delimiter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.

2016-05-09 Thread Jason Altekruse (JIRA)

[ https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276627#comment-15276627 ]

Jason Altekruse commented on DRILL-4659:


This feature was added last fall. I think we may want to duplicate this information in the "Querying Data" section to make it easier to find, but the feature is documented here:

https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters

If you would like to see more usage examples or information about the feature's development, this was the JIRA for the feature:
https://issues.apache.org/jira/browse/DRILL-4047
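For the case in the original description (an extension-less file holding delimited text with a non-standard separator), using format attributes as table function parameters looks roughly like the sketch below. The path {{/tmp/foocsv1}} and the {{|}} delimiter are hypothetical; {{type => 'text'}} selects Drill's delimited-text reader, and when there is no header row the fields are exposed through the {{columns}} array.

{code:sql}
-- Hypothetical extension-less file with '|'-separated fields;
-- the format is given in the query, so no storage plugin change is needed
select
  columns[0],
  columns[1]
from
  table(dfs.`/tmp/foocsv1`(type => 'text', fieldDelimiter => '|'))
{code}

If the file has a header line, adding {{extractHeader => true}} should let the fields be selected by name instead of through {{columns}}. For a file in an S3 bucket, the same form should apply with {{dfs}} replaced by the name of the S3-backed storage plugin, so the file never needs to be renamed or copied.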
