Re: RE: Fixed-width files
Andries's solution is quite handy for the occasional use. But, having a storage plugin available can be more convenient and will perform better. When used with table functions, the format plugin allows specifying fields and column names per-query if you find yourself querying multiple different files. Maybe start with the simple approach and grow to the custom approach if performance and convenience justify the extra work. Thanks, - Paul On Tuesday, February 20, 2018, 2:09:02 PM PST, Kunal Khatua wrote: I agree... Using Andries' solution in combination with a view is probably the best approach. -Original Message- From: Flavio Pompermaier [mailto:pomperma...@okkam.it] Sent: Tuesday, February 20, 2018 1:47 PM To: user@drill.apache.org Subject: Re: Fixed-width files Actually what I'd like to achieve, in the end, is to remember how to read a fixed-width file. After considering all your opinions, the best way to achieve this will be probably to create a VIEW and then extract through a DESCRIBE query the columns definition. What do you think? On 20 Feb 2018 20:25, "Arjun kr" wrote: If you have Hive storage plugin enabled, You can create Hive table with regex serde and query the same in Drill. -- Table contents $ hadoop fs -cat /tmp/regex_test/* 112123 $ -- Hive DDL with regex '(.{1})(.{2})(.{3})' - column1 of width 1,column2 of width 2 and column3 of width 3 CREATE EXTERNAL TABLE `hive_regex_test`( `column1` string COMMENT 'from deserializer', `column2` string COMMENT 'from deserializer', `column3` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='(.{1})(.{2})(.{3})') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/regex_test'; hive> > select * from hive_regex_test; OK hive_regex_test.column1 hive_regex_test.column2 hive_regex_test.column3 1 12 123 Time taken: 0.235 seconds, Fetched: 1 row(s) hive> -- Drill 0: jdbc:drill:schema=dfs> select * from `hive_regex_test`; +--+--+--+ | column1 | column2 | column3 | +--+--+--+ | 1 | 12 | 123 | +--+--+--+ 1 row selected (0.587 seconds) 0: jdbc:drill:schema=dfs> Thanks, Arjun ________ From: Kunal Khatua Sent: Wednesday, February 21, 2018 12:37 AM To: user@drill.apache.org Subject: RE: Fixed-width files This might be a better option, since DRILL-6170 will introduce a rigid parsing definition. So, different fixed-width files can't leverage the same definition, though they might share the same extension. Thanks, Andries! -Original Message----- From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] Sent: Tuesday, February 20, 2018 7:39 AM To: user@drill.apache.org Subject: Re: Fixed-width files You can also try and see if you can just use the CSV plugin to read a line as columns[0] and then use the substr function to pull out the fields in the line. https://urldefense.proofpoint.com/v2/url?u=http-3A__drill.ap ache.org_docs_string-2Dmanipulation_-23substr&d=DwIGaQ&c=csk dkSMqhcnjZxdQVpwTXg&r=-cT6otg6lpT_XkmYy7yg3A&m=oItppN_rkOKe_ pgJb06T71ul6__8GsXmWQzTOQlCvBc&s=u6-Tx7rmfJQDa3_W3hg7YxojXP3 Hf60YPLGHMnD8yLg&e= Here is a simple example Simple csv file [test]$ cat test.csv col1col2col3 jdbc:drill:zk=localhost:5181> select substr(columns[0],1,4), substr(columns[0],5,4), substr(columns[0],9,4) from dfs.root.`/data/csv/test/test.csv`; +-+-+-+ | EXPR$0 | EXPR$1 | EXPR$2 | +-+-+-+ | col1 | col2 | col3 | +-+-+-+ --Andries On 2/20/18, 1:17 AM, "Flavio Pompermaier" wrote: For the moment I've created an improvement issue about this: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues. apache.org_jira_browse_DRILL-2D6170&d=DwIBaQ&c=cskdkSMqhcnjZ xdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69 ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=ajRYDHKrMFgV1AMW 2Q8weYDZtzb7-U5CqR9fML7ihno&e= On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier < pomperma...@okkam.it> wrote: > Thanks Paul for this suggestion, I think I'm going to give it a try. > Once I've created my EasyFormatPlugin where should I put the produced jar? > in which folder within jars directory? > > On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers > wrote: > >> It may be that by "fixed width text", Flavio means a file in which the >> text columns are of fixed w
Re: Fixed-width files
Hi Flavio, Great question! I've not yet experimented with the solution myself, but I believe that the plugin can be placed into a jar, along with the needed Drill config file, and then placed into the jars/3rd-party directory if you keep your config information in the Drill product directory. Perhaps Charles can offer more details based on his experience. You may find it more convenient to use the "site" directory added in Drill 1.8. With a site directory, you separate your config files and custom jars from the Drill product files. Launch drill with the "--site" flag: > drillbit.sh --site /my/site/dir start For convenience, you can set the DRILL_SITE_DIR env var instead of using the --site flag. If using a site directory, put your jar in the "jars" folder. All that said, while you develop your plugin, you'll want to put the sources inside the Drill java-exec project. Why? Doing so allows you to very rapidly build and debug your library using your favorite IDE. The test file mentioned in the PR shows how to use the test framework to run a query, start an in-process Drillbit, and immediately step through (or set breakpoints in) your plugin code. If you build the plugin as a jar file, then for each edit/compile/debug cycle, you'll need to build your jar, copy it to the proper location, restart the Drill server, attach the remote debugger, start a client tool, and finally submit a query. This works, but is quite slow; the above technique is faster for us impatient types... Once the storage plugin works, then you can move the code to a new project from which you can build and deploy your jar. Or, you can do as Charles did: offer your plugin to the Drill project via a PR so others can use it. Thanks, - Paul On Tuesday, February 20, 2018, 12:24:10 AM PST, Flavio Pompermaier wrote: Thanks Paul for this suggestion, I think I'm going to give it a try. Once I've created my EasyFormatPlugin where should I put the produced jar? in which folder within jars directory? On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers wrote: > It may be that by "fixed width text", Flavio means a file in which the > text columns are of fixed width: kind of like old-school punch cards. > Drill has no reader for this use case, but if you are a Java programmer, > you can create one. See Drill Pull Request #1114 [1] for one example of a > regex reader along with pointers to a second example I'm building for a > book. Should be easy to adopt this code to take a list of column widths in > place of the regex. Actually, you could use the regex with a pattern that > just picks out a fixed number of characters. > Thanks, > - Paul > > [1] https://github.com/apache/drill/pull/1114 > > > > > On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua < > kkha...@mapr.com> wrote: > > As long as you have delimiters, you should be able to import it as a > regular CSV file. Using views that define the fixed-width nature should > help operators downstream work more efficiently. > > -Original Message- > From: Flavio Pompermaier [mailto:pomperma...@okkam.it] > Sent: Monday, February 19, 2018 6:50 AM > To: user@drill.apache.org > Subject: Fixed-width files > > Hi to all, > I'm currently looking for the best solution to load a fixed-width text > file into Drill. > Is there any way right now to do that? Is there anyone that already have a > working connector? > Is it better to implement a brand new FormatPluginConfig or > StoragePluginConfig? > > Best, > Flavio > >
RE: Fixed-width files
I agree... Using Andries' solution in combination with a view is probably the best approach. -Original Message- From: Flavio Pompermaier [mailto:pomperma...@okkam.it] Sent: Tuesday, February 20, 2018 1:47 PM To: user@drill.apache.org Subject: Re: Fixed-width files Actually what I'd like to achieve, in the end, is to remember how to read a fixed-width file. After considering all your opinions, the best way to achieve this will be probably to create a VIEW and then extract through a DESCRIBE query the columns definition. What do you think? On 20 Feb 2018 20:25, "Arjun kr" wrote: If you have Hive storage plugin enabled, You can create Hive table with regex serde and query the same in Drill. -- Table contents $ hadoop fs -cat /tmp/regex_test/* 112123 $ -- Hive DDL with regex '(.{1})(.{2})(.{3})' - column1 of width 1,column2 of width 2 and column3 of width 3 CREATE EXTERNAL TABLE `hive_regex_test`( `column1` string COMMENT 'from deserializer', `column2` string COMMENT 'from deserializer', `column3` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='(.{1})(.{2})(.{3})') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/regex_test'; hive> > select * from hive_regex_test; OK hive_regex_test.column1 hive_regex_test.column2 hive_regex_test.column3 1 12 123 Time taken: 0.235 seconds, Fetched: 1 row(s) hive> -- Drill 0: jdbc:drill:schema=dfs> select * from `hive_regex_test`; +--+--+--+ | column1 | column2 | column3 | +--+--+--+ | 1| 12 | 123 | +--+--+--+ 1 row selected (0.587 seconds) 0: jdbc:drill:schema=dfs> Thanks, Arjun ________ From: Kunal Khatua Sent: Wednesday, February 21, 2018 12:37 AM To: user@drill.apache.org Subject: RE: Fixed-width files This might be a better option, since DRILL-6170 will introduce a rigid parsing definition. So, different fixed-width files can't leverage the same definition, though they might share the same extension. Thanks, Andries! -Original Message----- From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] Sent: Tuesday, February 20, 2018 7:39 AM To: user@drill.apache.org Subject: Re: Fixed-width files You can also try and see if you can just use the CSV plugin to read a line as columns[0] and then use the substr function to pull out the fields in the line. https://urldefense.proofpoint.com/v2/url?u=http-3A__drill.ap ache.org_docs_string-2Dmanipulation_-23substr&d=DwIGaQ&c=csk dkSMqhcnjZxdQVpwTXg&r=-cT6otg6lpT_XkmYy7yg3A&m=oItppN_rkOKe_ pgJb06T71ul6__8GsXmWQzTOQlCvBc&s=u6-Tx7rmfJQDa3_W3hg7YxojXP3 Hf60YPLGHMnD8yLg&e= Here is a simple example Simple csv file [test]$ cat test.csv col1col2col3 jdbc:drill:zk=localhost:5181> select substr(columns[0],1,4), substr(columns[0],5,4), substr(columns[0],9,4) from dfs.root.`/data/csv/test/test.csv`; +-+-+-+ | EXPR$0 | EXPR$1 | EXPR$2 | +-+-+-+ | col1| col2| col3| +-+-+-+ --Andries On 2/20/18, 1:17 AM, "Flavio Pompermaier" wrote: For the moment I've created an improvement issue about this: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues. apache.org_jira_browse_DRILL-2D6170&d=DwIBaQ&c=cskdkSMqhcnjZ xdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69 ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=ajRYDHKrMFgV1AMW 2Q8weYDZtzb7-U5CqR9fML7ihno&e= On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier < pomperma...@okkam.it> wrote: > Thanks Paul for this suggestion, I think I'm going to give it a try. > Once I've created my EasyFormatPlugin where should I put the produced jar? > in which folder within jars directory? > > On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers > wrote: > >> It may be that by "fixed width text", Flavio means a file in which the >> text columns are of fixed width: kind of like old-school punch cards. >> Drill has no reader for this use case, but if you are a Java programmer, >> you can create one. See Drill Pull Request #1114 [1] for one example of a >> regex reader along with pointers to a second example I'm building for a >> book. Should be easy to adopt this code to take a list of column widths in >> place of the regex. Actually, you could use the regex with a pattern that >> just picks out a fixed number
Re: Fixed-width files
Actually what I'd like to achieve, in the end, is to remember how to read a fixed-width file. After considering all your opinions, the best way to achieve this will be probably to create a VIEW and then extract through a DESCRIBE query the columns definition. What do you think? On 20 Feb 2018 20:25, "Arjun kr" wrote: If you have Hive storage plugin enabled, You can create Hive table with regex serde and query the same in Drill. -- Table contents $ hadoop fs -cat /tmp/regex_test/* 112123 $ -- Hive DDL with regex '(.{1})(.{2})(.{3})' - column1 of width 1,column2 of width 2 and column3 of width 3 CREATE EXTERNAL TABLE `hive_regex_test`( `column1` string COMMENT 'from deserializer', `column2` string COMMENT 'from deserializer', `column3` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='(.{1})(.{2})(.{3})') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/regex_test'; hive> > select * from hive_regex_test; OK hive_regex_test.column1 hive_regex_test.column2 hive_regex_test.column3 1 12 123 Time taken: 0.235 seconds, Fetched: 1 row(s) hive> -- Drill 0: jdbc:drill:schema=dfs> select * from `hive_regex_test`; +--+--+--+ | column1 | column2 | column3 | +--+--+--+ | 1| 12 | 123 | +--+--+--+ 1 row selected (0.587 seconds) 0: jdbc:drill:schema=dfs> Thanks, Arjun ________ From: Kunal Khatua Sent: Wednesday, February 21, 2018 12:37 AM To: user@drill.apache.org Subject: RE: Fixed-width files This might be a better option, since DRILL-6170 will introduce a rigid parsing definition. So, different fixed-width files can't leverage the same definition, though they might share the same extension. Thanks, Andries! -Original Message- From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] Sent: Tuesday, February 20, 2018 7:39 AM To: user@drill.apache.org Subject: Re: Fixed-width files You can also try and see if you can just use the CSV plugin to read a line as columns[0] and then use the substr function to pull out the fields in the line. https://urldefense.proofpoint.com/v2/url?u=http-3A__drill.ap ache.org_docs_string-2Dmanipulation_-23substr&d=DwIGaQ&c=csk dkSMqhcnjZxdQVpwTXg&r=-cT6otg6lpT_XkmYy7yg3A&m=oItppN_rkOKe_ pgJb06T71ul6__8GsXmWQzTOQlCvBc&s=u6-Tx7rmfJQDa3_W3hg7YxojXP3 Hf60YPLGHMnD8yLg&e= Here is a simple example Simple csv file [test]$ cat test.csv col1col2col3 jdbc:drill:zk=localhost:5181> select substr(columns[0],1,4), substr(columns[0],5,4), substr(columns[0],9,4) from dfs.root.`/data/csv/test/test.csv`; +-+-+-+ | EXPR$0 | EXPR$1 | EXPR$2 | +-+-+-+ | col1| col2| col3| +-+-+-+ --Andries On 2/20/18, 1:17 AM, "Flavio Pompermaier" wrote: For the moment I've created an improvement issue about this: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues. apache.org_jira_browse_DRILL-2D6170&d=DwIBaQ&c=cskdkSMqhcnjZ xdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69 ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=ajRYDHKrMFgV1AMW 2Q8weYDZtzb7-U5CqR9fML7ihno&e= On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier < pomperma...@okkam.it> wrote: > Thanks Paul for this suggestion, I think I'm going to give it a try. > Once I've created my EasyFormatPlugin where should I put the produced jar? > in which folder within jars directory? > > On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers > wrote: > >> It may be that by "fixed width text", Flavio means a file in which the >> text columns are of fixed width: kind of like old-school punch cards. >> Drill has no reader for this use case, but if you are a Java programmer, >> you can create one. See Drill Pull Request #1114 [1] for one example of a >> regex reader along with pointers to a second example I'm building for a >> book. Should be easy to adopt this code to take a list of column widths in >> place of the regex. Actually, you could use the regex with a pattern that >> just picks out a fixed number of characters. >> Thanks, >> - Paul >> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__github. com_apache_drill_pull_1114&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg &r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69ohaJkyhId PzNBy3Zsq
Re: Fixed-width files
If you have Hive storage plugin enabled, You can create Hive table with regex serde and query the same in Drill. -- Table contents $ hadoop fs -cat /tmp/regex_test/* 112123 $ -- Hive DDL with regex '(.{1})(.{2})(.{3})' - column1 of width 1,column2 of width 2 and column3 of width 3 CREATE EXTERNAL TABLE `hive_regex_test`( `column1` string COMMENT 'from deserializer', `column2` string COMMENT 'from deserializer', `column3` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='(.{1})(.{2})(.{3})') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/regex_test'; hive> > select * from hive_regex_test; OK hive_regex_test.column1 hive_regex_test.column2 hive_regex_test.column3 1 12 123 Time taken: 0.235 seconds, Fetched: 1 row(s) hive> -- Drill 0: jdbc:drill:schema=dfs> select * from `hive_regex_test`; +--+--+--+ | column1 | column2 | column3 | +--+--+--+ | 1| 12 | 123 | +--+--+--+ 1 row selected (0.587 seconds) 0: jdbc:drill:schema=dfs> Thanks, Arjun ____ From: Kunal Khatua Sent: Wednesday, February 21, 2018 12:37 AM To: user@drill.apache.org Subject: RE: Fixed-width files This might be a better option, since DRILL-6170 will introduce a rigid parsing definition. So, different fixed-width files can't leverage the same definition, though they might share the same extension. Thanks, Andries! -Original Message- From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] Sent: Tuesday, February 20, 2018 7:39 AM To: user@drill.apache.org Subject: Re: Fixed-width files You can also try and see if you can just use the CSV plugin to read a line as columns[0] and then use the substr function to pull out the fields in the line. https://urldefense.proofpoint.com/v2/url?u=http-3A__drill.apache.org_docs_string-2Dmanipulation_-23substr&d=DwIGaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=-cT6otg6lpT_XkmYy7yg3A&m=oItppN_rkOKe_pgJb06T71ul6__8GsXmWQzTOQlCvBc&s=u6-Tx7rmfJQDa3_W3hg7YxojXP3Hf60YPLGHMnD8yLg&e= Here is a simple example Simple csv file [test]$ cat test.csv col1col2col3 jdbc:drill:zk=localhost:5181> select substr(columns[0],1,4), substr(columns[0],5,4), substr(columns[0],9,4) from dfs.root.`/data/csv/test/test.csv`; +-+-+-+ | EXPR$0 | EXPR$1 | EXPR$2 | +-+-+-+ | col1| col2| col3| +-+-+-+ --Andries On 2/20/18, 1:17 AM, "Flavio Pompermaier" wrote: For the moment I've created an improvement issue about this: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6170&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=ajRYDHKrMFgV1AMW2Q8weYDZtzb7-U5CqR9fML7ihno&e= On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier wrote: > Thanks Paul for this suggestion, I think I'm going to give it a try. > Once I've created my EasyFormatPlugin where should I put the produced jar? > in which folder within jars directory? > > On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers > wrote: > >> It may be that by "fixed width text", Flavio means a file in which the >> text columns are of fixed width: kind of like old-school punch cards. >> Drill has no reader for this use case, but if you are a Java programmer, >> you can create one. See Drill Pull Request #1114 [1] for one example of a >> regex reader along with pointers to a second example I'm building for a >> book. Should be easy to adopt this code to take a list of column widths in >> place of the regex. Actually, you could use the regex with a pattern that >> just picks out a fixed number of characters. >> Thanks, >> - Paul >> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_drill_pull_1114&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=-0LdlBnmAXaipanP87yJezn5HPEHQIQVX5izxnNTYFY&e= >> >> >> >> >> On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua < >> kkha...@mapr.com> wrote: >> >> As long as you have delimiters, you should be able to import it as a
RE: Fixed-width files
This might be a better option, since DRILL-6170 will introduce a rigid parsing definition. So, different fixed-width files can't leverage the same definition, though they might share the same extension. Thanks, Andries! -Original Message- From: Andries Engelbrecht [mailto:aengelbre...@mapr.com] Sent: Tuesday, February 20, 2018 7:39 AM To: user@drill.apache.org Subject: Re: Fixed-width files You can also try and see if you can just use the CSV plugin to read a line as columns[0] and then use the substr function to pull out the fields in the line. https://urldefense.proofpoint.com/v2/url?u=http-3A__drill.apache.org_docs_string-2Dmanipulation_-23substr&d=DwIGaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=-cT6otg6lpT_XkmYy7yg3A&m=oItppN_rkOKe_pgJb06T71ul6__8GsXmWQzTOQlCvBc&s=u6-Tx7rmfJQDa3_W3hg7YxojXP3Hf60YPLGHMnD8yLg&e= Here is a simple example Simple csv file [test]$ cat test.csv col1col2col3 jdbc:drill:zk=localhost:5181> select substr(columns[0],1,4), substr(columns[0],5,4), substr(columns[0],9,4) from dfs.root.`/data/csv/test/test.csv`; +-+-+-+ | EXPR$0 | EXPR$1 | EXPR$2 | +-+-+-+ | col1| col2| col3| +-+-+-+ --Andries On 2/20/18, 1:17 AM, "Flavio Pompermaier" wrote: For the moment I've created an improvement issue about this: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6170&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=ajRYDHKrMFgV1AMW2Q8weYDZtzb7-U5CqR9fML7ihno&e= On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier wrote: > Thanks Paul for this suggestion, I think I'm going to give it a try. > Once I've created my EasyFormatPlugin where should I put the produced jar? > in which folder within jars directory? > > On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers > wrote: > >> It may be that by "fixed width text", Flavio means a file in which the >> text columns are of fixed width: kind of like old-school punch cards. >> Drill has no reader for this use case, but if you are a Java programmer, >> you can create one. See Drill Pull Request #1114 [1] for one example of a >> regex reader along with pointers to a second example I'm building for a >> book. Should be easy to adopt this code to take a list of column widths in >> place of the regex. Actually, you could use the regex with a pattern that >> just picks out a fixed number of characters. >> Thanks, >> - Paul >> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_drill_pull_1114&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=-0LdlBnmAXaipanP87yJezn5HPEHQIQVX5izxnNTYFY&e= >> >> >> >> >> On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua < >> kkha...@mapr.com> wrote: >> >> As long as you have delimiters, you should be able to import it as a >> regular CSV file. Using views that define the fixed-width nature should >> help operators downstream work more efficiently. >> >> -Original Message- >> From: Flavio Pompermaier [mailto:pomperma...@okkam.it] >> Sent: Monday, February 19, 2018 6:50 AM >> To: user@drill.apache.org >> Subject: Fixed-width files >> >> Hi to all, >> I'm currently looking for the best solution to load a fixed-width text >> file into Drill. >> Is there any way right now to do that? Is there anyone that already have >> a working connector? >> Is it better to implement a brand new FormatPluginConfig or >> StoragePluginConfig? >> >> Best, >> Flavio >> >> >
Re: Fixed-width files
You can also try and see if you can just use the CSV plugin to read a line as columns[0] and then use the substr function to pull out the fields in the line. http://drill.apache.org/docs/string-manipulation/#substr Here is a simple example Simple csv file [test]$ cat test.csv col1col2col3 jdbc:drill:zk=localhost:5181> select substr(columns[0],1,4), substr(columns[0],5,4), substr(columns[0],9,4) from dfs.root.`/data/csv/test/test.csv`; +-+-+-+ | EXPR$0 | EXPR$1 | EXPR$2 | +-+-+-+ | col1| col2| col3| +-+-+-+ --Andries On 2/20/18, 1:17 AM, "Flavio Pompermaier" wrote: For the moment I've created an improvement issue about this: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_DRILL-2D6170&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=ajRYDHKrMFgV1AMW2Q8weYDZtzb7-U5CqR9fML7ihno&e= On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier wrote: > Thanks Paul for this suggestion, I think I'm going to give it a try. > Once I've created my EasyFormatPlugin where should I put the produced jar? > in which folder within jars directory? > > On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers > wrote: > >> It may be that by "fixed width text", Flavio means a file in which the >> text columns are of fixed width: kind of like old-school punch cards. >> Drill has no reader for this use case, but if you are a Java programmer, >> you can create one. See Drill Pull Request #1114 [1] for one example of a >> regex reader along with pointers to a second example I'm building for a >> book. Should be easy to adopt this code to take a list of column widths in >> place of the regex. Actually, you could use the regex with a pattern that >> just picks out a fixed number of characters. >> Thanks, >> - Paul >> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_drill_pull_1114&d=DwIBaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=Q3Oz5l4W5TvDHNLpOqMYE2AgtKWFE937v89GEHyOVDU&m=69ohaJkyhIdPzNBy3ZsqNCTa19XysjZzgmn_XPJ2yXQ&s=-0LdlBnmAXaipanP87yJezn5HPEHQIQVX5izxnNTYFY&e= >> >> >> >> >> On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua < >> kkha...@mapr.com> wrote: >> >> As long as you have delimiters, you should be able to import it as a >> regular CSV file. Using views that define the fixed-width nature should >> help operators downstream work more efficiently. >> >> -Original Message- >> From: Flavio Pompermaier [mailto:pomperma...@okkam.it] >> Sent: Monday, February 19, 2018 6:50 AM >> To: user@drill.apache.org >> Subject: Fixed-width files >> >> Hi to all, >> I'm currently looking for the best solution to load a fixed-width text >> file into Drill. >> Is there any way right now to do that? Is there anyone that already have >> a working connector? >> Is it better to implement a brand new FormatPluginConfig or >> StoragePluginConfig? >> >> Best, >> Flavio >> >> >
Re: RE: Fixed-width files
For the moment I've created an improvement issue about this: https://issues.apache.org/jira/browse/DRILL-6170 On Tue, Feb 20, 2018 at 9:23 AM, Flavio Pompermaier wrote: > Thanks Paul for this suggestion, I think I'm going to give it a try. > Once I've created my EasyFormatPlugin where should I put the produced jar? > in which folder within jars directory? > > On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers > wrote: > >> It may be that by "fixed width text", Flavio means a file in which the >> text columns are of fixed width: kind of like old-school punch cards. >> Drill has no reader for this use case, but if you are a Java programmer, >> you can create one. See Drill Pull Request #1114 [1] for one example of a >> regex reader along with pointers to a second example I'm building for a >> book. Should be easy to adopt this code to take a list of column widths in >> place of the regex. Actually, you could use the regex with a pattern that >> just picks out a fixed number of characters. >> Thanks, >> - Paul >> >> [1] https://github.com/apache/drill/pull/1114 >> >> >> >> >> On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua < >> kkha...@mapr.com> wrote: >> >> As long as you have delimiters, you should be able to import it as a >> regular CSV file. Using views that define the fixed-width nature should >> help operators downstream work more efficiently. >> >> -Original Message- >> From: Flavio Pompermaier [mailto:pomperma...@okkam.it] >> Sent: Monday, February 19, 2018 6:50 AM >> To: user@drill.apache.org >> Subject: Fixed-width files >> >> Hi to all, >> I'm currently looking for the best solution to load a fixed-width text >> file into Drill. >> Is there any way right now to do that? Is there anyone that already have >> a working connector? >> Is it better to implement a brand new FormatPluginConfig or >> StoragePluginConfig? >> >> Best, >> Flavio >> >> >
Re: RE: Fixed-width files
Thanks Paul for this suggestion, I think I'm going to give it a try. Once I've created my EasyFormatPlugin where should I put the produced jar? in which folder within jars directory? On Tue, Feb 20, 2018 at 2:57 AM, Paul Rogers wrote: > It may be that by "fixed width text", Flavio means a file in which the > text columns are of fixed width: kind of like old-school punch cards. > Drill has no reader for this use case, but if you are a Java programmer, > you can create one. See Drill Pull Request #1114 [1] for one example of a > regex reader along with pointers to a second example I'm building for a > book. Should be easy to adopt this code to take a list of column widths in > place of the regex. Actually, you could use the regex with a pattern that > just picks out a fixed number of characters. > Thanks, > - Paul > > [1] https://github.com/apache/drill/pull/1114 > > > > > On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua < > kkha...@mapr.com> wrote: > > As long as you have delimiters, you should be able to import it as a > regular CSV file. Using views that define the fixed-width nature should > help operators downstream work more efficiently. > > -Original Message- > From: Flavio Pompermaier [mailto:pomperma...@okkam.it] > Sent: Monday, February 19, 2018 6:50 AM > To: user@drill.apache.org > Subject: Fixed-width files > > Hi to all, > I'm currently looking for the best solution to load a fixed-width text > file into Drill. > Is there any way right now to do that? Is there anyone that already have a > working connector? > Is it better to implement a brand new FormatPluginConfig or > StoragePluginConfig? > > Best, > Flavio > >
Re: RE: Fixed-width files
It may be that by "fixed width text", Flavio means a file in which the text columns are of fixed width: kind of like old-school punch cards. Drill has no reader for this use case, but if you are a Java programmer, you can create one. See Drill Pull Request #1114 [1] for one example of a regex reader along with pointers to a second example I'm building for a book. Should be easy to adopt this code to take a list of column widths in place of the regex. Actually, you could use the regex with a pattern that just picks out a fixed number of characters. Thanks, - Paul [1] https://github.com/apache/drill/pull/1114 On Monday, February 19, 2018, 12:52:42 PM PST, Kunal Khatua wrote: As long as you have delimiters, you should be able to import it as a regular CSV file. Using views that define the fixed-width nature should help operators downstream work more efficiently. -Original Message- From: Flavio Pompermaier [mailto:pomperma...@okkam.it] Sent: Monday, February 19, 2018 6:50 AM To: user@drill.apache.org Subject: Fixed-width files Hi to all, I'm currently looking for the best solution to load a fixed-width text file into Drill. Is there any way right now to do that? Is there anyone that already have a working connector? Is it better to implement a brand new FormatPluginConfig or StoragePluginConfig? Best, Flavio
Re: Fixed-width files
Do you have any real example of this (apart the one reported at [1])? [1] https://drill.apache.org/docs/text-files-csv-tsv-psv/ On Mon, Feb 19, 2018 at 9:52 PM, Kunal Khatua wrote: > As long as you have delimiters, you should be able to import it as a > regular CSV file. Using views that define the fixed-width nature should > help operators downstream work more efficiently. > > -Original Message- > From: Flavio Pompermaier [mailto:pomperma...@okkam.it] > Sent: Monday, February 19, 2018 6:50 AM > To: user@drill.apache.org > Subject: Fixed-width files > > Hi to all, > I'm currently looking for the best solution to load a fixed-width text > file into Drill. > Is there any way right now to do that? Is there anyone that already have a > working connector? > Is it better to implement a brand new FormatPluginConfig or > StoragePluginConfig? > > Best, > Flavio > -- Flavio Pompermaier Development Department OKKAM S.r.l. Tel. +(39) 0461 041809
RE: Fixed-width files
As long as you have delimiters, you should be able to import it as a regular CSV file. Using views that define the fixed-width nature should help operators downstream work more efficiently. -Original Message- From: Flavio Pompermaier [mailto:pomperma...@okkam.it] Sent: Monday, February 19, 2018 6:50 AM To: user@drill.apache.org Subject: Fixed-width files Hi to all, I'm currently looking for the best solution to load a fixed-width text file into Drill. Is there any way right now to do that? Is there anyone that already have a working connector? Is it better to implement a brand new FormatPluginConfig or StoragePluginConfig? Best, Flavio
Fixed-width files
Hi to all, I'm currently looking for the best solution to load a fixed-width text file into Drill. Is there any way right now to do that? Is there anyone that already have a working connector? Is it better to implement a brand new FormatPluginConfig or StoragePluginConfig? Best, Flavio