RE: Not Able To Access Drill Web Console In Multiple tabs Using Drill 1.13

2018-07-17 Thread Surneni Tilak
Thanks for your response, Kunal.

I tried it both ways, but the outcome is the same: the window where I submitted 
the query keeps loading and the other one gets stuck. The same query works 
well with version 1.12, in which I could monitor the query status and 
submit another query.
I am submitting my query and trying to observe its progress using the 
web-console of the same Drillbit, as in our cluster we have only one web-console 
that we can access. Thanks for your suggestion regarding the other tools, which 
is one option that I could try.

Please let me know your response.

Best regards, 
_
Tilak 


-Original Message-
From: Kunal Khatua [mailto:ku...@apache.org] 
Sent: Tuesday, July 17, 2018 8:36 PM
To: user@drill.apache.org
Subject: Re: Not Able To Access Drill Web Console In Multiple tabs Using Drill 
1.13

You could try the reverse. Monitor in the initial window, while submitting the 
query in another window.

That said, the reason your console is getting stuck is by design. The browser 
tab from which you submit the query is the window where you'll receive the 
results of the query. Hence, the window is "stuck" as it is waiting for results 
to come back.

As for why you are not able to monitor the current query status: you might 
have a fairly large result set that the server is formatting for the 
web-console, which saturates the WebServer threads. A simpler workaround is to 
monitor the system through a second Drillbit's web-console. If the first 
Drillbit (from which you launched the query) is very busy, you'll see status 
updates coming in less frequently.

As a rule of thumb, use the WebConsole for quick exploration (e.g. experimental 
queries with LIMIT to just glance at the data). Otherwise, there are a number 
of good JDBC-based tools, like SQuirreL and DBeaver (the latter also downloads 
the drivers automatically), that you can use.


On 7/17/2018 3:40:06 AM, Surneni Tilak  wrote:

Hi Team,

I am using Drill version 1.13.0. I am facing the issues below, which were not 
present in 1.12.0:


1. When I submit a query, I am not able to open the Drill web-console in 
another window to monitor the status of the running query.

2. I am not able to submit another query while a query is running, as the 
console is stuck running the first query.

Please guide me on how to resolve these issues, as I would like to use the 
latest version of Drill.

Best regards,
_
Tilak



Re: Array Index Out of Bounds in String Binary

2018-07-17 Thread Vlad Rozov

A. +1.

B. Every byte of binary data may require up to 4 bytes (0xXX) in the 
string representation, so 80 may work; 60 should reliably work.
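
To make the arithmetic concrete, here is a minimal Python sketch. It is not Drill's actual code. It assumes that string_binary copies printable ASCII bytes through as one character and writes every other byte as a 4-character escape, and that the output buffer is fixed at 256 bytes as described for DRILL-6607. The names encoded_length, fits, OUTPUT_BUFFER_BYTES and ESCAPE_WIDTH are hypothetical.

```python
# Illustrative model only; the escaping rule and the 256-byte limit are
# assumptions taken from this thread, not from Drill's source code.
OUTPUT_BUFFER_BYTES = 256  # fixed output buffer size reported in the error
ESCAPE_WIDTH = 4           # worst case: one input byte becomes 4 output bytes


def encoded_length(data: bytes) -> int:
    """Return the size of the string form under the assumed escaping rule."""
    total = 0
    for b in data:
        # Printable ASCII other than backslash is assumed to be copied as-is.
        printable = 0x20 <= b <= 0x7E and b != 0x5C
        total += 1 if printable else ESCAPE_WIDTH
    return total


def fits(data: bytes) -> bool:
    """True when the encoded form stays within the fixed output buffer."""
    return encoded_length(data) <= OUTPUT_BUFFER_BYTES


# Largest input that is safe no matter what the bytes are.
worst_case_safe_bytes = OUTPUT_BUFFER_BYTES // ESCAPE_WIDTH

if __name__ == "__main__":
    print(worst_case_safe_bytes)  # 64
    print(fits(bytes(60)))        # 60 NUL bytes encode to 240 bytes
    print(fits(bytes(80)))        # 80 NUL bytes encode to 320 bytes
    # 80 bytes that are mostly printable text still fit.
    print(fits(b"GET / HTTP/1.1" * 5 + bytes(10)))
```

Under these assumptions, 64 bytes is the largest slice guaranteed to fit, which is consistent with 60 being reliable and 80 working only when enough of the bytes are printable.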


Thank you,

Vlad

On 7/17/18 13:14, John Omernik wrote:

Yet this works?

string_binary(byte_substr(`data`, 1, 80))

On Tue, Jul 17, 2018 at 3:12 PM, John Omernik  wrote:


So on B. I found the BYTE_SUBSTR and only send 200 bytes to the
string_binary function, I still get an error.  Something else is happening
here...

select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
string_binary(byte_substr(`data`, 1, 200)) as mydata from
`user/jomernik/bf2_7306.pcap` limit 10

I get the same

Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on
zeta3.brewingintel.com:20005]

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0, 256))





On Tue, Jul 17, 2018 at 12:56 PM, John Omernik  wrote:


Thanks Vlad a couple of thoughts.


A. I think that should be fixed. That seems like a limitation that is
both unexpected and undocumented.

B.  Is there a way, if my data in the table is returned as binary to
start with, for me to return the first 256 bytes? I tried substring, and it
tries to force to UTF-8, and I am getting some issues there.

On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov  wrote:


In case of DRILL-6607 the issue lies in the implementation of
"string_binary" function: it is not prepared to handle incoming data that
when converted to a binary string would exceed 256 bytes as it does not
reallocate the output buffer. Until the function code is fixed, the only
way to avoid the error is either not to use "string_binary" or to use it
with the data that meets "string_binary" limitation.

Thank you,

Vlad


On 7/13/18 14:01, Ted Dunning wrote:


There are bounds for acceptable behavior for a function like this. Array
index out of bounds is not acceptable. Aborting with a clean message about
the true problem might be fine, as would be to return a null.

On Fri, Jul 13, 2018, 13:46 John Omernik  wrote:

So, as to the actual problem, I opened a JIRA here:

https://issues.apache.org/jira/browse/DRILL-6607

The reason I brought this here is my own curiosity: does an issue in using
this function most likely lie in the function code itself not handling good
data, or is the issue in the pcap plugin, which produces the data for this
function to consume? I am just curious how something like this could be
avoided.

John






Re: Array Index Out of Bounds in String Binary

2018-07-17 Thread John Omernik
Yet this works?

string_binary(byte_substr(`data`, 1, 80))

On Tue, Jul 17, 2018 at 3:12 PM, John Omernik  wrote:

> So on B. I found the BYTE_SUBSTR and only send 200 bytes to the
> string_binary function, I still get an error.  Something else is happening
> here...
>
> select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
> string_binary(byte_substr(`data`, 1, 200)) as mydata from
> `user/jomernik/bf2_7306.pcap` limit 10
>
> I get the same
>
> Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on
> zeta3.brewingintel.com:20005]
>
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0, 256))
>
>
>
>
>
> On Tue, Jul 17, 2018 at 12:56 PM, John Omernik  wrote:
>
>>
>> Thanks Vlad a couple of thoughts.
>>
>>
>> A. I think that should be fixed. That seems like a limitation that is
>> both unexpected and undocumented.
>>
>> B.  Is there a way, if my data in the table is returned as binary to
>> start with, for me to return the first 256 bytes? I tried substring, and it
>> tries to force to UTF-8, and I am getting some issues there.
>>
>> On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov  wrote:
>>
>>> In case of DRILL-6607 the issue lies in the implementation of
>>> "string_binary" function: it is not prepared to handle incoming data that
>>> when converted to a binary string would exceed 256 bytes as it does not
>>> reallocate the output buffer. Until the function code is fixed, the only
>>> way to avoid the error is either not to use "string_binary" or to use it
>>> with the data that meets "string_binary" limitation.
>>>
>>> Thank you,
>>>
>>> Vlad
>>>
>>>
>>> On 7/13/18 14:01, Ted Dunning wrote:
>>>
>>>> There are bounds for acceptable behavior for a function like this. Array
>>>> index out of bounds is not acceptable. Aborting with a clean message about
>>>> the true problem might be fine, as would be to return a null.
>>>>
>>>> On Fri, Jul 13, 2018, 13:46 John Omernik  wrote:
>>>>
>>>>> So, as to the actual problem, I opened a JIRA here:
>>>>>
>>>>> https://issues.apache.org/jira/browse/DRILL-6607
>>>>>
>>>>> The reason I brought this here is my own curiosity: does an issue in
>>>>> using this function most likely lie in the function code itself not
>>>>> handling good data, or is the issue in the pcap plugin, which produces
>>>>> the data for this function to consume? I am just curious how something
>>>>> like this could be avoided.
>>>>>
>>>>> John
>>>>>
>>>
>>
>


Re: Array Index Out of Bounds in String Binary

2018-07-17 Thread John Omernik
So on B. I found the BYTE_SUBSTR and only send 200 bytes to the
string_binary function, I still get an error.  Something else is happening
here...

select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
string_binary(byte_substr(`data`, 1, 200)) as mydata from
`user/jomernik/bf2_7306.pcap` limit 10

I get the same

Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on
zeta3.brewingintel.com:20005]

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0, 256))





On Tue, Jul 17, 2018 at 12:56 PM, John Omernik  wrote:

>
> Thanks Vlad a couple of thoughts.
>
>
> A. I think that should be fixed. That seems like a limitation that is both
> unexpected and undocumented.
>
> B.  Is there a way, if my data in the table is returned as binary to start
> with, for me to return the first 256 bytes? I tried substring, and it tries
> to force to UTF-8, and I am getting some issues there.
>
> On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov  wrote:
>
>> In case of DRILL-6607 the issue lies in the implementation of
>> "string_binary" function: it is not prepared to handle incoming data that
>> when converted to a binary string would exceed 256 bytes as it does not
>> reallocate the output buffer. Until the function code is fixed, the only
>> way to avoid the error is either not to use "string_binary" or to use it
>> with the data that meets "string_binary" limitation.
>>
>> Thank you,
>>
>> Vlad
>>
>>
>> On 7/13/18 14:01, Ted Dunning wrote:
>>
>>> There are bounds for acceptable behavior for a function like this. Array
>>> index out of bounds is not acceptable. Aborting with a clean message about
>>> the true problem might be fine, as would be to return a null.
>>>
>>> On Fri, Jul 13, 2018, 13:46 John Omernik  wrote:
>>>
>>>> So, as to the actual problem, I opened a JIRA here:
>>>>
>>>> https://issues.apache.org/jira/browse/DRILL-6607
>>>>
>>>> The reason I brought this here is my own curiosity: does an issue in
>>>> using this function most likely lie in the function code itself not
>>>> handling good data, or is the issue in the pcap plugin, which produces
>>>> the data for this function to consume? I am just curious how something
>>>> like this could be avoided.
>>>>
>>>> John


>>
>


Re: Not Able To Access Drill Web Console In Multiple tabs Using Drill 1.13

2018-07-17 Thread Kunal Khatua
You could try the reverse. Monitor in the initial window, while submitting the 
query in another window.

That said, the reason your console is getting stuck is by design. The browser 
tab from which you submit the query is the window where you'll receive the 
results of the query. Hence, the window is "stuck" as it is waiting for results 
to come back.

As for why you are not able to monitor the current query status: you might 
have a fairly large result set that the server is formatting for the 
web-console, which saturates the WebServer threads. A simpler workaround is to 
monitor the system through a second Drillbit's web-console. If the first 
Drillbit (from which you launched the query) is very busy, you'll see status 
updates coming in less frequently.

As a rule of thumb, use the WebConsole for quick exploration (e.g. experimental 
queries with LIMIT to just glance at the data). Otherwise, there are a number 
of good JDBC-based tools, like SQuirreL and DBeaver (the latter also downloads 
the drivers automatically), that you can use.
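
For scripted monitoring outside the browser tab, Drill's REST API can be used instead. The Python sketch below only builds the requests and does not send them. It assumes the web server listens on port 8047 without authentication; the host name drillbit2.example.com and the helper names build_query_request and profiles_url are hypothetical.

```python
import json
import urllib.request


def build_query_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def profiles_url(base_url: str) -> str:
    """URL that lists running and completed query profiles as JSON."""
    return f"{base_url}/profiles.json"


if __name__ == "__main__":
    # Hypothetical second Drillbit used only for monitoring.
    monitor = "http://drillbit2.example.com:8047"
    req = build_query_request(monitor, "SELECT * FROM sys.version LIMIT 1")
    print(req.get_method(), req.full_url)
    print(profiles_url(monitor))
    # To actually send: urllib.request.urlopen(req), then parse the JSON body.
```

Keeping a LIMIT on queries sent this way matters for the same reason as in the web-console: the server still has to format the whole result set for the response.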


On 7/17/2018 3:40:06 AM, Surneni Tilak  wrote:

Hi Team,

I am using Drill version 1.13.0. I am facing the issues below, which were not 
present in 1.12.0:


1. When I submit a query, I am not able to open the Drill web-console in 
another window to monitor the status of the running query.

2. I am not able to submit another query while a query is running, as the 
console is stuck running the first query.

Please guide me on how to resolve these issues, as I would like to use the 
latest version of Drill.

Best regards,
_
Tilak



Re: Array Index Out of Bounds in String Binary

2018-07-17 Thread John Omernik
Thanks Vlad a couple of thoughts.


A. I think that should be fixed. That seems like a limitation that is both
unexpected and undocumented.

B.  Is there a way, if my data in the table is returned as binary to start
with, for me to return the first 256 bytes? I tried substring, and it tries
to force to UTF-8, and I am getting some issues there.

On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov  wrote:

> In case of DRILL-6607 the issue lies in the implementation of
> "string_binary" function: it is not prepared to handle incoming data that
> when converted to a binary string would exceed 256 bytes as it does not
> reallocate the output buffer. Until the function code is fixed, the only
> way to avoid the error is either not to use "string_binary" or to use it
> with the data that meets "string_binary" limitation.
>
> Thank you,
>
> Vlad
>
>
> On 7/13/18 14:01, Ted Dunning wrote:
>
>> There are bounds for acceptable behavior for a function like this. Array
>> index out of bounds is not acceptable. Aborting with a clean message about
>> the true problem might be fine, as would be to return a null.
>>
>> On Fri, Jul 13, 2018, 13:46 John Omernik  wrote:
>>
>>> So, as to the actual problem, I opened a JIRA here:
>>>
>>> https://issues.apache.org/jira/browse/DRILL-6607
>>>
>>> The reason I brought this here is my own curiosity: does an issue in
>>> using this function most likely lie in the function code itself not
>>> handling good data, or is the issue in the pcap plugin, which produces
>>> the data for this function to consume? I am just curious how something
>>> like this could be avoided.
>>>
>>> John
>>>
>>>
>


Re: Array Index Out of Bounds in String Binary

2018-07-17 Thread Vlad Rozov
In case of DRILL-6607 the issue lies in the implementation of 
"string_binary" function: it is not prepared to handle incoming data 
that when converted to a binary string would exceed 256 bytes as it does 
not reallocate the output buffer. Until the function code is fixed, the 
only way to avoid the error is either not to use "string_binary" or to 
use it with the data that meets "string_binary" limitation.


Thank you,

Vlad

On 7/13/18 14:01, Ted Dunning wrote:

There are bounds for acceptable behavior for a function like this.  Array
index out of bounds is not acceptable. Aborting with a clean message about
the true problem might be fine, as would be to return a null.

On Fri, Jul 13, 2018, 13:46 John Omernik  wrote:


So, as to the actual problem, I opened a JIRA here:

https://issues.apache.org/jira/browse/DRILL-6607

The reason I brought this here is my own curiosity: does an issue in using
this function most likely lie in the function code itself not handling good
data, or is the issue in the pcap plugin, which produces the data for this
function to consume? I am just curious how something like this could be
avoided.

John





Re: CT from parquet to CSV seems to not properly encode to UTF8

2018-07-17 Thread Carlos Derich
Hey guys,

Adding this JVM flag to the drill-env.sh file made it work.

export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"

Thank you very much.
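
A plausible explanation for the earlier Montr?al output, offered as an assumption rather than a confirmed diagnosis: when the default charset used by the writer cannot represent the accented character, it is replaced with a question mark. The short Python sketch below mimics that effect with the ASCII codec; it does not call Drill or the JVM.

```python
city = "Montréal"

# Encoding with a charset that lacks the accented character replaces it,
# much like the value seen in the CSV output.
as_ascii = city.encode("ascii", errors="replace").decode("ascii")

# Encoding as UTF-8 round-trips the accented character intact.
as_utf8 = city.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    print(as_ascii)  # Montr?al
    print(as_utf8)   # Montréal
```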


On Tue, Jul 17, 2018 at 1:49 AM, Kunal Khatua  wrote:

> Hi Carlos
>
> It looks similar to an issue reported previously:
> https://lists.apache.org/thread.html/1f3d4c427690c06f1992bc5070f355689ccc5b1ed8cc3678ad8e9106@
>
> Could you try setting the JVM's file encoding to UTF-8 and retry? If it
> does not work, please file a JIRA in https://issues.apache.org
>
> Thanks
> Kunal
> On 7/16/2018 1:25:45 PM, Carlos Derich  wrote:
> It seems to be an issue only with CSV/TSV files.
>
> Tried writing the output as JSON and it handles the encoding properly.
>
> alter session set `store.format`='json'
> create table dfs.tmp.test3 as select `city` from dfs.parquets.`file`
>
> Returns:
>
> {"city": "Montréal"}
>
>
> additional info:
>
> parquet-tools schema:
>
> message root {
> optional binary city (UTF8);
> }
>
>
> On Mon, Jul 16, 2018 at 2:49 PM, Carlos Derich
> wrote:
>
> > Hello guys, hope everyone is well.
> >
> > I am having an encoding issue when converting a table from parquet into
> > csv files. I wonder if someone could shed some light on it?
> >
> > One of my data sets has data in French with lots of accentuation, and it
> > is persisted in HDFS as parquet.
> >
> >
> > When I query the parquet table with: *select `city` from
> > dfs.parquets.`file`*, it properly returns the encoded data.
> >
> >
> > *city*
> >
> > *Montréal*
> >
> >
> > Then I convert this table into a CSV file with the following query:
> >
> > *alter session set `store.format`='csv'*
> > *create table dfs.csvs.`converted` as select * from dfs.parquets.`file`*
> >
> >
> > Then when I run a select query on it, it returns data that is not properly
> > encoded:
> >
> > *select columns[0] from dfs.csvs.`converted`*
> >
> > Returns:
> >
> > *Montr?al*
> >
> >
> > My storage plugin is pretty standard:
> >
> > "csv" : {
> > "type" : "text",
> > "extensions" : [ "csv" ],
> > "delimiter" : ",",
> > "skipFirstLine": true
> > },
> >
> > Should I explicitly add a charset option somewhere? I couldn't find
> > anything helpful in the docs.
> >
> > Tried adding *export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> > -Dsaffron.default.charset=UTF-8"* to drill-env.sh file, but no luck.
> >
> > Has anyone run into similar issues?
> >
> > Thank you !
> >
>


Not Able To Access Drill Web Console In Multiple tabs Using Drill 1.13

2018-07-17 Thread Surneni Tilak

Hi Team,

I am using Drill version 1.13.0. I am facing the issues below, which were not 
present in 1.12.0:


1.   When I submit a query, I am not able to open the Drill web-console in 
another window to monitor the status of the running query.

2.   I am not able to submit another query while a query is running, as the 
console is stuck running the first query.

Please guide me on how to resolve these issues, as I would like to use the 
latest version of Drill.

Best regards,
_
Tilak