Max Buffer Size Reached

2016-02-05 Thread John Omernik
Any thoughts on how to troubleshoot this? (I have some fat JSON data going
into the buffers, apparently.) It's not huge data, just wide/complex (total
size is 1.4 GB). Any thoughts on how to troubleshoot, or settings I can use
to work through these errors?


Thanks!


John




Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the
buffer. Max allowed buffer size is reached.



Fragment 1:11



[Error Id: db21dea0-ddd7-4fcf-9fea-b5031e358dad on node1



  (org.apache.drill.exec.exception.OversizedAllocationException) Unable to
expand the buffer. Max allowed buffer size is reached.

org.apache.drill.exec.vector.UInt1Vector.reAlloc():214

org.apache.drill.exec.vector.UInt1Vector$Mutator.setValueCount():469


org.apache.drill.exec.vector.complex.ListVector$Mutator.setValueCount():324

org.apache.drill.exec.physical.impl.ScanBatch.next():247

org.apache.drill.exec.record.AbstractRecordBatch.next():119

org.apache.drill.exec.record.AbstractRecordBatch.next():109

org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51


org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132

org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.record.AbstractRecordBatch.next():119


org.apache.drill.exec.test.generated.StreamingAggregatorGen1931.doWork():172


org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():167

org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.BaseRootExec.next():104


org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93

org.apache.drill.exec.physical.impl.BaseRootExec.next():94

org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256

org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250

java.security.AccessController.doPrivileged():-2

javax.security.auth.Subject.doAs():415

org.apache.hadoop.security.UserGroupInformation.doAs():1595

org.apache.drill.exec.work.fragment.FragmentExecutor.run():250

org.apache.drill.common.SelfCleaningRunnable.run():38

java.util.concurrent.ThreadPoolExecutor.runWorker():1145

java.util.concurrent.ThreadPoolExecutor$Worker.run():615

java.lang.Thread.run():745 (state=,code=0)


Please vote for proposed Drill talks for the Hadoop Summit

2016-02-05 Thread Jason Altekruse
Hello Drillers,

There are some great proposed talks for this year's Hadoop Summit related
to Drill. Please help promote Drill in the wider Big Data community by
taking a look through the list and voting for the talks that sound good.

You don't need to register or anything to vote; it just asks for an e-mail
address.

http://hadoopsummit.uservoice.com/search?filter=ideas=drill

Thanks!
Jason


Re: REGEX search Operator

2016-02-05 Thread Nicolas Paris
John,

Sorry about that; this already works as expected.
Give it a try, it's very easy to deploy:

SELECT first_name FROM cp.`employee.json` WHERE contains(first_name,'\w+')
LIMIT 5;
first_name |
---|
Sheri  |
Derrick|
Michael|
Maya   |
Roberta|
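For readers following the double-escaping discussion in the thread below: the double backslash comes from Java string literals, not from the regex engine itself. A minimal standalone java.util.regex sketch (illustrative only, not Drill code):

```java
import java.util.regex.Pattern;

public class RegexEscapeDemo {
    public static void main(String[] args) {
        // In Java SOURCE CODE the backslash must itself be escaped, so the
        // regex \d{4}-\d{2}-\d{2} is written as "\\d{4}-\\d{2}-\\d{2}".
        Pattern datePattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
        System.out.println(datePattern.matcher("2016-02-05").matches()); // true

        // A regex arriving at runtime (e.g. from a SQL string literal)
        // already contains a single backslash, so it can be handed straight
        // to Pattern.compile with no conversion -- the double escape exists
        // only in Java source literals.
        String userRegex = "\\w+"; // at runtime this string is: \w+
        System.out.println(Pattern.matches(userRegex, "Sheri")); // true
    }
}
```

This is why a UDF can accept a user's single-escaped regex such as `'\w+'` unchanged: the SQL layer passes it through as a plain string.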


2016-02-04 20:41 GMT+01:00 John Omernik :

> Ya, do you see where I am coming from here? Let's let users submit
> regex in its pure form if possible, and handle the nuances of Java regex
> behind the scenes. I think it would be a great way to make Drill very
> accessible and desirable. I think what happened in Hive is that the regex
> commands started out with users having to escape, and now there are too
> many things that rely on the escaped regex, so the project doesn't want
> to adjust.
>
>
>
>
> On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris  wrote:
>
> > You mean:
> > userRegex => javaRegex
> > "\d" => "\\d"
> > "\w" => "\\w"
> > "\n" => "\n"
> > I can do that with a regex, I guess.
> > I will give it a try.
> >
> >
> > 2016-02-04 19:37 GMT+01:00 John Omernik :
> >
> > > So my question on the double escape: is there no way to handle that so
> > > the user can use single-escaped regex? I know many folks who use big
> > > data platforms to test large, complex regexes for things like security
> > > appliances, and having to convert the regex seems like a lot of work if
> > > you consider that every user has to do that. If there was a way to do
> > > it in Drill, it would save people countless hours and prevent many
> > > mistakes.
> > >
> > > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris 
> > > wrote:
> > >
> > > > John, Jason,
> > > >
> > > > 2016-02-04 18:47 GMT+01:00 John Omernik :
> > > >
> > > > > I'd be curious how you are implementing the regex... using Java's
> > > > > regex libraries? etc.
> > > > >
> > > > Yeah, I use java.util.regex
> > > >
> > > >
> > > > > I know one thing with Hive that always bothered me was the need to
> > > > > double escape things.
> > > > >
> > > > > '\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If
> > > > > we can avoid that, it would be AWESOME.
> > > > >
> > > > My guess is this comes from Java's way of handling strings. All
> > > > languages I have used need the double escape.
> > > >
> > > >
> > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <
> > > > altekruseja...@gmail.com
> > > > > >
> > > > > wrote:
> > > >
> > > > Code is here: https://github.com/parisni/drill-simple-contains
> > > > It's surprising how simple it is...
> > > >
> > > >
> > > > > > I think you should actually just put the function in
> > > > > > Drill itself. System
> > > > > > native functions are implemented in the same interface as UDFs,
> > > because
> > > > > our
> > > > > > mechanism for evaluating them is very efficient (we code generate
> > > code
> > > > > > blocks by linking together the bodies of the individual functions
> > to
> > > > > > evaluate a complete expression).
> > > > >
> > > > Well, the folder tree is quite impressive (
> > > > https://github.com/apache/drill ).
> > > >
> > > > What folder is supposed to be "Drill itself"?
> > > >
> > > > > > You can open a JIRA, marking it as a feature request. You can
> > > > > > open a pull request against the Apache GitHub repo, making sure
> > > > > > you follow the standard format for your commit message, prefixing
> > > > > > it with the JIRA number in the format:
> > > > > > Example:
> > > > > > DRILL-: Feature description
> > > > > >
> > > > > > This will automatically link the PR to your JIRA.
> > > > >
> > > > OK, I will try. Thanks a lot!
> > > >
> > > > > > - Jason
> > > > > >
> > > > > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <
> nipari...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Jason, I have it working.
> > > > > > >
> > > > > > > Just tell me how to proceed with the PR:
> > > > > > > 1. Where do I put my Maven project? Which folder in my Drill
> > > > > > > GitHub fork?
> > > > > > > 2. Do I need a JIRA? How do I proceed?
> > > > > > >
> > > > > > > For now, I only published it on my github account in a separate
> > > > project
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <
> > > altekruseja...@gmail.com
> > > > >:
> > > > > > >
> > > > > > > > Awesome, thanks!
> > > > > > > >
> > > > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <
> > > nipari...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Well, I am creating a UDF.
> > > > > > > > > Good exercise!
> > > > > > > > > I hope to have a PR soon.
> > > > > > > > >
> > > > > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <
> > > > > altekruseja...@gmail.com
> > > > > > >:
> > > > > > > > >
> > > > > > > > > > I didn't realize 

Re: A field reference identifier must not have the form of a qualified name

2016-02-05 Thread John Omernik
Jacques, there is one very similar JIRA here:
https://issues.apache.org/jira/browse/DRILL-3922
I know this issue still vexes me.

John

On Wed, Dec 30, 2015 at 2:38 PM, Jacques Nadeau  wrote:

> We don't currently have a way to do something equivalent to SELECT KVGEN(*)
>  FROM T. Would you file a new feature request in JIRA for this?
>
> With regards to the dot issue, could you file a JIRA bug? We have internal
> protection against accidentally passing complex identifiers through, and
> apparently Drill is not correctly escaping your dotted identifier
> throughout the engine.
>
> thanks,
> Jacques
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Dec 29, 2015 at 4:30 PM, AleCaste  wrote:
>
> > Indeed, a period in a key is still valid JSON.
> > So this should be fixed somehow within Drill.
> >
> > But I still don't see how to apply FLATTEN+KVGEN to a whole record,
> > and not just to a property/subproperty of it.
> > Anyone?
> >
> >
> >
> >
> > From: Christopher Matta
> > Sent: Wednesday, December 23, 2015 8:13 AM
> > To: user@drill.apache.org
> > Subject: Re: A field reference identifier must not have the form of a
> > qualified name
> >
> > Seems like Drill is explicitly checking for a period in the key and
> > failing (from FieldReference.java:
> > https://github.com/apache/drill/blob/master/logical/src/main/java/org/apache/drill/common/expression/FieldReference.java#L54
> > ):
> >
> >
> > private void checkSimpleString(CharSequence value) {
> >   if (value.toString().contains(".")) {
> >     throw new UnsupportedOperationException(String.format(
> >         "Unhandled field reference \"%s\"; a field reference identifier"
> >         + " must not have the form of a qualified name (i.e., with \".\").",
> >         value));
> >   }
> > }
> >
> > Devs, is there any reason for this? As far as I know a period in a key is
> > still valid JSON.
> >
> > Chris Matta
> > cma...@mapr.com
> > 215-701-3146
> >
> > On Tue, Dec 22, 2015 at 6:10 PM, AleCaste  wrote:
> >
> > > I have a json file with the following structure:
> > >
> > > {
> > >   "0.0.1":{
> > > "version":"0.0.1",
> > > "date_created":"2014-03-15"
> > >   },
> > >   "0.1.2":{
> > > "version":"0.1.2",
> > > "date_created":"2014-05-21"
> > >   }
> > > }
> > >
> > > As you can see, the whole JSON file contains just one object: a map
> > > in which each key is a version number and each value is a map with
> > > version and date_created properties.
> > >
> > > I want to use Apache Drill to get a list with two columns: version and
> > > date_created
> > >
> > > But since the keys contain dots (e.g. "0.0.1"), Drill throws the
> > > following error:
> > >
> > > Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field
> > > reference "0.0.1"; a field reference identifier must not have the form
> > of a
> > > qualified name (i.e., with ".").
> > >
> > > ... when running a query like this:
> > >
> > > SELECT KVGEN(t.*) FROM dfs.`D:/drill/sample-data/myjsonfile.json` AS t;
> > >
> > > By the way, how do you tell KVGEN to process the WHOLE ROW since the
> row
> > > object is the actual map we want to convert?
> > >
> > > Any ideas about how to overcome this problem?
> > >
> >
>
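The checkSimpleString method quoted in this thread boils down to a substring test: any key containing a period is rejected as a qualified name, even though "0.0.1" is perfectly valid JSON. A minimal standalone reproduction of that check (illustrative only; the class and method names here are made up, not Drill's):

```java
public class FieldRefCheckDemo {
    // Mirrors the FieldReference.checkSimpleString logic quoted above:
    // any identifier containing '.' is treated as a qualified name.
    static boolean isSimpleIdentifier(CharSequence value) {
        return !value.toString().contains(".");
    }

    public static void main(String[] args) {
        System.out.println(isSimpleIdentifier("version"));      // true
        System.out.println(isSimpleIdentifier("date_created")); // true
        System.out.println(isSimpleIdentifier("0.0.1"));        // false -> triggers the error
    }
}
```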


Re: Max Buffer Size Reached

2016-02-05 Thread Hanifi Gunes
You see this exception because one of the columns in your dataset is larger
than an individual DrillBuf can store. The hard limit is Integer.MAX_VALUE
bytes. When we try to expand one of the buffers, we notice the allocation
request is oversized and fail the query. It would be nice if the error
message named the column that raised this issue, though.
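A rough sketch of the doubling reallocation that hits this ceiling (illustrative only; the real logic lives in UInt1Vector.reAlloc, and the helper method below is made up):

```java
public class BufferLimitDemo {
    // A single DrillBuf is int-addressed, so its capacity can never exceed
    // Integer.MAX_VALUE (2^31 - 1 = 2147483647) bytes. Vector buffers grow
    // by doubling; the first doubling that would cross that line is where
    // the OversizedAllocationException fires.
    static long expand(long currentCapacity) {
        long requested = currentCapacity * 2;
        if (requested > Integer.MAX_VALUE) {
            throw new IllegalStateException(
                "Unable to expand the buffer. Max allowed buffer size is reached.");
        }
        return requested;
    }

    public static void main(String[] args) {
        long cap = 4096;                  // start at 4 KiB
        while (cap <= 1L << 29) {
            cap = expand(cap);            // doubles happily up to 1 GiB
        }
        System.out.println(cap);          // 1073741824 (1 GiB)
        try {
            expand(1L << 30);             // a 2 GiB request exceeds the int limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note the limit applies per value vector (per column buffer), not to the total dataset size, which is why a 1.4 GB dataset with one very wide column can still trip it.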

On Fri, Feb 5, 2016 at 1:39 PM, John Omernik  wrote:

> Any thoughts on how to troubleshoot this (I have some fat json data going
> into the buffers apparently) It's not huge data, just wide/complex (total
> size is 1.4 GB)  Any thoughts on how to troubleshoot or settings I can use
> to work through these errors?
>
>
> Thanks!
>
>
> John
>
>
>
>
> [quoted stack trace snipped]
>


Re: Max Buffer Size Reached

2016-02-05 Thread John Omernik
Excuse my basic questions: when you say "we," are you referring to the Drill
developers? And what is Integer.MAX_VALUE bytes? Is that a query-time
setting? A drillbit setting? Is it editable? How does that value get
interpreted for complex data types (objects and arrays)?

Not only would the column be helpful, but the source file as well. (Is this
an individual-record issue... or is it a cumulative error, where the sum of
the lengths of multiple records of a column is at issue?)

Any thoughts on how I, as a user, could address this in my dataset?

Thanks!

On Friday, February 5, 2016, Hanifi Gunes  wrote:

> You see this exception because one of the columns in your dataset is larger
> than an individual DrillBuf can store. The hard limit is Integer.MAX_VALUE
> bytes. When we try to expand one of the buffers, we notice the allocation
> request is oversized and fail the query. It would be nice if the error
> message named the column that raised this issue, though.
>
> On Fri, Feb 5, 2016 at 1:39 PM, John Omernik wrote:
>
> > Any thoughts on how to troubleshoot this (I have some fat json data going
> > into the buffers apparently) It's not huge data, just wide/complex (total
> > size is 1.4 GB)  Any thoughts on how to troubleshoot or settings I can
> use
> > to work through these errors?
> >
> >
> > Thanks!
> >
> >
> > John
> >
> >
> >
> >
> > [quoted stack trace snipped]
> >
>


-- 
Sent from my iThing