Hi chaps

If you are looking for open source data visualization, you could use Pentaho
BI Server, SpagoBI, or Saiku Analytics, to name a few.

They are all open source, just not under the Apache umbrella.

(I do write Saiku; this is not an endorsement, just a selection of tools we
use regularly in the open source business intelligence market.)

Regards

Tom
On 24 Oct 2015 11:27, "ganesh" <[email protected]> wrote:

> Hi,
>
> Regarding Apache Zeppelin: I think it currently does not support Apache
> Drill connectivity by default. I saw the following on this site:
>
> https://wiki.apache.org/incubator/ZeppelinProposal
>
> Initial Goals: The initial goals will be to move the existing codebase to
> Apache and integrate with the Apache development process. This includes
> moving all infrastructure that we currently maintain, such as: a website, a
> mailing list, an issues tracker and a Jenkins CI, as mentioned in the
> “Required Resources” section of the current proposal. Once this is
> accomplished, *we plan for incremental development and releases that follow
> the Apache guidelines. To increase adoption, the major goal for the project
> would be to provide integration with as many projects from the Apache data
> ecosystem as possible, including new interpreters for Apache Hive and
> Apache Drill, and adding a Zeppelin distribution to Apache Bigtop*.
>
> Currently I have not been able to test much with Tableau .. I didn't get
> much time to spend on it, and whatever I have tried so far hasn't been
> fruitful. I will spend some more time on it ...
>
>
>
> On Thu, Oct 22, 2015 at 8:16 PM, Andries Engelbrecht <
> [email protected]> wrote:
>
> > See if the videos on this page help you.
> >
> > https://www.mapr.com/products/apache-drill
> >
> >
> >
> >
> > —Andries
> >
> >
> > > On Oct 22, 2015, at 7:38 AM, ganesh <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > Are there any other tutorial links for Tableau vs. Hive?
> > >
> > > I have already gone through the one on the Apache Drill site. I am not
> > > able to proceed with those.
> > >
> > > On Thu, Oct 22, 2015 at 8:03 PM, Andries Engelbrecht <
> > > [email protected]> wrote:
> > >
> > >> Hive should be visible and usable in Tableau. You can use Drill views
> > >> for dfs data, or you can use Tableau Custom SQL to access the data.
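> > >>
> > >> For example, a minimal sketch of a view over a JSON file in dfs (the
> > >> view name and path here are just placeholders):
> > >>
> > >> CREATE OR REPLACE VIEW dfs.tmp.`json_view` AS
> > >> SELECT * FROM dfs.`/path/to/dump.json`;
> > >>
> > >> Tableau can then query the view like a regular table over ODBC.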
> > >>
> > >> Make sure to install the Tableau TDC file that comes with the ODBC
> > driver.
> > >> https://drill.apache.org/docs/installing-the-tdc-file-on-windows/
> > >>
> > >> https://drill.apache.org/docs/using-apache-drill-with-tableau-9-desktop/
> > >>
> > >>
> > >> —Andries
> > >>
> > >>
> > >>> On Oct 22, 2015, at 6:40 AM, ganesh <[email protected]> wrote:
> > >>>
> > >>> Hi John,
> > >>>
> > >>> Thanks for suggesting the new name: Apache Zeppelin.
> > >>>
> > >>> I have been trying the 14-day trial version of Tableau without much
> > >>> success.
> > >>> Only today did I learn that for files in Hadoop or the local file
> > >>> system, I would need to create a view.
> > >>>
> > >>> Still, though I can see my tables from Hive in Tableau, I cannot see
> > >>> any data.
> > >>> Nor am I able to get Tableau working from the help links currently
> > >>> given in the Apache Drill docs
> > >>> (http://drill.apache.org/docs/tableau-examples/).
> > >>>
> > >>> Snapshot attached, in case you have worked with Tableau.
> > >>>
> > >>> I will look into Zeppelin also.
> > >>>
> > >>>
> > >>> On Thu, Oct 22, 2015 at 6:43 PM, John Omernik <[email protected]> wrote:
> > >>> I separated my response from the original topic to keep any responses
> > >>> there focused on the design document.
> > >>>
> > >>> As for ways to use Drill, I have been working with SQuirreL SQL quite
> > >>> successfully.  If you want to stay in the Apache world, one thing on
> > >>> my list of things to do is looking at Apache Zeppelin.  In the Git
> > >>> repo, there is a Drill plugin so you can run SQL against Drill, look
> > >>> at results, and do basic visualizations. I have been trying to wait
> > >>> until the PR is merged into Zeppelin, but for your use case, you may
> > >>> want to grab the plugin code and run with it.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Thu, Oct 22, 2015 at 6:06 AM, ganesh <[email protected]> wrote:
> > >>>
> > >>>> Hello,
> > >>>>
> > >>>> John, you seem to be quite impressed with Apache Drill .. nice.
> > >>>> I am new to the unstructured world and started on Apache Drill just
> > >>>> one week back, after a suggestion from my colleagues. We have
> > >>>> semi-structured data with the constraint that we do not know the
> > >>>> number of columns.
> > >>>>
> > >>>> I heard that Apache Drill is a schema-free application and, with its
> > >>>> support for the JSON format, it allows creating columns on the fly.
> > >>>> I converted my data from a CSV-like format to JSON and am trying to
> > >>>> figure out if it will work for me.
> > >>>>
> > >>>> Here I hit two issues:
> > >>>> 1) My columns were like: 3100.2.1.2, with values like "-2303" or
> > >>>> "01/01/2015 02:02:00"
> > >>>>
> > >>>> The challenge was that a column can't start with a numeric value, so
> > >>>> I had to change the key to: "t3100.2.1.2"
> > >>>> After that things were quite OK.
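> > >>>>
> > >>>> (A sketch for anyone else hitting this: since such a key contains
> > >>>> dots, in Drill you would reference it with backticks and a table
> > >>>> alias; the file path here is a placeholder:
> > >>>>
> > >>>> SELECT t.`t3100.2.1.2` FROM dfs.`/path/to/converted.json` t;
> > >>>>
> > >>>> though note, as John mentions below, fields with a . in the name
> > >>>> still have open issues, see DRILL-3922.)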
> > >>>>
> > >>>> Now I need some help from you guys. To proceed, I have to present
> > >>>> my work to management as an example.
> > >>>> But querying on the Apache Drill console doesn't seem to be an
> > >>>> attractive way to present things.
> > >>>>
> > >>>> I tried Drill Explorer too, but didn't find it that good.
> > >>>> One thing to note: I am playing with files on Hadoop in standalone
> > >>>> mode on Ubuntu.
> > >>>>
> > >>>> To make it look better, I started with Qlik Sense .. but was unable
> > >>>> to connect it to the Hadoop file system. It only showed me the Hive
> > >>>> files.
> > >>>> Then I downloaded the Tableau trial version ... but I am unable to
> > >>>> get at the Hadoop data there too...
> > >>>>
> > >>>> Please help me with how to proceed. I have a presentation this
> > >>>> coming Monday.
> > >>>> The queries are quite ready .. I just need to show the results in
> > >>>> visualization form ........ using OPEN SOURCE applications only.
> > >>>>
> > >>>>
> > >>>> Guys please help me.
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Oct 21, 2015 at 6:43 PM, John Omernik <[email protected]> wrote:
> > >>>>
> > >>>>> AWESOME!
> > >>>>>
> > >>>>> I had just been in the process of writing up a long user story to
> > >>>>> ask for and support exactly this.  I modified it and included it
> > >>>>> here:
> > >>>>>
> > >>>>>
> > >>>>> To start out, I want to say how much I love the Drill project and
> > >>>>> the potential it has. I've put this together based on my
> > >>>>> experiences and want to contribute a perspective as a user, not
> > >>>>> just put a bunch of critiques in an email.  I hope it's all taken
> > >>>>> in that spirit.  As an additional note, I wrote this prior to
> > >>>>> seeing the design document shared by Hsuan Yi Chu yesterday. If you
> > >>>>> are reading it and think to yourself “that wording is odd…”, please
> > >>>>> consider it from the “I didn’t want to throw away the user story”
> > >>>>> perspective and the “I wrote it before the design doc” perspective.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Additionally, I understand that some of what I am suggesting may
> > >>>>> not be easy from a development perspective.  I am just being
> > >>>>> upfront with my experience, so we can look to determine what can be
> > >>>>> done; I am not looking for a silver bullet here, just looking for
> > >>>>> improvement.  Some fixes may be as simple as better documentation;
> > >>>>> other suggestions may be harder to implement.  Either way, I
> > >>>>> thought a verbose user story might be useful to the community as a
> > >>>>> whole.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> John
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> *User Story*
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> As I have been working with Drill for data exploration, I came
> > >>>>> across multiple “things” that were just hard.  Some data,
> > >>>>> especially JSON data, can be ugly, and ugly at scale is even worse!
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> For this story, I am working with a JSON dump from MongoDB, and you
> > >>>>> would think it would be well structured; for the most part it is.
> > >>>>> There are some application-level mistakes that were made (I will go
> > >>>>> into that in a moment), but in general Drill handles this well.  So
> > >>>>> with this data set, there are a few main challenges I am seeing:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> 1.     When there is a field that has a float, and then a later
> > >>>>> record has the number 0 in it (which Drill takes as an INT). This
> > >>>>> is a known problem and one that Drill has a solution for.
> > >>>>>
> > >>>>> 2.     When a field is of one type (a map) and then a later record
> > >>>>> has a string in it.  No easy solution here.
> > >>>>>
> > >>>>> 3.     Select * where there is a JSON field with a . in the name. I
> > >>>>> won’t go into details here, but I feel this factors into data
> > >>>>> exploration, because it changes the ability to “stay in Drill” to
> > >>>>> explore the data
> > >>>>> (https://issues.apache.org/jira/browse/DRILL-3922)
> > >>>>>
> > >>>>> 4.     Error reporting challenges
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> With the problem summary laid out, I wanted to walk through my
> > >>>>> process in working with this data, and point out where Drill could
> > >>>>> have been much more helpful to me as a user along the way.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Here is a description of the process I went through:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> 1.     Copy the data into the filesystem
> > >>>>>
> > >>>>> 2.     Use Drill to run “select * from `path_to/dump.json` limit 1”
> > >>>>>
> > >>>>> 3.     (I just want to see what it looks like!)
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Here I get this error:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> select * from `path_to/dump.json` limit 1;
> > >>>>>
> > >>>>> Error: DATA_READ ERROR: You tried to write a BigInt type when you
> > >>>>> are using a ValueWriter of type NullableFloat8WriterImpl.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> File  /data/dev/path_to/dump.json
> > >>>>>
> > >>>>> Record  1
> > >>>>>
> > >>>>> Line  1
> > >>>>>
> > >>>>> Column  9054
> > >>>>>
> > >>>>> Field  entropy
> > >>>>>
> > >>>>> Fragment 0:0
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> This isn’t incredibly helpful from a user perspective.  When I
> > >>>>> Googled around, I realized that the docs talk about “schema
> > >>>>> changes” and that one possible fix is the setting below. However,
> > >>>>> examples of the data that was trying to be displayed (with its
> > >>>>> implied type) might help users grok what is happening.  At least in
> > >>>>> this case it showed me the field name!
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> ALTER SYSTEM SET `store.json.read_numbers_as_double` = true;
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> This is a great example where, since we have a known use case
> > >>>>> (numbers that are doubles, but someone stores a 0 as an INT) that
> > >>>>> fails, and the devs have added a setting to let a user get through
> > >>>>> it, the error message could be more helpful.   In this case it
> > >>>>> could show two record numbers (line numbers) with different types,
> > >>>>> the field values with their implied types, and perhaps a suggestion
> > >>>>> about using the setting to address the problem.   This could make
> > >>>>> it more intuitive for the user to stay in Drill, and stay in the
> > >>>>> data.   In this case, I looked at the head of the file, saw the
> > >>>>> issue, and was able to proceed.
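> > >>>>>
> > >>>>> To illustrate with hypothetical records: a pair like
> > >>>>>
> > >>>>> {"entropy": 7.24}
> > >>>>> {"entropy": 0}
> > >>>>>
> > >>>>> implies a double for the first record and an int for the second,
> > >>>>> which is exactly the mismatch the error above is complaining about.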
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Also, as a corollary here, the user documentation does not show
> > >>>>> this error as being related to the schema change problem. This
> > >>>>> would be a great place to state, “if you see an error that looks
> > >>>>> like X, this is what is happening and what you can do about it.”
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> *Side note on documentation*
> > >>>>>
> > >>>>> We should look to have the documentation be role based.   In this
> > >>>>> case, the documentation says to use “ALTER SYSTEM”. I would argue,
> > >>>>> and I am guessing others would concur, that for this use case
> > >>>>> “ALTER SESSION” may be a better suggestion, as this is a specific
> > >>>>> alteration to address the use case of loading/querying a specific
> > >>>>> data set, and is likely done by a user of the system.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> If a user is doing self-serve data work in an enterprise
> > >>>>> environment, they may not have the ability to use ALTER SYSTEM,
> > >>>>> will get an error, and thus may be confused about how to proceed.
> > >>>>> In addition, an ALTER SYSTEM by a user who doesn’t understand what
> > >>>>> they are changing, yet has the rights to change it, may introduce
> > >>>>> future data problems they didn’t expect.   I like that the default
> > >>>>> is the more restrictive method, because it makes people be explicit
> > >>>>> about data, yet the documentation should also aim to be explicit
> > >>>>> about something like a system-wide change.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> *Back to the story*
> > >>>>>
> > >>>>> OK, so now I will do ALTER SESSION SET on the
> > >>>>> read_numbers_as_double setting.
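> > >>>>>
> > >>>>> That is, the session-scoped form of the command shown above:
> > >>>>>
> > >>>>> ALTER SESSION SET `store.json.read_numbers_as_double` = true;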
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> I run the query again.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> select * from `path_to/dump.json` limit 1;
> > >>>>>
> > >>>>> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a
> > >>>>> VarChar type when you are using a ValueWriter of type
> > >>>>> SingleMapWriter.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> File  /data/dev/path_to/dump.json
> > >>>>>
> > >>>>> Record  4009
> > >>>>>
> > >>>>> Fragment 0:0
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Another error.   But what does this one mean? OK, now that I have
> > >>>>> been living in the docs and on the Drill user list, and because
> > >>>>> it’s similar to the schema change issue, I can tell that that is
> > >>>>> what we are looking at here.  Instead of double to int, we have one
> > >>>>> field that is a map most of the time, and in some cases is a
> > >>>>> string.
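> > >>>>>
> > >>>>> Again purely as a hypothetical illustration (the field name is made
> > >>>>> up), records like
> > >>>>>
> > >>>>> {"details": {"source": "app"}}
> > >>>>> {"details": "missing"}
> > >>>>>
> > >>>>> would make Drill build a map writer for the field on the first
> > >>>>> record and then hand it a VarChar on the second.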
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> But this doesn’t really help me as a user.  Drill doesn’t offer
> > >>>>> any options to troubleshoot this. This file is 500 MB of dense and
> > >>>>> nested JSON data with 51k records.   My solution? I took the record
> > >>>>> number, then went to my NFS-mounted clustered file system (thank
> > >>>>> goodness I had MapR here; I am not sure how I would have gotten at
> > >>>>> this with POSIX tools otherwise).
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> My command: $ head -4009 dump.json | tail -1
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> That (I hoped) showed me the record in question. Note the error
> > >>>>> from Drill didn’t tell me which field was at fault here, so I had
> > >>>>> to visually align things to work that out.  However, I was able to
> > >>>>> spot the difference and work with the dev to understand why that
> > >>>>> happened. I removed those records, and things worked correctly.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Could there have been a way to identify that within Drill? My
> > >>>>> solution was to write a Python script to read through the file and
> > >>>>> discard the records where the field was not a map. On 500 MB that
> > >>>>> can work, but what about 500 GB?  I guess a Spark job could clean
> > >>>>> the data…. But could Drill be given some tools to help with this
> > >>>>> situation?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> For example, the first thing I asked was: what field is at issue?
> > >>>>> I had no way to see what was going on there.  I had to use other
> > >>>>> tools to see the data so I could understand the problem. Then, once
> > >>>>> I understood the problem, I had to use Python to produce data that
> > >>>>> was queryable.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Given the design document Hsuan Yi Chu just posted to the mailing
> > >>>>> list, at this point my post is simply a user story to support that
> > >>>>> document.  To summarize the points I’d like to see included in the
> > >>>>> design document (from a user perspective, without claiming to
> > >>>>> understand the “how or why”):
> > >>>>>
> > >>>>>
> > >>>>> 1.     *Error messages that are more verbose in explaining the
> > >>>>> problem*
> > >>>>>
> > >>>>> a.     Filename, row number, column number or name
> > >>>>>
> > >>>>> b.     Option to output the “offending row”
> > >>>>>
> > >>>>> c.     Showing the data that is causing the error WITH the type
> > >>>>> Drill inferred.
> > >>>>>
> > >>>>> d.     If there are options to help work through dirty data,
> > >>>>> perhaps the error message could include those: “Data was a double,
> > >>>>> then Drill found this data: 0, which was an int, in file X at row
> > >>>>> 24 in column ‘myfloatingdata’; consider using
> > >>>>> store.json.read_numbers_as_double to address the issue.”
> > >>>>>
> > >>>>> 2.     *A way to determine how common this exception is*
> > >>>>>
> > >>>>> a.     If I am playing with a messy data set and this error
> > >>>>> happens, does it happen on 1 record? 2? 5000?  Knowing that
> > >>>>> information would:
> > >>>>>
> > >>>>>        i.     Help users understand how Drill is seeing that
> > >>>>> particular column
> > >>>>>
> > >>>>>        ii.    Support decisions about excluding data rather than
> > >>>>> just removing it. What if the first 10 records were errors, and you
> > >>>>> then excluded the remaining 10 million because they were correct
> > >>>>> yet different from the first 10?
> > >>>>>
> > >>>>> b.     Perhaps there could be a “stats” function that only works if
> > >>>>> it’s the only selected item, or if the select consists only of such
> > >>>>> (stats) functions:
> > >>>>>
> > >>>>>        i.     Select type_stats(fieldname) from data
> > >>>>>
> > >>>>>        ii.    (that wouldn’t error on different types)
> > >>>>>
> > >>>>> 3.     *An ability to say “return null on this field if there is
> > >>>>> an error, or if it is non-castable to X type”, especially in a
> > >>>>> view, perhaps via a function.*
> > >>>>>
> > >>>>> a.     Allow users to not have to reparse data outside Drill
> > >>>>>
> > >>>>> b.     Load it into a sane format (one-time loads/ETL to clean
> > >>>>> data)
> > >>>>>
> > >>>>> c.     Not be a system- or session-wide exception.
> > >>>>>
> > >>>>>        i.     I think this is important because I may have a field
> > >>>>> where I want it to read the numbers as double, but what if I have
> > >>>>> another field in the same dataset where I don’t want it to read the
> > >>>>> numbers as double? A SYSTEM or SESSION level variable takes away
> > >>>>> that granularity.
> > >>>>>
> > >>>>> d.     Select field1, CASTORNULL(field2, int) as field2,
> > >>>>> CASTORNULL(field3, double) as field3 from ugly_data
> > >>>>>
> > >>>>> e.     That’s an example in the select, but I could see a where
> > >>>>> clause too:
> > >>>>>
> > >>>>> f.     Select field1, field2, field3 from ugly_data where
> > >>>>> ISTYPE(field2, int) and ISTYPE(field3, double)
> > >>>>>
> > >>>>> 4.     *Updating of the documentation related to ALTER SESSION vs
> > >>>>> ALTER SYSTEM, with an eye to the context of the majority use case
> > >>>>> of the documented feature*
> > >>>>>
> > >>>>> a.     For data loads, the documentation uses ALTER SYSTEM, and
> > >>>>> that’s problematic because:
> > >>>>>
> > >>>>>        i.     Not all users have the privileges to issue an ALTER
> > >>>>> SYSTEM. Thus a new user trying to figure things out may not realize
> > >>>>> they can just ALTER SESSION after getting an ALTER SYSTEM error.
> > >>>>>
> > >>>>>        ii.    ALTER SYSTEM on data loading items, especially in
> > >>>>> areas that make Drill’s data interpretation more permissive, can
> > >>>>> lead to unintended consequences later. An admin, who may be a good
> > >>>>> systems admin and helps a data user troubleshoot an error, may
> > >>>>> issue an ALTER SYSTEM not realizing this changes all future data
> > >>>>> imports.
> > >>>>>
> > >>>>> b.     Note, I found a few cases, but I would suggest a thorough
> > >>>>> review of the various use cases throughout the documentation, and
> > >>>>> in areas where it really could be either, a small paragraph
> > >>>>> indicating the ramifications of either command.
> > >>>>>
> > >>>>> 5.     *A philosophy within the Drill community to “stay in Drill”
> > >>>>> for data exploration*
> > >>>>>
> > >>>>> a.     This is obviously not so much a development thing as a
> > >>>>> mindset. If someone says “I tried to do X, and I got an error” and
> > >>>>> the community’s response is “look through your data and do Z to it
> > >>>>> so Drill can read it”, then we should reconsider that scenario and
> > >>>>> try to provide an option within Drill to intuitively handle the
> > >>>>> edge case.  This is difficult.
> > >>>>>
> > >>>>> b.     There are cases even in the documentation where this is the
> > >>>>> case: https://drill.apache.org/docs/json-data-model/ talks about
> > >>>>> arrays at the root level or reading some empty arrays.  In these
> > >>>>> cases, we have to leave Drill to fix the problem. This works on
> > >>>>> small data, but may not work on large or wide data. Consider the
> > >>>>> array-at-root-level limitation.  What if some process out of the
> > >>>>> user’s control produces 1000 100 MB JSON files and we want to read
> > >>>>> them? To fix it, we have to address those files. Lots of work
> > >>>>> there, either manual or automated.
> > >>>>>
> > >>>>> c.     Once again, I know this isn’t easy, but we shouldn’t answer
> > >>>>> questions about how to do something by saying “fix this outside of
> > >>>>> Drill so Drill can read your data” if at all possible.
> > >>>>>
> > >>>>>
> > >>>>> I hope this story helps support the design document presented.  I
> > >>>>> am happy to participate in more discussion around these topics, as
> > >>>>> I have enjoyed digging into the internals of Drill.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> John Omernik
> > >>>>>
> >
> >
>
>
> --
> *Name: Ganesh Semalty*
> *Location: Gurgaon, Haryana (India)*
> *Email Id: [email protected]*
>
> *Please consider the environment before printing this e-mail - SAVE TREE.*
>
