Re: Impala Sorter just sort small partition?

2017-08-04 Thread Tim Armstrong
It does sort both left and right partitions - it just recurses on the small
partition and the next iteration of the loop processes the large partition.

This is a pretty common optimisation. This page has a nice explanation:
http://www.geeksforgeeks.org/quicksort-tail-call-optimization-reducing-worst-case-space-log-n/
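
As a rough illustration of the pattern described above, here is a minimal sketch. It is not the code in sorter.cc. The names HybridQuickSort, InsertionSort and kInsertionThreshold, the use of plain ints, and the three-way std::partition step are all assumptions made for the example; only the threshold value of 16 comes from this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Value quoted in the thread; the name is hypothetical.
constexpr std::ptrdiff_t kInsertionThreshold = 16;

// Insertion sort on the half-open range [begin, end).
void InsertionSort(int* begin, int* end) {
  if (end - begin < 2) return;
  for (int* i = begin + 1; i < end; ++i) {
    int value = *i;
    int* j = i;
    while (j > begin && *(j - 1) > value) {
      *j = *(j - 1);
      --j;
    }
    *j = value;
  }
}

// Sorts the whole range [begin, end). Only the smaller partition is handled
// by a recursive call; the larger one is handled by the next loop iteration,
// so both sides end up sorted and the stack depth stays O(log n).
void HybridQuickSort(int* begin, int* end) {
  assert(begin <= end);
  while (end - begin >= kInsertionThreshold) {
    int pivot = begin[(end - begin) / 2];
    // Three-way split (an assumption, chosen for simplicity):
    // [begin, lt_end) < pivot, [lt_end, eq_end) == pivot, [eq_end, end) > pivot.
    int* lt_end = std::partition(begin, end,
                                 [pivot](int v) { return v < pivot; });
    int* eq_end = std::partition(lt_end, end,
                                 [pivot](int v) { return v == pivot; });
    if (lt_end - begin < end - eq_end) {
      HybridQuickSort(begin, lt_end);  // smaller side: recurse
      begin = eq_end;                  // larger side: continue the loop
    } else {
      HybridQuickSort(eq_end, end);    // smaller side: recurse
      end = lt_end;                    // larger side: continue the loop
    }
  }
  // Whatever remains is below the threshold.
  InsertionSort(begin, end);
}
```

Calling HybridQuickSort(v.data(), v.data() + v.size()) on a std::vector<int> v sorts all of v. The loop replaces the second recursive call of textbook quicksort, which is why it can look as if only one partition is sorted.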

On Fri, Aug 4, 2017 at 6:12 PM, 俊杰陈  wrote:

> Thanks for your detailed description.
>
> My question should have been more specific to the quicksort part. This line
> (be/src/runtime/sorter.cc#L1258) says to recurse on the smaller partition
> because of stack considerations, whereas my understanding is that quicksort
> should recurse on both the left and the right partition. So I'm curious how
> it keeps the whole run sorted: is the other partition sorted later, in the
> merge sort or somewhere else? But the merge process should take sorted
> runs as input.
>
> 2017-08-05 0:18 GMT+08:00 Tim Armstrong :
>
> > The Sorter does a 3-level hybrid sort with merge sort, quicksort and
> > insertion sort.
> >
> > SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
> > arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
> > begin and end of the sorted run, it will sort the full run. It does
> > quicksort recursively then switches to insertion sort once the partitions
> > are less than INSERTION_THRESHOLD = 16.
> >
> > Sorter also supports an external merge sort - if the full input doesn't
> > fit in memory, it sorts in-memory runs with SortHelper() then does merge
> > sort with the sorted runs.
> >
> > On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈  wrote:
> >
> > > Hi,
> > > I'm looking at Sorter.cc and found that Sorter::SortHelper only sorts the
> > > smaller partition. Is there anything I missed?
> > >
> > > --
> > > Thanks & Best Regards
> > >
> >
>
>
>
> --
> Thanks & Best Regards
>


Re: Impala Sorter just sort small partition?

2017-08-04 Thread 俊杰陈
Thanks for your detailed description.

My question should have been more specific to the quicksort part. This line
(be/src/runtime/sorter.cc#L1258) says to recurse on the smaller partition
because of stack considerations, whereas my understanding is that quicksort
should recurse on both the left and the right partition. So I'm curious how
it keeps the whole run sorted: is the other partition sorted later, in the
merge sort or somewhere else? But the merge process should take sorted
runs as input.

2017-08-05 0:18 GMT+08:00 Tim Armstrong :

> The Sorter does a 3-level hybrid sort with merge sort, quicksort and
> insertion sort.
>
> SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
> arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
> begin and end of the sorted run, it will sort the full run. It does
> quicksort recursively then switches to insertion sort once the partitions
> are less than INSERTION_THRESHOLD = 16.
>
> Sorter also supports an external merge sort - if the full input doesn't fit
> in memory, it sorts in-memory runs with SortHelper() then does merge sort
> with the sorted runs.
>
> On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈  wrote:
>
> > Hi,
> > I'm looking at Sorter.cc and found that Sorter::SortHelper only sorts the
> > smaller partition. Is there anything I missed?
> >
> > --
> > Thanks & Best Regards
> >
>



-- 
Thanks & Best Regards


Re: Impala Sorter just sort small partition?

2017-08-04 Thread Tim Armstrong
The Sorter does a 3-level hybrid sort with merge sort, quicksort and
insertion sort.

SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
begin and end of the sorted run, it will sort the full run. It does
quicksort recursively then switches to insertion sort once the partitions
are less than INSERTION_THRESHOLD = 16.

Sorter also supports an external merge sort - if the full input doesn't fit
in memory, it sorts in-memory runs with SortHelper() then does merge sort
with the sorted runs.
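
To make the outer merge-sort level concrete, here is a minimal sketch under stated assumptions. It is not Impala's implementation. MakeSortedRuns and MergeRuns are hypothetical names, std::sort stands in for the in-memory SortHelper() step, and in-memory vectors stand in for runs that would really be spilled to disk.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Splits 'input' into runs of at most 'run_size' elements and sorts each run
// independently. std::sort is a stand-in for the in-memory sort step.
std::vector<std::vector<int>> MakeSortedRuns(const std::vector<int>& input,
                                             std::size_t run_size) {
  run_size = std::max<std::size_t>(run_size, 1);  // avoid a zero-length step
  std::vector<std::vector<int>> runs;
  for (std::size_t start = 0; start < input.size(); start += run_size) {
    std::size_t stop = std::min(start + run_size, input.size());
    runs.emplace_back(input.begin() + start, input.begin() + stop);
    std::sort(runs.back().begin(), runs.back().end());
  }
  return runs;
}

// K-way merge of already sorted runs using a min-heap of (value, run index).
std::vector<int> MergeRuns(const std::vector<std::vector<int>>& runs) {
  using Entry = std::pair<int, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  std::vector<std::size_t> next(runs.size(), 0);  // next unread index per run
  std::size_t total = 0;
  for (std::size_t r = 0; r < runs.size(); ++r) {
    total += runs[r].size();
    if (!runs[r].empty()) {
      heap.emplace(runs[r][0], r);
      next[r] = 1;
    }
  }
  std::vector<int> out;
  out.reserve(total);
  while (!heap.empty()) {
    Entry top = heap.top();
    heap.pop();
    out.push_back(top.first);
    std::size_t r = top.second;
    if (next[r] < runs[r].size()) {
      heap.emplace(runs[r][next[r]], r);
      ++next[r];
    }
  }
  return out;
}
```

MergeRuns(MakeSortedRuns(input, 16)) returns the same result as sorting input directly. The merge step relies on every run being fully sorted beforehand, which is the guarantee the in-memory sort has to provide.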

On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈  wrote:

> Hi,
> I'm looking at Sorter.cc and found that Sorter::SortHelper only sorts the
> smaller partition. Is there anything I missed?
>
> --
> Thanks & Best Regards
>


Re: Loading tpc-ds

2017-08-04 Thread Matthew Jacobs
Yes yours might have been different. Looks like Tim's gvo and mine
failed with very similar looking errors though.

On Thu, Aug 3, 2017 at 9:52 PM, Jim Apple  wrote:
> When I saw this, there was a "FATAL" in hive.log, so perhaps they are
> different.
>
> https://issues.apache.org/jira/browse/IMPALA-5663
>
> https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1827/artifact/Impala/logs_static/logs/cluster/hive/hive.log/*view*/
>
> On Thu, Aug 3, 2017 at 9:09 PM, Matthew Jacobs  wrote:
>
>> Just saw this error again. I filed IMPALA-5765.
>>
>> On Mon, Jul 31, 2017 at 8:05 PM, Tim Armstrong 
>> wrote:
>> > It looks like the same error:
>> >
>> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-warehouse/tpcds.store_sales/.hive-staging_hive_2017-07-31_23-55-05_306_8385818677737494274-760/_task_tmp.-ext-1/ss_sold_date_sk=2450988/_tmp.00_0 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
>> > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
>> > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3385)
>> > at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
>> > at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
>> > at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
>> > at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>> > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>> > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
>> > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>> > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
>> >
>> > at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:751)
>> > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>> > at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>> > at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
>> > ... 8 more
>> > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-warehouse/tpcds.store_sales/.hive-staging_hive_2017-07-31_23-55-05_306_8385818677737494274-760/_task_tmp.-ext-1/ss_sold_date_sk=2450988/_tmp.00_0 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
>> > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
>> > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3385)
>> > at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
>> > at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
>> > at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
>> > at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>> > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>> > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
>> > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:415)
>> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>> > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
>> >
>> > at org.apache.hadoop.ipc.Client.call(Client.java:1502)
>> > at org.apache.hadoop.ipc.Client.call(Client.java:1439)

Re: Re: Impala Java API

2017-08-04 Thread Silvius Rus
I'm not sure what you mean by "cached" data.

Check out the docs.  You can run these commands as JDBC statements.
http://impala.apache.org/docs/build/html/topics/impala_refresh.html#refresh
http://impala.apache.org/docs/build/html/topics/impala_invalidate_metadata.html#invalidate_metadata

On Thu, Aug 3, 2017 at 7:01 PM, zhangwenyang 
wrote:

> Hi, Silvius
>
> There are three levels of refresh: clearing the cache, incrementally
> updating metadata, and invalidating metadata.
> 1) Is the data returned through JDBC cached data or the real data?
> 2) How do I refresh the Impala data source at each of the three levels?
>
> Thanks
>
>
>
> zhangwenyang
>
> From: Silvius Rus
> Date: 2017-08-03 23:47
> To: dev@impala.incubator.apache.org
> CC: zhangwenyang; dev
> Subject: Re: Impala Java API
> Can you use JDBC?
>
> Silvius
>
> > On Aug 2, 2017, at 9:24 PM, Henry Robinson  wrote:
> >
> > Impala's clients all communicate with Impala via Apache Thrift, which is a
> > serialization and RPC format that has bindings for multiple languages. In
> > fact, Impala generates Thrift stubs for Java when the frontend is built
> > (see for example
> > ${IMPALA_HOME}/fe/generated-sources/gen-java/org/apache/impala/thrift
> > ImpalaHiveServer2Service.java).
> >
> > Your best bet is to write a client to the HiveServer2Service.
> >
> > Henry
> >
> >> On 2 August 2017 at 19:44, zhangwenyang wrote:
> >>
> >> Hi,
> >>
> >> Our team wants to use Impala for real-time queries.
> >> But we can't find a Java API, for "refresh" for example.
> >> If we want to refresh and fetch data from a Java backend service,
> >> what should we do?
> >>
> >> Thanks
> >>
> >>
> >>
> >>
> >> zhangwenyang
> >>
> >>
> >> 
> >> ---
> >> Confidentiality Notice: The information contained in this e-mail and any
> >> accompanying attachment(s)
> >> is intended only for the use of the intended recipient and may be
> >> confidential and/or privileged of
> >> Neusoft Corporation, its subsidiaries and/or its affiliates. If any
> reader
> >> of this communication is
> >> not the intended recipient, unauthorized use, forwarding, printing,
> >> storing, disclosure or copying
> >> is strictly prohibited, and may be unlawful.If you have received this
> >> communication in error,please
> >> immediately notify the sender by return e-mail, and delete the original
> >> message and all copies from
> >> your system. Thank you.
> >> 
> >> ---
>
>
> 
>


Impala Sorter just sort small partition?

2017-08-04 Thread 俊杰陈
Hi,
I'm looking at Sorter.cc and found that Sorter::SortHelper only sorts the
smaller partition. Is there anything I missed?

-- 
Thanks & Best Regards