Re: [Discuss] Removing search mode

2018-11-06 Thread Jacky Li
Currently, presto-carbon is missing the direct vector fill implementation that
is implemented for Spark in the latest master branch.
I think we can plan this feature for the next version.

Since Presto's architecture is MPP-like, it executes queries in a pipelined
manner, which is good for interactive queries.
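
To illustrate what "direct vector fill" means here, below is a minimal
conceptual sketch in plain Java (made-up names, not the actual presto-carbon or
spark-carbon code): instead of pushing each value through a per-row code path
into the engine's columnar batch, the decoded column page is copied into the
vector in one bulk operation.

// Conceptual sketch only: contrasts row-by-row fill with direct vector fill.
// All names here are illustrative, not CarbonData or Presto APIs.
public class VectorFillSketch {

  // Row-by-row fill: every value goes through a per-row copy.
  static void rowWiseFill(int[] decodedPage, int[] targetVector, int rowCount) {
    for (int row = 0; row < rowCount; row++) {
      targetVector[row] = decodedPage[row];
    }
  }

  // Direct vector fill: copy the whole decoded page into the vector at once.
  static void directVectorFill(int[] decodedPage, int[] targetVector, int rowCount) {
    System.arraycopy(decodedPage, 0, targetVector, 0, rowCount);
  }

  public static void main(String[] args) {
    int[] page = {10, 20, 30, 40};
    int[] vector = new int[page.length];
    directVectorFill(page, vector, page.length);
    System.out.println(java.util.Arrays.toString(vector)); // [10, 20, 30, 40]
  }
}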

Regards,
Jacky

> On Nov 6, 2018, at 8:56 PM, xuchuanyin wrote:
> 
> +1
> 
> Q1: When will we start and finish the optimization of the carbon-presto
> integration? Is there any plan for this?
> 
> Another question:
> Q2: Is it possible to use the carbon reader to implement functionality
> similar to search mode?
> 
> 
> 
> 



Re: Throw NullPointerException occasionally when query from stream table

2018-11-06 Thread xm_zzc
Hi David:
  Please see the call stack:

 





Re: Throw NullPointerException occasionally when query from stream table

2018-11-06 Thread David CaiQiang
Where do we call SegmentPropertiesAndSchemaHolder.invalidate in the handoff
thread?



-
Best Regards
David Cai


Re: [Discuss] Removing search mode

2018-11-06 Thread xuchuanyin
+1

Q1: When will we start and finish the optimization of the carbon-presto
integration? Is there any plan for this?

Another question:
Q2: Is it possible to use the carbon reader to implement functionality
similar to search mode?






Re: [Discuss] Removing search mode

2018-11-06 Thread Liang Chen
Hi

+1, but one suggestion: in the future we can first try these alpha features
in a separate branch. Once they are confirmed, then merge them into master.

Regards
Liang


akashrn5 wrote
> +1
> Yes, after the search mode implementation we did not get as much advantage
> as expected, and it simply makes the code more complex. I agree with Likun.
> 
> 
> 







How does carbondata handle greater-than with a global dict column?

2018-11-06 Thread carbondata-newuser
For example, the version column is a dict column:

explain select A  from test_carbondata.table where date='2018-09-05' and
version >= "1.8.5" ;
== Physical Plan ==
*(1) CarbonDictionaryDecoder [test_carbondata_m_device_distinct_for_bdindex], ExcludeProfile(ArrayBuffer()), CarbonAliasDecoderRelation(), org.apache.spark.sql.CarbonSession@7e1b4f54
+- *(1) Project [A#41]
   +- *(1) FileScan test_carbondata.table[A#41] PushedFilters: [IsNotNull(date), IsNotNull(version), EqualTo(date,2018-09-05), GreaterThanOrEqual(versio...

How does Carbon know that version is greater than or equal to the given value
during the file scan when the data saved in the file is just the
dictionary-encoded surrogate key?
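
To make the question concrete, here is a small conceptual sketch in plain Java
(made-up names, not CarbonData's actual filter code). Because global dictionary
surrogate keys are assigned in load order rather than value order, a range
filter such as version >= '1.8.5' cannot be evaluated on the surrogates
directly; conceptually it first has to be resolved against the dictionary, for
example by collecting the surrogate keys whose decoded value satisfies the
predicate and then matching encoded rows against that set.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Conceptual sketch: resolving a >= filter on a dictionary-encoded column.
// Names and logic are illustrative only, not CarbonData's implementation.
public class DictRangeFilterSketch {
  public static void main(String[] args) {
    // Global dictionary: value -> surrogate key, assigned in load order,
    // so surrogate order does not follow value order.
    Map<String, Integer> dictionary = new HashMap<>();
    dictionary.put("1.9.0", 1);
    dictionary.put("1.7.2", 2);
    dictionary.put("1.8.5", 3);

    // Step 1: resolve the literal filter against the dictionary.
    Set<Integer> qualifyingSurrogates = new HashSet<>();
    for (Map.Entry<String, Integer> e : dictionary.entrySet()) {
      if (e.getKey().compareTo("1.8.5") >= 0) {  // version >= "1.8.5" (string compare)
        qualifyingSurrogates.add(e.getValue());
      }
    }

    // Step 2: scan the encoded column and test set membership, not magnitude.
    int[] encodedColumn = {1, 2, 3, 2, 1};
    for (int surrogate : encodedColumn) {
      System.out.println("surrogate " + surrogate + " matches: "
          + qualifyingSurrogates.contains(surrogate));
    }
  }
}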





Re: Throw NullPointerException occasionally when query from stream table

2018-11-06 Thread xm_zzc
Hi:
  The root cause is a race between the select thread and the handoff thread.
When a select SQL is executed, BlockDataMap calls
'SegmentPropertiesAndSchemaHolder.addSegmentProperties' to add segment info one
by one. Meanwhile, if some segments are updated (for example, a stream segment
is handed off), the handoff thread calls
'SegmentPropertiesAndSchemaHolder.invalidate' to delete segment info one by
one. If segmentIdAndSegmentPropertiesIndexWrapper.segmentIdSet.isEmpty() is
true, it removes the segmentPropertiesIndex, but the select thread is still
using that segmentPropertiesIndex to add/get segment info, and then the NPE
occurs.
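
To illustrate the kind of race described above, here is a minimal standalone
sketch in plain Java (class, map and key names are made up; this is not the
actual SegmentPropertiesAndSchemaHolder code): one thread checks that an index
entry exists and then uses it a moment later, while another thread removes the
same entry in between, so the first thread dereferences null.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal check-then-act race illustration for the scenario described above.
public class SegmentHolderRaceSketch {

  static final Map<String, Object> segmentPropertiesIndex = new ConcurrentHashMap<>();

  public static void main(String[] args) throws InterruptedException {
    segmentPropertiesIndex.put("wrapperIndex_0", new Object());

    // Select thread: checks that the index entry exists, then uses it later.
    Thread selectThread = new Thread(() -> {
      if (segmentPropertiesIndex.containsKey("wrapperIndex_0")) {        // check
        try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        Object wrapper = segmentPropertiesIndex.get("wrapperIndex_0");   // act: may be gone
        System.out.println(wrapper.hashCode());                          // NPE if removed
      }
    });

    // Handoff thread: last segment invalidated, so the whole entry is removed.
    Thread handoffThread =
        new Thread(() -> segmentPropertiesIndex.remove("wrapperIndex_0"));

    selectThread.start();
    handoffThread.start();
    selectThread.join();
    handoffThread.join();
  }
}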





Re: [Discuss] Removing search mode

2018-11-06 Thread akashrn5
+1
Yes, after the search mode implementation we did not get as much advantage as
expected, and it simply makes the code more complex. I agree with Likun.





[Discuss] Removing search mode

2018-11-06 Thread Jacky Li
Hi,

Search mode was introduced as an alpha feature in the Spark integration in
CarbonData 1.4.0; its goal was to increase concurrent query performance.
However, after several experiments, it shows only a very small improvement. The
main reason is that SparkSQL is designed around a DAG/MR compute model, so it
is not good at low-latency, highly concurrent queries. A possibly better
approach is to serve such queries through the presto-carbon integration. So, I
propose to make this alpha feature obsolete and remove it from the master code
base to reduce the complexity of the overall project.

Regards,
Jacky

Re: [Feature Proposal] Proposal for offline and DDL local dictionary support

2018-11-06 Thread Jacky Li
+1
Yes, I think the SDK should also provide local dictionary support.
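
For context, below is a minimal conceptual sketch of what local dictionary
encoding does within a blocklet, in plain Java (made-up names and a tiny
threshold for illustration; this is not the actual CarbonData implementation):
distinct values in the blocklet get small integer codes, and if the distinct
count exceeds a threshold the encoder falls back to plain encoding.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of per-blocklet (local) dictionary encoding with a
// cardinality threshold fallback. Names are illustrative, not CarbonData APIs.
public class LocalDictionarySketch {

  static final int LOCAL_DICTIONARY_THRESHOLD = 3; // tiny value for the demo

  public static void main(String[] args) {
    String[] blockletValues = {"cn", "us", "cn", "in", "us", "cn"};

    Map<String, Integer> localDict = new LinkedHashMap<>();
    List<Integer> encoded = new ArrayList<>();
    boolean fallback = false;

    for (String value : blockletValues) {
      Integer code = localDict.get(value);
      if (code == null) {
        if (localDict.size() >= LOCAL_DICTIONARY_THRESHOLD) {
          fallback = true; // too many distinct values: store plain data instead
          break;
        }
        code = localDict.size();
        localDict.put(value, code);
      }
      encoded.add(code);
    }

    if (fallback) {
      System.out.println("fallback to plain encoding for this blocklet");
    } else {
      System.out.println("local dictionary: " + localDict); // {cn=0, us=1, in=2}
      System.out.println("encoded column:   " + encoded);   // [0, 1, 0, 2, 1, 0]
    }
  }
}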

Regards,
Jacky

> On Nov 5, 2018, at 2:14 PM, manish gupta wrote:
> 
> Hi Dev
> 
> Currently we support the LOCAL DICTIONARY feature during the data load
> operation. The feature is very helpful in that it reduces the store size,
> which reduces the IO and thereby enhances query performance.
> *This proposal is to extend the LOCAL DICTIONARY feature and provide separate
> DDL and offline support for it. This will make the feature's usage more
> flexible. The reasons for proposing this feature are*:
> 
> 1. DDL support, which can enable stores without local dictionary to add this
> feature for already loaded data. This can help customers leverage the LOCAL
> DICTIONARY feature for data that was written in carbondata format without
> local dictionary.
> 2. We know that when local dictionary is enabled there is a small degradation
> in data load performance. So there may be applications/customers who want to
> fine-tune the loaded data during off-peak time. This feature can be helpful
> for those kinds of scenarios.
> 3. Offline support is proposed for SDK-like scenarios where we do not have
> the Spark driver/executor model and there may be only a single thread used
> for loading data. For this scenario we can provide offline support, thereby
> not impacting the existing data load performance.
> 
> Please let me know your suggestions for this proposal. If most of the
> community members feel the idea is good and that it will make the usage of
> this feature more flexible, I can come up with a design and discuss it
> further on this platform.
> 
> Regards
> Manish Gupta
>