I loaded the data with the timestamp field unsuccessfully

2017-03-07 Thread kex
I loaded the data with the timestamp field unsuccessfully, and the timestamp
field is null.

My SQL:
carbon.sql("CREATE TABLE IF NOT EXISTS test1 (date timestamp, id string)
STORED BY 'carbondata' TBLPROPERTIES
('DICTIONARY_INCLUDE'='date','DATEFORMAT'='date:yyyy/MM/dd')")

carbon.sql("LOAD DATA inpath 'hdfs://myha/user/carbon/testdata/test4.csv'
INTO TABLE test1 options('FILEHEADER'='date,id')")

My data, test4.csv:
2017/3/23,2
2017/1/11,1
2017/9/17,3

When I select:
+----+---+
|date| id|
+----+---+
|null|  1|
|null|  2|
|null|  3|
+----+---+

I printed the time format, and it is correct:
println(CarbonProperties.getInstance().getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT))
yyyy/MM/dd

What could be the reason for this?
Thanks.
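A quick way to sanity-check the pattern against the sample rows outside
CarbonData is java.text.SimpleDateFormat, which uses the same pattern
letters. This is a minimal sketch for illustration only (the class name is
invented, and CarbonData's internal parsing may behave differently):

import java.text.SimpleDateFormat;

public class TimestampFormatCheck {
  public static void main(String[] args) throws Exception {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd");
    fmt.setLenient(false); // strict parsing surfaces format mismatches
    for (String value : new String[] {"2017/3/23", "2017/1/11", "2017/9/17"}) {
      System.out.println(value + " -> " + fmt.parse(value));
    }
  }
}

If parsing fails here, the configured DATEFORMAT/CARBON_TIMESTAMP_FORMAT
pattern does not match the data, and the loaded values would come out null.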






subscribing

2017-03-07 Thread Yong Zhang
subscribing


Re: Improving Non-dictionary storage & performance.

2017-03-07 Thread bill.zhou
Hi Jacky,
I think this is not easy for the user to control if Carbon is running
online. For one table, two different loads may have different cardinality
for the same column, but the user cannot give different dictionary columns
for one table.

Regards


Jacky Li wrote:
> Hi Ravindra,
> 
> Another suggestion: to avoid creating trouble for the user while loading
> in single-pass mode, if the dictionary keys generated for a certain
> column exceed the configured value, then the loading process should stop
> and log this error explicitly, reporting the cardinality of all columns.
> By doing this, the user knows the reason for the data load failure.
> How about this idea?
> 
> Regards,
> Jacky
> 
>> On 3 March 2017, at 1:26 AM, Ravindra Pesala <ravi.pesala@...> wrote:
>> 
>> Hi Likun,
>> 
>> Yes, Likun, we had better keep dictionary as the default until we
>> optimize no-dictionary columns.
>> As you mentioned, we can suggest 2-pass for the first load, and
>> subsequent loads will use single-pass to improve the performance.
>> 
>> Regards,
>> Ravindra.
>> 
>>> On 2 March 2017 at 06:48, Jacky Li <jacky.likun@...> wrote:
>> 
>>> Hi Ravindra & Vishal,
>>> 
>>> Yes, I think these works need to be done before switching no-dictionary
>>> as
>>> default. So as of now, we should use dictionary as default.
>>> I think we can suggest user to do loading as:
>>> 1. First load: use 2-pass mode to load, the first scan should discover
>>> the
>>> cardinality, and check with user specified option. We should define
>>> rules
>>> to pass or fail the validation, and finalize the load option for
>>> subsequent
>>> load.
>>> 2. Subsequent load: use single-pass mode to load, use the options
>>> defined
>>> by first load
>>> 
>>> What is your idea?
>>> 
>>> Regards,
>>> Jacky
>>> 
 在 2017年3月1日,下午11:31,Ravindra Pesala 

> ravi.pesala@

>  写道:
 
>>>> Hi Vishal,
>>>> 
>>>> You are right; that's why we can do no-dictionary only for the String
>>>> datatype. Please look at my first point: we can always use direct
>>>> dictionary for the possible data types like short, int, long, double &
>>>> float for sort_columns.
>>>> 
>>>> Regards,
>>>> Ravindra.
>>>> 
>>>>> On 1 March 2017 at 18:18, Kumar Vishal <kumarvishal1802@...> wrote:
 
>>>>> Hi Ravi,
>>>>> Sorting of data for no-dictionary columns should be based on the data
>>>>> type, and the same for filters. Please add this point.
>>>>> 
>>>>> -Regards
>>>>> Kumar Vishal
>>>>> 
>>>>>> On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <ravi.pesala@...> wrote:
> 
>>>>>> Hi,
>>>>>> 
>>>>>> In order to make non-dictionary column storage and performance more
>>>>>> efficient, I am suggesting the following improvements.
>>>>>> 
>>>>>> 1. Always make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary.
>>>>>> Right now only date and timestamp are direct dictionary columns. We
>>>>>> can make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary if
>>>>>> these columns are included in SORT_COLUMNS.
>>>>>> 
>>>>>> 2. Consider delta/value compression while storing direct dictionary
>>>>>> values. Right now it always uses the INT datatype to store direct
>>>>>> dictionary values, so we can consider value/delta compression to
>>>>>> compact the storage.
>>>>>> 
>>>>>> 3. Use a separator instead of the LV format to store String values
>>>>>> in no-dictionary format. Currently, String datatypes for
>>>>>> non-dictionary columns are stored in LV (length-value) format, where
>>>>>> we always use a Short (2 bytes) as the length. To keep the storage
>>>>>> compact, we can use a separator (the 0 byte), which takes just a
>>>>>> single byte. While reading, we can traverse through the data and
>>>>>> collect the offsets as we do now.
>>>>>> 
>>>>>> 4. Add range filters for no-dictionary columns.
>>>>>> Currently, range filters like greater-than/less-than are not
>>>>>> implemented for no-dictionary columns, so we should implement them
>>>>>> to avoid row-level filtering and improve the performance.
>>>>>> 
>>>>>> Regards,
>>>>>> Ravindra.
>>>> 
>>>> --
>>>> Thanks & Regards,
>>>> Ravi
>> 
>> --
>> Thanks & Regards,
>> Ravi







Re: [DISCUSS] Apache CarbonData podling graduation

2017-03-07 Thread Niclas Hedhman
I think the wording needs tuning; my own interpretation of the intent has
been:

The project claims itself to comply, and lists any exceptions and why that
is the case.

I think that can deal with the variance that exists.

Cheers
Niclas


On Mar 7, 2017 14:48, "Jim Apple"  wrote:

> > Actually, the Maturity Model can be a very nice framework to organize
> > incubation around.
>
> If you're talking about a platonic MM, then I agree. If you are
> talking about the MM at
> https://community.apache.org/apache-way/apache-project-maturity-model.html
> ,
> I think it needs to be much more carefully written and much more
> accurate to be a "nice framework". In particular, "aims to capture the
> invariants of Apache projects ... A mature Apache project complies
> with all the elements of this model" is wishful thinking or outdated
> or both.


Re: [DISCUSS] Apache CarbonData podling graduation

2017-03-07 Thread Jim Apple
> Actually, the Maturity Model can be a very nice framework to organize
> incubation around.

If you're talking about a platonic MM, then I agree. If you are
talking about the MM at
https://community.apache.org/apache-way/apache-project-maturity-model.html,
I think it needs to be much more carefully written and much more
accurate to be a "nice framework". In particular, "aims to capture the
invariants of Apache projects ... A mature Apache project complies
with all the elements of this model" is wishful thinking or outdated
or both.


Re: Re: "between and" filter query is very slow

2017-03-07 Thread 马云


Hi Dev,

I created the JIRA CARBONDATA-748 a few days ago. Today I fixed it for
version 0.2 and created a new pull request. Please help to confirm. Thanks.

At 2017-03-03 20:47:51, "Kumar Vishal"  wrote:
>Hi,
>
>Currently, in the include and exclude filter case, when the dimension
>column does not have an inverted index, it does a linear search. We can
>add a binary search when the data for that column is sorted; to get this
>information, we can check in the carbon table whether the user selected
>"no inverted index" for that column. If the user selected no inverted
>index while creating the column, this code is fine; if the user did not,
>then the data will be sorted, so we can add a binary search, which will
>improve the performance.
>
>Please raise a JIRA for this improvement.
>
>-Regards
>Kumar Vishal
>
>
>On Fri, Mar 3, 2017 at 7:42 PM, 马云  wrote:
>
>> Hi Dev,
>>
>> I am using CarbonData version 0.2 on my local machine, and I found that
>> the "between and" filter query is very slow.
>> The root cause is the code below in IncludeFilterExecuterImpl.java;
>> it takes about 20s in my test.
>> The code's time complexity is O(n*m). I think it needs to be optimized;
>> please confirm. Thanks.
>>
>>   private BitSet setFilterdIndexToBitSet(
>>       DimensionColumnDataChunk dimensionColumnDataChunk, int numerOfRows) {
>>     BitSet bitSet = new BitSet(numerOfRows);
>>     if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
>>       FixedLengthDimensionDataChunk fixedDimensionChunk =
>>           (FixedLengthDimensionDataChunk) dimensionColumnDataChunk;
>>       byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
>>       long start = System.currentTimeMillis();
>>       // For each filter value, linearly scan every row: O(n*m) compares.
>>       for (int k = 0; k < filterValues.length; k++) {
>>         for (int j = 0; j < numerOfRows; j++) {
>>           if (ByteUtil.UnsafeComparer.INSTANCE
>>               .compareTo(fixedDimensionChunk.getCompleteDataChunk(),
>>                   j * filterValues[k].length, filterValues[k].length,
>>                   filterValues[k], 0, filterValues[k].length) == 0) {
>>             bitSet.set(j);
>>           }
>>         }
>>       }
>>       System.out.println("loop time: " + (System.currentTimeMillis() - start));
>>     }
>>     return bitSet;
>>   }
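
To illustrate Kumar Vishal's suggestion above, here is a rough,
self-contained sketch of replacing the linear scan with a binary search per
filter value when the fixed-length column data is sorted. The class and
method names and the raw data array are invented for the example; the real
chunk and comparator APIs in CarbonData differ.

import java.util.BitSet;

public class SortedIncludeFilterSketch {

  // Binary-search each filter value in sorted, fixed-length row data:
  // O(m log n) instead of the O(n*m) nested loops above. Assumes each
  // filter value is exactly rowLength bytes.
  static BitSet filterSorted(byte[] data, int rowLength, int numRows, byte[][] filterValues) {
    BitSet bitSet = new BitSet(numRows);
    for (byte[] filter : filterValues) {
      int lo = 0;
      int hi = numRows - 1;
      while (lo <= hi) {
        int mid = (lo + hi) >>> 1;
        int cmp = compareRow(data, mid, rowLength, filter);
        if (cmp < 0) {
          lo = mid + 1;
        } else if (cmp > 0) {
          hi = mid - 1;
        } else {
          // Sorted data can hold duplicates: mark the whole run of equal rows.
          for (int i = mid; i >= 0 && compareRow(data, i, rowLength, filter) == 0; i--) {
            bitSet.set(i);
          }
          for (int i = mid + 1; i < numRows && compareRow(data, i, rowLength, filter) == 0; i++) {
            bitSet.set(i);
          }
          break;
        }
      }
    }
    return bitSet;
  }

  // Unsigned lexicographic comparison of row `row` against `filter`.
  static int compareRow(byte[] data, int row, int rowLength, byte[] filter) {
    int offset = row * rowLength;
    for (int i = 0; i < rowLength; i++) {
      int a = data[offset + i] & 0xFF;
      int b = filter[i] & 0xFF;
      if (a != b) {
        return a - b;
      }
    }
    return 0;
  }
}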