Re: questions about carbondata

2016-10-21 Thread weijie tong
Thanks for the reply. For 3, I still want to know whether all the blocklets
of all the blocks are stored in sequence according to the sorted MDK key. If so,
the globally sequenced MDK key of the carbon table would behave the way an
HBase rowkey does. Or is the sequence local to the block, with the block index
file managing the block-level index?

On Fri, Oct 21, 2016 at 5:48 PM, 杰 <2550062...@qq.com> wrote:

> hi,
> 1. correct.
>    One carbon file is the same as one block; one block has many blocklets as
> well as one file footer, which holds the metadata (btree index) of the blocklets.
>    One load makes one segment, and one segment has many blocks.
> 2. Carbon sorts the dim column data within one blocklet, so the original row
> sequence is lost. Carbon therefore stores the dim column data together with
> the row ids:
>    the dim column data is sorted and the row id sequence is changed
> correspondingly, so the matchup (like Array: index => data) is kept.
>    At query time, carbon first gets the expected dim column data (based
> on the filter), then uses the matchup to get the row ids.
>    Based on the row ids, we can then get the measure data.
>    So the column data is called an inverted index, which means data =>
> index, not index => data.
> 3. yes.
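A toy sketch of the matchup described in answer 2 (illustrative only, not CarbonData's actual storage code): sort the dimension values while recording each value's original row id, so a filter hit on a value can be mapped back to its row id to fetch the measures.

```java
import java.util.Arrays;
import java.util.Comparator;

class InvertedIndexSketch {
    // Returns the row ids ordered by the column's sorted values.
    static Integer[] invertedIndex(String[] col) {
        Integer[] rowIds = new Integer[col.length];
        for (int i = 0; i < col.length; i++) rowIds[i] = i;
        // Sort row ids by the value they point at; col[rowIds[i]] is then
        // the i-th smallest value, and rowIds maps sorted position -> row id.
        Arrays.sort(rowIds, Comparator.comparing(i -> col[i]));
        return rowIds;
    }

    public static void main(String[] args) {
        String[] col = { "c", "a", "b" };   // array index == row id
        Integer[] rowIds = invertedIndex(col);
        // A filter hit on "a" is found at sorted position 0, which maps
        // back to row id 1; the measure data for that row can then be read.
        System.out.println(Arrays.toString(rowIds)); // [1, 2, 0]
    }
}
```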
>
>
>
>
> -- Original Message --
> From: "weijie tong";
> Sent: Friday, October 21, 2016, 4:01 PM
> To: "dev";
>
> Subject: questions about carbondata
>
>
>
> 1. What is the relationship between these terms:
> carbondata file, block, blocklet, and carbondata file footer? Once we have a
> batch job to load data into a carbondata table, does that mean the table
> will be composed of different blocks, and each block is a carbondata
> file which is composed of many blocklets plus one FileFooter, according to
> the carbondata file format?
>
> 2. How is the column data stored as an inverted index?
> The dim column data is inverted to what? And how does the inverted index
> affect a query?
>
> 3. Are all the blocklets stored in sequence according to the sorted MDK key?
>
> Hope someone can give a detailed answer.
>


[GitHub] incubator-carbondata pull request #249: [CARBONDATA-329] constant final clas...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/249#discussion_r84567557
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -19,866 +19,859 @@
 
 package org.apache.carbondata.core.constants;
 
-public final class CarbonCommonConstants {
+public interface CarbonCommonConstants {
--- End diff --

@ravipesala @gvramana @Vimal-Das 
What is your opinion? According to 
[this](http://stackoverflow.com/questions/320588/interfaces-with-static-fields-in-java-for-sharing-constants),
 putting constants in an interface is not a good practice. 
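For reference, a minimal sketch of the pattern the linked answer recommends instead: a final class with a private constructor, so the type can be neither implemented nor instantiated (the class name and values here are hypothetical, not the real CarbonCommonConstants).

```java
// Constants held in a final class instead of an interface: nothing can
// implement it to "inherit" the constants, and nothing can instantiate it.
final class CommonConstants {
    public static final String DEFAULT_CHARSET = "UTF-8";
    public static final int BLOCKLET_SIZE_DEFAULT = 120000; // illustrative value

    private CommonConstants() {
        // no instances
    }
}
```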


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #253: trivial fix in documentation

2016-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/253




[GitHub] incubator-carbondata pull request #253: trivial fix in documentation

2016-10-21 Thread phalodi
GitHub user phalodi opened a pull request:

https://github.com/apache/incubator-carbondata/pull/253

trivial fix in documentation



## What changes were proposed in this pull request?

This is too trivial, so I didn't create a JIRA task.
The Quick Start Guide uses string interpolation to build the file path, but the
string is then used in the SQL without the 's' prefix, so the interpolation is
never applied.

## How was this patch tested?
Manually tested by running the interpolated string on carbon-spark-shell.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/phalodi/incubator-carbondata CARBONDATA-DOCS

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #253








[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84518288
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+import org.apache.carbondata.processing.newflow.converter.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ * It converts the complete row if necessary, dictionary columns are 
encoded with dictionary values
+ * and nondictionary values are converted to binary.
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
+  }
+
+  @Override
+  public CarbonRow convert(CarbonRow row) throws 
CarbonDataLoadingException {
+
+for (int i = 0; i < fieldConverters.length; i++) {
+  fieldConverters[i].convert(row);
--- End diff --

Yes, here the fields are the same as the input; the 3-element code has been
moved to the sort step.




[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84517789
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/RemoveDictionaryUtil.java
 ---
@@ -123,6 +123,60 @@ private static int calculateTotalBytes(ByteBuffer[] 
byteBufferArr) {
   }
 
   /**
+   * This method will form one single byte [] for all the high card dims.
--- End diff --

Added description.
This is just a duplicate function that takes `byte[][]` instead of `ByteBuffer[]`.
The other method will be removed when kettle is removed.
The reverse function for this is the `splitNoDictionaryKey` method.
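As a rough illustration of the idea (a hypothetical helper, not the actual RemoveDictionaryUtil code), packing each high-card dim value with a length prefix is what makes the reverse split possible:

```java
import java.nio.ByteBuffer;

class NoDictKeyPack {
    // Form one single byte[] for all the high-card dims, prefixing each
    // value with a 2-byte length so the composite can be split back out.
    static byte[] pack(byte[][] dims) {
        int total = 0;
        for (byte[] d : dims) total += 2 + d.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] d : dims) {
            buf.putShort((short) d.length);
            buf.put(d);
        }
        return buf.array();
    }

    // Reverse operation, analogous in spirit to splitNoDictionaryKey.
    static byte[][] split(byte[] packed, int dimCount) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        byte[][] dims = new byte[dimCount][];
        for (int i = 0; i < dimCount; i++) {
            byte[] d = new byte[buf.getShort()];
            buf.get(d);
            dims[i] = d;
        }
        return dims;
    }
}
```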




[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84515278
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/sort/CarbonSorter.java
 ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.sort;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+
+/**
+ * This interface sorts all the data of iterators.
+ * The life cycle of this interface is initialize -> sort -> close
+ */
+public interface CarbonSorter {
--- End diff --

A thread-safe insertRow should work, right?
I think it is not good to couple the pulling logic of CarbonRow with the sorting 
logic. The execution of `Iterator[]` should be in 
`SortProcessorStepImpl.execute()`, while the sorting logic should be in Sorter. 
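A hypothetical sketch of the decoupling being proposed: the processor step drives the iterators and pushes rows in, while the sorter only sorts (names, generics, and the in-memory implementation here are illustrative, not the actual CarbonSorter API).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// The sorter knows nothing about where rows come from; the step owns
// the Iterator[] and calls insertRow, possibly from multiple threads.
interface Sorter<T extends Comparable<T>> {
    void initialize();
    void insertRow(T row);   // must be safe to call concurrently
    Iterator<T> sort();
    void close();
}

class InMemorySorter<T extends Comparable<T>> implements Sorter<T> {
    private final List<T> rows = Collections.synchronizedList(new ArrayList<>());

    public void initialize() { rows.clear(); }

    public void insertRow(T row) { rows.add(row); }

    public Iterator<T> sort() {
        List<T> copy;
        synchronized (rows) { copy = new ArrayList<>(rows); }
        Collections.sort(copy);
        return copy.iterator();
    }

    public void close() { rows.clear(); }
}
```

The lifecycle matches the interface comment (initialize -> sort -> close), but row pulling stays outside the sorter.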




[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/230






[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84513301
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+import org.apache.carbondata.processing.newflow.converter.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ * It converts the complete row if necessary, dictionary columns are 
encoded with dictionary values
+ * and nondictionary values are converted to binary.
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
+  }
+
+  @Override
+  public CarbonRow convert(CarbonRow row) throws 
CarbonDataLoadingException {
+
--- End diff --

remove empty line




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84513452
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+import org.apache.carbondata.processing.newflow.converter.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ * It converts the complete row if necessary, dictionary columns are 
encoded with dictionary values
+ * and nondictionary values are converted to binary.
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
+  }
+
+  @Override
+  public CarbonRow convert(CarbonRow row) throws 
CarbonDataLoadingException {
+
+for (int i = 0; i < fieldConverters.length; i++) {
+  fieldConverters[i].convert(row);
--- End diff --

So, after convert, it is different from the old approach, which had just 3 
elements, right?
Here, the number of fields remains the same as the input.




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84512153
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+import org.apache.carbondata.processing.newflow.converter.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ * It converts the complete row if necessary, dictionary columns are 
encoded with dictionary values
+ * and nondictionary values are converted to binary.
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
--- End diff --

ok




Re: load data error

2016-10-21 Thread 仲景武
Sorry, it still can't run:

0: jdbc:hive2://taonongyuan.com:10099/default> load 
data inpath 'hdfs:///name001:9000/carbondata/sample.csv' into table test_table3;
Error: java.lang.IllegalArgumentException: Pathname 
/name001:9000/carbondata/sample.csv from 
hdfs:/name001:9000/carbondata/sample.csv is not a valid DFS filename. 
(state=,code=0)
0: jdbc:hive2://taonongyuan.com:10099/default> load 
data inpath 'hdfs:///name001:9000/carbondata/sample.csv' into table test_table3;
Error: java.lang.IllegalArgumentException: Pathname 
/name001:9000/carbondata/sample.csv from 
hdfs:/name001:9000/carbondata/sample.csv is not a valid DFS filename. 
(state=,code=0)
0: jdbc:hive2://taonongyuan.com:10099/default> load 
data inpath '/carbondata/sample.csv' into table test_table3;
Error: org.apache.carbondata.processing.etl.DataLoadingException: The input 
file does not exist: 
hdfs://name001:9000hdfs://name001:9000/opt/data/carbondata/sample.csv 
(state=,code=0)
0: jdbc:hive2://taonongyuan.com:10099/default>
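As foryou2030 points out below, the triple-slash form is the problem: with `hdfs:///` the authority is empty, so `name001:9000` slides into the path. A small sketch with plain `java.net.URI` (an illustration of generic URI parsing, not Hadoop's own `Path` handling) shows the mis-parse:

```java
import java.net.URI;

class UriCheck {
    public static void main(String[] args) {
        // Triple slash: empty authority, host:port becomes part of the path.
        URI bad = URI.create("hdfs:///name001:9000/carbondata/sample.csv");
        System.out.println(bad.getAuthority()); // null
        System.out.println(bad.getPath());      // /name001:9000/carbondata/sample.csv

        // Double slash: host:port is parsed as the authority, as intended.
        URI good = URI.create("hdfs://name001:9000/carbondata/sample.csv");
        System.out.println(good.getAuthority()); // name001:9000
        System.out.println(good.getPath());      // /carbondata/sample.csv
    }
}
```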
On October 20, 2016, at 8:19 PM, foryou2030 wrote:

try hdfs://name001:9000/carbondata/sample.csv
 Instead of
hdfs:///name001:9000/carbondata/sample.csv

Sent from my iPhone

On October 20, 2016, at 10:52 AM, 仲景武 wrote:


When running this command (thrift server):

jdbc:hive2://taonongyuan.com:10099/default> load data 
inpath 'hdfs://name001:9000/carbondata/sample.csv' into table test_table3;


it throws an exception:

Driver stacktrace: (state=,code=0)
0: jdbc:hive2://taonongyuan.com:10099/default> load 
data inpath 'hdfs:///name001:9000/carbondata/sample.csv' into table test_table3;
Error: java.lang.IllegalArgumentException: Pathname 
/name001:9000/carbondata/sample.csv from 
hdfs:/name001:9000/carbondata/sample.csv is not a valid DFS filename. 
(state=,code=0)
0: jdbc:hive2://taonongyuan.com:10099/default> load 
data inpath 'hdfs://name001:9000/carbondata/sample.csv' into table test_table3;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 
(TID 18, data002): java.lang.IllegalArgumentException: Wrong FS: 
hdfs://name001:9000/user/hive/warehouse/carbon.store/default/test_table3/Metadata/fdd8c8c4-5cdd-4542-aab1-785be20b9f36.dictmeta,
 expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at 
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at 
org.apache.carbondata.core.datastorage.store.impl.FileFactory.getDataInputStream(FileFactory.java:146)
at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:79)
at 
org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.openThriftReader(CarbonDictionaryMetadataReaderImpl.java:181)
at 
org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.readLastEntryOfDictionaryMetaChunk(CarbonDictionaryMetadataReaderImpl.java:128)
at 
org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.readLastChunkFromDictionaryMetadataFile(AbstractDictionaryCache.java:129)
at 
org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:204)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.getDictionary(ReverseDictionaryCache.java:181)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:69)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:40)
at 
org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:508)
at 
org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:514)
at 
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.(CarbonGlobalDictionaryRDD.scala:362)
at 
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:293)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at 

[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84510612
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java
 ---
@@ -334,24 +151,24 @@ public void startSorting() throws 
CarbonSortKeyAndGroupByException {
   toSort = new Object[entryCount][];
   System.arraycopy(recordHolderList, 0, toSort, 0, entryCount);
 
-  if (noDictionaryCount > 0) {
-Arrays.sort(toSort, new RowComparator(noDictionaryDimnesionColumn, 
noDictionaryCount));
+  if (parameters.getNoDictionaryCount() > 0) {
+Arrays.sort(toSort, new 
RowComparator(parameters.getNoDictionaryDimnesionColumn(),
+parameters.getNoDictionaryCount()));
   } else {
 
--- End diff --

ok




[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84510589
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/SortProcessorStepImpl.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.steps;
+
+import java.util.Iterator;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.newflow.sort.CarbonSorter;
+import 
org.apache.carbondata.processing.newflow.sort.impl.CarbonParallelReadMergeSorterImpl;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+
+/**
+ * It sorts the data and write them to intermediate temp files. These 
files will be further read
+ * by next step for writing to carbondata files.
+ */
+public class SortProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(SortProcessorStepImpl.class.getName());
+
+  private CarbonSorter carbonSorter;
+
+  public SortProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child) {
+super(configuration, child);
+  }
+
+  @Override public DataField[] getOutput() {
--- End diff --

ok




[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84510340
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/CarbonParallelReadMergeSorterImpl.java
 ---
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.sort.impl;
+
+import java.io.File;
+import java.util.Iterator;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.newflow.sort.CarbonSorter;
+import 
org.apache.carbondata.processing.sortandgroupby.exception.CarbonSortKeyAndGroupByException;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortDataRows;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortIntermediateFileMerger;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+import 
org.apache.carbondata.processing.store.SingleThreadFinalSortFilesMerger;
+import 
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * It parallely reads data from array of iterates and do merge sort.
+ * First it sorts the data and write to temp files. These temp files will 
be merge sorted to get
+ * final merge sort result.
+ */
+public class CarbonParallelReadMergeSorterImpl implements CarbonSorter {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(CarbonParallelReadMergeSorterImpl.class.getName());
+
+  private SortParameters sortParameters;
+
+  private SortIntermediateFileMerger intermediateFileMerger;
+
+  private ExecutorService executorService;
+
+  private SingleThreadFinalSortFilesMerger finalMerger;
+
+  private DataField[] inputDataFields;
+
+  public CarbonParallelReadMergeSorterImpl(DataField[] inputDataFields) {
+this.inputDataFields = inputDataFields;
+  }
+
+  @Override
+  public void initialize(SortParameters sortParameters) {
+this.sortParameters = sortParameters;
+intermediateFileMerger = new 
SortIntermediateFileMerger(sortParameters);
+String storeLocation = CarbonDataProcessorUtil
--- End diff --

I believe PR 217 is not merged yet; once it is, those changes will be reflected here. And JIRA 287 should be sufficient for it.


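The Javadoc in the diff above describes the technique: sort incoming batches in memory, spill them to intermediate temp files, then merge-sort the spill files into the final order. A minimal external-sort sketch under that assumption (class and method names are illustrative, not CarbonData's actual API):

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of sort-then-merge external sorting: each chunk is
// sorted in memory and spilled to a temp file, then all spill files are
// merge-sorted with a priority queue. Not CarbonData's actual code.
public class ExternalSortSketch {

  // Sort one in-memory chunk and spill it to a temp file (one int per line).
  static File spill(List<Integer> chunk) throws IOException {
    Collections.sort(chunk);
    File f = File.createTempFile("spill", ".txt");
    f.deleteOnExit();
    try (PrintWriter w = new PrintWriter(f)) {
      for (int v : chunk) w.println(v);
    }
    return f;
  }

  // K-way merge of the sorted spill files: the priority queue always
  // exposes the smallest head value across all spills.
  static List<Integer> merge(List<File> spills) throws IOException {
    PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
    List<BufferedReader> readers = new ArrayList<>();
    for (int i = 0; i < spills.size(); i++) {
      BufferedReader r = new BufferedReader(new FileReader(spills.get(i)));
      readers.add(r);
      String line = r.readLine();
      if (line != null) pq.add(new int[] { Integer.parseInt(line), i });
    }
    List<Integer> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      int[] head = pq.poll();                    // smallest remaining value
      out.add(head[0]);
      String line = readers.get(head[1]).readLine();
      if (line != null) pq.add(new int[] { Integer.parseInt(line), head[1] });
    }
    for (BufferedReader r : readers) r.close();
    return out;
  }

  public static void main(String[] args) throws IOException {
    List<File> spills = new ArrayList<>();
    spills.add(spill(new ArrayList<>(Arrays.asList(5, 1, 9))));
    spills.add(spill(new ArrayList<>(Arrays.asList(4, 2, 8))));
    System.out.println(merge(spills));           // [1, 2, 4, 5, 8, 9]
  }
}
```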


[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84510002
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/sort/CarbonSorter.java
 ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.sort;
+
+import java.util.Iterator;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+
+/**
+ * This interface sorts all the data of iterators.
+ * The life cycle of this interface is initialize -> sort -> close
+ */
+public interface CarbonSorter {
--- End diff --

OK, I will update the name.
An `insertRow`-style interface may not satisfy our needs, since we require parallel read and sort. Even with the current interface we can sort on the fly.


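The lifecycle discussed above (initialize, then sort, then close) and the preference for a pull-based `sort(iterators)` API over an `insertRow`-style one can be sketched as follows. The interface and names are hypothetical, not the actual CarbonData interface:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical sorter with an explicit initialize -> sort -> close lifecycle.
// A pull-based sort(inputs) can drain several input iterators (in parallel,
// in a real implementation) and sort on the fly, which a push-based
// insertRow-style API would make harder.
public class ToySorterSketch {

  interface ToySorter {
    void initialize(Comparator<Integer> comparator);
    Iterator<Integer> sort(List<Iterator<Integer>> inputs);
    void close();
  }

  static class InMemoryToySorter implements ToySorter {
    private Comparator<Integer> comparator;

    @Override
    public void initialize(Comparator<Integer> comparator) {
      this.comparator = comparator;
    }

    @Override
    public Iterator<Integer> sort(List<Iterator<Integer>> inputs) {
      List<Integer> rows = new ArrayList<>();
      for (Iterator<Integer> it : inputs) {  // drain every input iterator
        while (it.hasNext()) rows.add(it.next());
      }
      rows.sort(comparator);                 // in-memory stand-in for merge sort
      return rows.iterator();
    }

    @Override
    public void close() { /* release temp files, threads, ... */ }
  }

  public static void main(String[] args) {
    ToySorter sorter = new InMemoryToySorter();
    sorter.initialize(Comparator.naturalOrder());
    Iterator<Integer> sorted = sorter.sort(Arrays.asList(
        Arrays.asList(3, 1).iterator(),
        Arrays.asList(2, 4).iterator()));
    List<Integer> out = new ArrayList<>();
    sorted.forEachRemaining(out::add);
    System.out.println(out);                 // [1, 2, 3, 4]
    sorter.close();
  }
}
```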


[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84508220
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/FieldEncoderFactory.java
 ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.carbon.CarbonTableIdentifier;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+
+public class FieldEncoderFactory {
+
+  private static FieldEncoderFactory instance;
+
+  private FieldEncoderFactory() {
+
+  }
+
+  public static FieldEncoderFactory getInstance() {
+if (instance == null) {
+  instance = new FieldEncoderFactory();
+}
+return instance;
+  }
+
+  public FieldConverter createFieldEncoder(DataField dataField,
--- End diff --

added




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84508170
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/FieldEncoderFactory.java
 ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.carbon.CarbonTableIdentifier;
+import org.apache.carbondata.core.carbon.metadata.encoder.Encoding;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+
+public class FieldEncoderFactory {
+
+  private static FieldEncoderFactory instance;
+
+  private FieldEncoderFactory() {
+
+  }
+
+  public static FieldEncoderFactory getInstance() {
+if (instance == null) {
+  instance = new FieldEncoderFactory();
+}
+return instance;
+  }
+
+  public FieldConverter createFieldEncoder(DataField dataField,
+  Cache cache,
+  CarbonTableIdentifier carbonTableIdentifier, int index) {
+if (dataField.hasDictionaryEncoding()) {
+  return new DictionaryFieldConverterImpl(dataField, cache, 
carbonTableIdentifier, index);
+} else if 
(dataField.getColumn().hasEncoding(Encoding.DIRECT_DICTIONARY)) {
+  return new DirectDictionaryFieldConverterImpl(dataField, index);
+} else if (dataField.getColumn().isComplex()) {
+  return new ComplexFieldConverterImpl();
+} else if ((dataField.getColumn().hasEncoding(Encoding.DICTIONARY) ||
--- End diff --

ok




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84504071
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+import org.apache.carbondata.processing.newflow.converter.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ * It converts the complete row if necessary, dictionary columns are 
encoded with dictionary values
+ * and nondictionary values are converted to binary.
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List<FieldConverter> fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
--- End diff --

We can use an iterator, but it would be invoked for each row, and accessing an array is much faster than an ArrayList. As for the copy, it is a one-time operation.


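The trade-off discussed above, one up-front copy from the list into an array in exchange for cheaper per-row access, is a common pattern; a small illustrative sketch (not the `RowConverterImpl` code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative: build converters in a List (convenient to grow), then copy
// once into an array so the per-row hot path uses plain array indexing
// instead of List/iterator calls, as the review discussion describes.
public class ListToArraySketch {

  interface Converter { int convert(int v); }

  public static void main(String[] args) {
    List<Converter> list = new ArrayList<>();
    list.add(v -> v + 1);
    list.add(v -> v * 2);

    // one-time copy from list to array
    Converter[] converters = list.toArray(new Converter[list.size()]);

    // per-row hot path: iterate the array
    int row = 3;
    for (Converter c : converters) row = c.convert(row);
    System.out.println(row);   // (3 + 1) * 2 = 8
  }
}
```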


[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84495043
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/RemoveDictionaryUtil.java
 ---
@@ -123,6 +123,60 @@ private static int calculateTotalBytes(ByteBuffer[] 
byteBufferArr) {
   }
 
   /**
+   * This method will form one single byte [] for all the high card dims.
--- End diff --

Can you describe the format of the output byte array?
It is LV-like, right? A length stored as a short, then the byte array?


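If the layout is LV-like as the reviewer suggests, that is, a 2-byte length followed by the value bytes for each high-cardinality dimension, the packing could be sketched like this (hypothetical code, not the actual `RemoveDictionaryUtil` implementation):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of LV (length-value) packing: for each field, write its
// length as a short, then its raw bytes. Unpacking reads them back in order.
public class LvPack {

  static byte[] pack(byte[][] fields) {
    int total = 0;
    for (byte[] f : fields) total += 2 + f.length;   // 2-byte length prefix each
    ByteBuffer buf = ByteBuffer.allocate(total);
    for (byte[] f : fields) {
      buf.putShort((short) f.length);
      buf.put(f);
    }
    return buf.array();
  }

  static byte[][] unpack(byte[] packed, int fieldCount) {
    ByteBuffer buf = ByteBuffer.wrap(packed);
    byte[][] out = new byte[fieldCount][];
    for (int i = 0; i < fieldCount; i++) {
      byte[] f = new byte[buf.getShort()];           // read length, then value
      buf.get(f);
      out[i] = f;
    }
    return out;
  }

  public static void main(String[] args) {
    byte[][] dims = {
        "abc".getBytes(StandardCharsets.UTF_8),
        "de".getBytes(StandardCharsets.UTF_8)
    };
    byte[] packed = pack(dims);
    System.out.println(packed.length);               // 2 + 3 + 2 + 2 = 9
    System.out.println(new String(unpack(packed, 2)[0], StandardCharsets.UTF_8)); // abc
  }
}
```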


[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84491588
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/SortProcessorStepImpl.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.steps;
+
+import java.util.Iterator;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.newflow.sort.CarbonSorter;
+import 
org.apache.carbondata.processing.newflow.sort.impl.CarbonParallelReadMergeSorterImpl;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+
+/**
+ * It sorts the data and write them to intermediate temp files. These 
files will be further read
+ * by next step for writing to carbondata files.
+ */
+public class SortProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(SortProcessorStepImpl.class.getName());
+
+  private CarbonSorter carbonSorter;
+
+  public SortProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child) {
+super(configuration, child);
+  }
+
+  @Override public DataField[] getOutput() {
--- End diff --

move override to previous line




[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84485160
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/constants/DataLoadProcessorConstants.java
 ---
@@ -33,4 +33,8 @@
   public static final String COMPLEX_DELIMITERS = "COMPLEX_DELIMITERS";
 
   public static final String DIMENSION_LENGTHS = "DIMENSION_LENGTHS";
+
--- End diff --

OK. We should separate out the options exposed to the user; let's do it in a future PR.




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84484251
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/converter/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.converter.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.converter.FieldConverter;
+import org.apache.carbondata.processing.newflow.converter.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ * It converts the complete row if necessary, dictionary columns are 
encoded with dictionary values
+ * and nondictionary values are converted to binary.
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List<FieldConverter> fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
--- End diff --

But in this case it needs one round of copying, and with a list you can also iterate using its iterator.




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84482988
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.encoding.FieldConverter;
+import org.apache.carbondata.processing.newflow.encoding.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ *
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List<FieldConverter> fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
--- End diff --

OK. Only dictionary, no-dictionary, and complex columns have converters, right?
Can we make this code more readable, e.g. by checking whether the column is a non-measure?
And a measure cannot be a complex column?


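The readability suggestion above, branching on the measure/dimension distinction first and then on the dimension sub-cases, could be sketched as below. The `Column` model and converter names are illustrative stand-ins, not the real CarbonData types:

```java
// Hypothetical restructuring of the converter dispatch: handle measures up
// front, then the dimension sub-cases (complex, direct dictionary,
// dictionary, plain), as the review suggests.
public class ConverterDispatchSketch {

  enum Encoding { DICTIONARY, DIRECT_DICTIONARY, NONE }

  // Minimal stand-in for a column descriptor.
  static class Column {
    final boolean measure;
    final boolean complex;
    final Encoding encoding;
    Column(boolean measure, boolean complex, Encoding encoding) {
      this.measure = measure;
      this.complex = complex;
      this.encoding = encoding;
    }
  }

  static String createFieldConverter(Column c) {
    if (c.measure) {
      return "MeasureFieldConverter";   // measures are never complex here
    }
    if (c.complex) {
      return "ComplexFieldConverter";
    }
    switch (c.encoding) {
      case DIRECT_DICTIONARY: return "DirectDictionaryFieldConverter";
      case DICTIONARY:        return "DictionaryFieldConverter";
      default:                return "NonDictionaryFieldConverter";
    }
  }

  public static void main(String[] args) {
    Column dictDim = new Column(false, false, Encoding.DICTIONARY);
    System.out.println(createFieldConverter(dictDim));  // DictionaryFieldConverter
  }
}
```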


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84476341
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<Iterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child, List<Iterator<Object[]>> inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if (header[i].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+DataField[] output = getOutput();
+genericParsers = new GenericParser[output.length];
+for (int i = 0; i < genericParsers.length; i++) {
+  genericParsers[i] = 
CarbonParserFactory.createParser(output[i].getColumn(),
+  (String[]) configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+}
+  }
+
+  private int getNumberOfCores() {
+int numberOfCores;
+try {
+  numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+  CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+} catch (NumberFormatException exc) {
+  numberOfCores = 
Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+}
+return numberOfCores;
+  }
+
+  private int getBatchSize() {
+int batchSize;
+try {
+  batchSize = Integer.parseInt(configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
+  
DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT).toString());
+} catch (NumberFormatException exc) {
+  batchSize = 
Integer.parseInt(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
+}
+return batchSize;
+  }
+
+  @Override public Iterator<CarbonRowBatch>[] execute() {
+int batchSize = getBatchSize();
+List<Iterator<Object[]>>[] readerIterators = partitionInputReaderIterators();
+Iterator<CarbonRowBatch>[] outIterators = new Iterator[readerIterators.length];
+for (int i = 0; i < outIterators.length; i++) {
+  outIterators[i] = new InputProcessorIterator(readerIterators[i], 
genericParsers, batchSize);
+}
+return outIterators;
+  }
+
+  private List<Iterator<Object[]>>[] partitionInputReaderIterators() {
+int numberOfCores = getNumberOfCores();
+if (inputIterators.size() < numberOfCores) {
+  numberOfCores = inputIterators.size();
+}
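The method is truncated here, but its name and the core-count clamp above suggest the input reader iterators are split into at most `numberOfCores` groups. A minimal sketch of such a round-robin partitioning — hypothetical stand-in code, not the PR's actual implementation — could look like:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: distribute N reader iterators across at most `cores`
// groups round-robin, so each group can be consumed by one loading thread.
public class IteratorPartitioner {
  public static <T> List<List<Iterator<T>>> partition(List<Iterator<T>> inputs, int cores) {
    int groups = Math.min(cores, inputs.size());
    List<List<Iterator<T>>> out = new ArrayList<>();
    for (int i = 0; i < groups; i++) {
      out.add(new ArrayList<Iterator<T>>());
    }
    for (int i = 0; i < inputs.size(); i++) {
      out.get(i % groups).add(inputs.get(i)); // round-robin assignment
    }
    return out;
  }
}
```

With five input iterators and two cores this yields one group of three iterators and one of two.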
+

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84475851
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/constants/DataLoadProcessorConstants.java
 ---
@@ -33,4 +33,8 @@
   public static final String COMPLEX_DELIMITERS = "COMPLEX_DELIMITERS";
 
   public static final String DIMENSION_LENGTHS = "DIMENSION_LENGTHS";
+
--- End diff --

Maybe we can refactor this later.




[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84475566
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/parser/GenericParser.java
 ---
@@ -0,0 +1,22 @@
+package org.apache.carbondata.processing.newflow.parser;
+
+/**
+ * Parses the data according to the implementation; the implementation
+ * classes can handle struct, array, or map datatypes.
+ */
+public interface GenericParser<E> {
+
+  /**
+   * Parse the data as per the delimiter
+   * @param data
+   * @return
+   */
+  E parse(String data);
+
+  /**
+   * Children of the parser.
+   * @param parser
+   */
+  void addChildren(GenericParser parser);
--- End diff --

Yes, added new interface ComplexParser that extends GenericParser.




[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84475299
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java
 ---
@@ -114,11 +114,15 @@ protected CarbonRowBatch 
processRowBatch(CarbonRowBatch rowBatch) {
   /**
* It is called when task is called successfully.
*/
-  public abstract void finish();
+  public void finish() {
+// implementation classes can override to update the status.
+  }
 
   /**
* Closing of resources after step execution can be done here.
*/
-  public abstract void close();
+  public void close() {
+// implementation classes can override to close the resources if any 
available.
--- End diff --

Removed the `finish` method and kept only `close`, so in all cases `close`
needs to be called.




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84474110
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.encoding.FieldConverter;
+import org.apache.carbondata.processing.newflow.encoding.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ *
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache<DictionaryColumnUniqueIdentifier, Dictionary> cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List<FieldConverter> fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
--- End diff --

No @jackylk, because measures don't have any converters, it returns null in
that case.
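The convention being discussed — the factory returns null for measure fields and the caller keeps only the converters that were actually created — can be sketched as follows. This is a hypothetical simplification with stand-in names (`FieldConverter`, `create`, `collect`), not the actual factory code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of the pattern under discussion: dimension
// columns get a converter, measures get null, and the caller silently
// skips the nulls instead of throwing.
public class ConverterFactorySketch {
  interface FieldConverter {
    String convert(String value);
  }

  // Returns a converter only for dimension columns; null for measures.
  static FieldConverter create(boolean isDimension) {
    if (!isDimension) {
      return null; // measures carry raw values and need no conversion
    }
    return value -> "dict(" + value + ")"; // stand-in dictionary lookup
  }

  static List<FieldConverter> collect(boolean[] dimensionFlags) {
    List<FieldConverter> converters = new ArrayList<>();
    for (boolean isDim : dimensionFlags) {
      FieldConverter c = create(isDim);
      if (c != null) { // skip measures rather than failing
        converters.add(c);
      }
    }
    return converters;
  }
}
```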




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84472920
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/RowEncoder.java
 ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+/**
+ * Encodes the row
+ */
+public interface RowEncoder {
--- End diff --

Ok, I will rename the Encoder to converter




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84473138
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/encodestep/EncoderProcessorStepImpl.java
 ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.newflow.steps.encodestep;
+
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.encoding.RowEncoder;
+import 
org.apache.carbondata.processing.newflow.encoding.impl.RowEncoderImpl;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+/**
+ * Encodes data with dictionary values, composed into a bit/byte-packed key.
+ * Non-dictionary values are packed as bytes, and complex types are also
+ * packed as bytes.
+ */
+public class EncoderProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
--- End diff --

Here I convert only dictionary and non-dictionary fields; the remaining
preparation is moved to the Sort step.




[GitHub] incubator-carbondata pull request #238: [CARBONDATA-334] Correct Some Spelli...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/238#discussion_r84463724
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/test/util/StoreCreator.java 
---
@@ -346,26 +346,26 @@ public static void executeGraph(LoadModel loadModel, 
String storeLocation, Strin
   path.delete();
 }
 
-DataProcessTaskStatus schmaModel = new 
DataProcessTaskStatus(databaseName, tableName);
-schmaModel.setCsvFilePath(loadModel.getFactFilePath());
+DataProcessTaskStatus schemaModel = new 
DataProcessTaskStatus(databaseName, tableName);
--- End diff --

This is not schemaModel, please use a proper name




Re: Unable to perform compaction,

2016-10-21 Thread Liang Chen
Hi

Can you provide the detailed test steps and error logs?

Regards
Liang


2016-10-20 19:35 GMT+08:00 prabhatkashyap :

> Hello,
> There is some issue with compaction. Auto and force compaction are not
> working.
> In spark logs I got this error:
>
>
>
>
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/Unable-to-
> perform-compaction-tp2099.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>



-- 

Regards
Liang


[jira] [Created] (CARBONDATA-335) Load data command options 'QUOTECHAR' perform unexpected behavior.

2016-10-21 Thread Harmeet Singh (JIRA)
Harmeet Singh created CARBONDATA-335:


 Summary: Load data command options 'QUOTECHAR' perform unexpected 
behavior. 
 Key: CARBONDATA-335
 URL: https://issues.apache.org/jira/browse/CARBONDATA-335
 Project: CarbonData
  Issue Type: Bug
Reporter: Harmeet Singh


Hey Team,

I am using the load data command with a specific 'QUOTECHAR' option, but after
loading the data, the quote character does not behave as expected. Below is my
example:

create table one (name string, description string, salary double, age int, dob 
timestamp) stored by 'carbondata';

CSV File Content >>

name, description, salary, age, dob
tammy, $my name$, 90, 22, 19/10/2019

0: jdbc:hive2://127.0.0.1:1> load data local inpath 
'hdfs://localhost:54310/home/harmeet/dollarquote.csv' into table one 
OPTIONS('QUOTECHAR'="$");

Results >> 
0: jdbc:hive2://127.0.0.1:1> select * from one; 

Actual Output >>>

| tammy | $my name$ | NULL | 90.0 | 22 |

Expected Output >>>

| tammy | my name | NULL | 90.0 | 22 |
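The expected behavior in this report is that the configured quote character delimits the field and is stripped from the stored value (`$my name$` becomes `my name`). Purely illustratively — this is not CarbonData's CSV parser — the quote stripping amounts to:

```java
// Illustrative only: strip a custom quote character that encloses a field,
// mirroring the expected output above ("$my name$" -> "my name").
public class QuoteStrip {
  static String unquote(String field, char quoteChar) {
    String f = field.trim();
    if (f.length() >= 2 && f.charAt(0) == quoteChar && f.charAt(f.length() - 1) == quoteChar) {
      return f.substring(1, f.length() - 1); // drop the enclosing quotes
    }
    return f; // unquoted fields pass through unchanged
  }
}
```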



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Load data command Quote character unexpected behavior.

2016-10-21 Thread Lion.X
OK, got it.
I will handle it.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Load-data-command-Quote-character-unexpected-behavior-tp2145p2165.html


[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84443937
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.encoding.FieldConverter;
+import org.apache.carbondata.processing.newflow.encoding.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ *
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache<DictionaryColumnUniqueIdentifier, Dictionary> cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List<FieldConverter> fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
--- End diff --

if it is null, should throw exception




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84443155
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.encoding.FieldConverter;
+import org.apache.carbondata.processing.newflow.encoding.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ *
+ */
+public class RowConverterImpl implements RowConverter {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private FieldConverter[] fieldConverters;
+
+  public RowConverterImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache<DictionaryColumnUniqueIdentifier, Dictionary> cache =
+cacheProvider.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List<FieldConverter> fieldConverterList = new ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldConverter fieldConverter = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldConverter != null) {
+fieldConverterList.add(fieldConverter);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+fieldConverters = fieldConverterList.toArray(new 
FieldConverter[fieldConverterList.size()]);
+  }
+
+  @Override
+  public CarbonRow convert(CarbonRow row) throws 
CarbonDataLoadingException {
+
+for (int i = 0; i < fieldConverters.length; i++) {
+  fieldConverters[i].convert(row);
+}
+return row;
+  }
+
+  @Override
+  public void finish() {
+List<Integer> dimCardinality = new ArrayList<>();
+for (int i = 0; i < fieldConverters.length; i++) {
+  if (fieldConverters[i] instanceof 
AbstractDictionaryFieldConverterImpl) dimCardinality
--- End diff --

incorrect coding style
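The flagged snippet puts the statement body on the same line as the `if`. A self-contained illustration of the braced instanceof-then-cast form the review asks for (class names here are placeholders, not the PR's):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in illustration of the braced instanceof-then-cast pattern;
// the class names are placeholders, not the PR's actual types.
public class BracedStyle {
  static class Converter {}
  static class DictionaryConverter extends Converter {
    int cardinality() { return 42; }
  }

  static List<Integer> collectCardinalities(Converter[] converters) {
    List<Integer> dimCardinality = new ArrayList<>();
    for (Converter c : converters) {
      if (c instanceof DictionaryConverter) { // braces even for one statement
        dimCardinality.add(((DictionaryConverter) c).cardinality());
      }
    }
    return dimCardinality;
  }
}
```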




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84443496
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/DataConverterProcessorStepImpl.java
 ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.newflow.steps;
+
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.encoding.RowConverter;
+import 
org.apache.carbondata.processing.newflow.encoding.impl.RowConverterImpl;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+/**
+ * Replaces row data fields with dictionary values if the column is configured
+ * as dictionary-encoded. Non-dictionary columns as well as complex columns
+ * are converted to byte[].
+ */
+public class DataConverterProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
+
+  private RowConverter encoder;
+
+  public DataConverterProcessorStepImpl(CarbonDataLoadConfiguration 
configuration,
+  AbstractDataLoadProcessorStep child) {
+super(configuration, child);
+  }
+
+  @Override public DataField[] getOutput() {
--- End diff --

move override to next line




[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-21 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84443007
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/impl/RowConverterImpl.java
 ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.encoding.FieldConverter;
+import org.apache.carbondata.processing.newflow.encoding.RowConverter;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+import org.apache.commons.lang3.ArrayUtils;
+
+/**
+ *
--- End diff --

please add description




Re: Unable to load CSV using different delimiter like "|" (Pipe) and ";" (Semicolon) etc

2016-10-21 Thread 杰
hi, Harmeet


  I could not reproduce this. Please provide more details.


regards
Jay


-- Original --
From:  "Harmeet";;
Date:  Fri, Oct 21, 2016 02:51 PM
To:  "dev"; 

Subject:  Re: Unable to load CSV using different delimiter like 
"|" (Pipe) and ";" (Semicolon) etc



Hey Jay, thanks for your reply.

The first case with the pipe symbol is working fine, but in the second case
I am still getting the same error. Below is my query:

csv file >> 
name; description; salary; age; dob
tammy; 'my name'; 90; 22; 19/10/2019


0: jdbc:hive2://127.0.0.1:1> load data local inpath
'hdfs://localhost:54310/home/harmeet/semiandquote.csv' into table one
OPTIONS("DELIMITER"="\;", 'QUOTECHAR'="'");

Error: org.apache.spark.sql.AnalysisException: missing EOF at 'OPTIONS' near
'one'; line 1 pos 93 (state=,code=0)

Please correct me if something is wrong.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Unable-to-load-CSV-using-differenent-delimiter-like-Pipe-and-Semicolon-etc-tp2140p2144.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.

Re: Load data command Quote character unexpected behavior.

2016-10-21 Thread 向志强
hi, Harmeet
From your description above, I cannot reproduce the problem you described;
it returned the right result in my environment.
Please use the latest version, check again, and give more details.
Lionx

2016-10-21 15:07 GMT+08:00 Harmeet :

> Hey Team,
>
> I am using the load data command with a specific quote character, but after
> loading the data the quote character does not behave as expected. Below is
> my example:
>
> *create table one (name string, description string, salary double, age int,
> dob timestamp) stored by 'carbondata';*
>
> csv File >>
>
> name, description, salary, age, dob
> tammy, $my name$, 90, 22, 19/10/2019
>
> 0: jdbc:hive2://127.0.0.1:1> load data local inpath
> 'hdfs://localhost:54310/home/harmeet/dollarquote.csv' into table one
> OPTIONS('QUOTECHAR'="$");
>
> Results >>
> 0: jdbc:hive2://127.0.0.1:1> select * from one;
> +--------+--------------+--------+---------+------+
> |  name  | description  |  dob   | salary  | age  |
> +--------+--------------+--------+---------+------+
> | tammy  | $my name$    | NULL   | 90.0    | 22   |
> +--------+--------------+--------+---------+------+
>
> I am assuming that in the description column only "my name" should be
> loaded and the dollar signs excluded, but this is not working. The same
> behavior occurs if we use ' (single quote) with the data.
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/Load-data-
> command-Quote-character-unexpected-behavior-tp2145.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>


[jira] [Created] (CARBONDATA-334) Correct Some Spelling Mistakes

2016-10-21 Thread Lionx (JIRA)
Lionx created CARBONDATA-334:


 Summary: Correct Some Spelling Mistakes
 Key: CARBONDATA-334
 URL: https://issues.apache.org/jira/browse/CARBONDATA-334
 Project: CarbonData
  Issue Type: Bug
Reporter: Lionx
Assignee: Lionx
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Beijing Apache CarbonData meetup: https://www.meetup.com/Apache-Carbondata-Meetup/events/235013117/

2016-10-21 Thread Liang Chen
Hi all

Saturday, October 29, 2016 1:30 PM to 5:30 PM
You can apply through this link:
https://www.meetup.com/Apache-Carbondata-Meetup/events/235013117/


Regards
Liang


Load data command Quote character unexpected behavior.

2016-10-21 Thread Harmeet
Hey Team,

I am using the load data command with a specific quote character, but after
loading the data the quote character does not behave as expected. Below is
my example:

*create table one (name string, description string, salary double, age int,
dob timestamp) stored by 'carbondata';*

csv File >>

name, description, salary, age, dob
tammy, $my name$, 90, 22, 19/10/2019

0: jdbc:hive2://127.0.0.1:1> load data local inpath
'hdfs://localhost:54310/home/harmeet/dollarquote.csv' into table one
OPTIONS('QUOTECHAR'="$");

Results >> 
0: jdbc:hive2://127.0.0.1:1> select * from one;
+--------+--------------+--------+---------+------+
|  name  | description  |  dob   | salary  | age  |
+--------+--------------+--------+---------+------+
| tammy  | $my name$    | NULL   | 90.0    | 22   |
+--------+--------------+--------+---------+------+

I am assuming that in the description column only "my name" should be
loaded and the dollar signs excluded, but this is not working. The same
behavior occurs if we use ' (single quote) with the data.
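For what it is worth, the result above matches standard CSV quoting semantics: a quote character is only recognised at the very start of a field, so the space after each comma keeps the '$' characters inside the data. A minimal sketch with Python's csv module (not CarbonData's parser, so whether this is the actual cause here is an assumption):

```python
import csv
import io

# Hypothetical sample mirroring the file from this thread, with '$' as
# the quote character and a space after each comma, as pasted above.
data = "name, description, salary, age, dob\ntammy, $my name$, 90, 22, 19/10/2019\n"

# The quote character is only honoured at the very start of a field,
# so the leading space keeps the '$' characters in the parsed value.
row = list(csv.reader(io.StringIO(data), quotechar="$"))[1]
print(row[1])   # ' $my name$' -- quotes survive

# Skipping the space after the delimiter lets the quote character work.
row = list(csv.reader(io.StringIO(data), quotechar="$", skipinitialspace=True))[1]
print(row[1])   # 'my name'
```

If this is indeed the cause, removing the space before the quote character in the file should make the quotes disappear from the loaded data.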



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Load-data-command-Quote-character-unexpected-behavior-tp2145.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


Re: Unable to load CSV using different delimiter like "|" (Pipe) and ";" (Semicolon) etc

2016-10-21 Thread Harmeet
Hey Jay, thanks for your reply.

The first case with the pipe symbol is working fine, but in the second case
I am still getting the same error. Below is my query:

csv file >> 
name; description; salary; age; dob
tammy; 'my name'; 90; 22; 19/10/2019


0: jdbc:hive2://127.0.0.1:1> load data local inpath
'hdfs://localhost:54310/home/harmeet/semiandquote.csv' into table one
OPTIONS("DELIMITER"="\;", 'QUOTECHAR'="'");

Error: org.apache.spark.sql.AnalysisException: missing EOF at 'OPTIONS' near
'one'; line 1 pos 93 (state=,code=0)

Please correct me if something is wrong.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Unable-to-load-CSV-using-differenent-delimiter-like-Pipe-and-Semicolon-etc-tp2140p2144.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


Re: Unable to load CSV using different delimiter like "|" (Pipe) and ";" (Semicolon) etc

2016-10-21 Thread 杰
hi, Harmeet


  For problem 1:
you need to make sure the header line is also delimited by '|',
that is:  name|description|salary|age|dob
  For problem 2:
   in the Load command, change "DELIMITER"=";" to "DELIMITER"="\;",
   because ";" usually marks the end of a statement, so it needs to be escaped.


thanks
Jay
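A quick sketch of the two points above, using Python's csv module as a stand-in for the file-level parsing (the file itself is hypothetical, matching the thread): the ';' delimiter is unproblematic at the file level, and the "\;" escape is only needed so the SQL layer does not treat ';' as the end of the statement; the header must use the same delimiter as the data rows.

```python
import csv
import io

# Hypothetical file matching this thread: header and data rows both
# use ';' as the delimiter and ' as the quote character.
data = "name;description;salary;age;dob\ntammy;'my name';90;22;19/10/2019\n"

# No escaping is needed here: ';' is an ordinary delimiter at this layer.
header, row = list(csv.reader(io.StringIO(data), delimiter=";", quotechar="'"))
print(header)   # ['name', 'description', 'salary', 'age', 'dob']
print(row[1])   # my name  (quote character stripped)
```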




-- Original --
From:  "Harmeet";;
Date:  Fri, Oct 21, 2016 02:23 PM
To:  "dev"; 

Subject:  Unable to load CSV using different delimiter like "|" (Pipe) and ";" 
(Semicolon) etc



Hey Team, 

I am trying to load data from a csv file using different delimiters like
"|" (pipe) and ";" (semicolon), but the system gives me a different error
depending on the delimiter. Examples below: 

*create table one (name string, description string, salary double, age int,
dob timestamp) stored by 'carbondata';*

*1. "|" (Pipe Delimeter)*

deliandquote.csv File >> 

name, description, salary, age, dob
tammy| 'my name'| 90| 22| 19/10/2019

0: jdbc:hive2://127.0.0.1:1> load data local inpath
'hdfs://localhost:54310/home/harmeet/deliandquote.csv' into table one
OPTIONS("DELIMITER"="|", 'QUOTECHAR'="'");

Error: java.lang.Exception: DataLoad failure: CSV File provided is not
proper. Column names in schema and csv header are not same. CSVFile Name :
deliandquote.csv (state=,code=0)

*2. ";" (Semicolon)*

semiandquote.csv File >>

0: jdbc:hive2://127.0.0.1:1> load data local inpath
'hdfs://localhost:54310/home/harmeet/semiandquote.csv' into table one
OPTIONS("DELIMITER"=";", 'QUOTECHAR'="'");

Error: org.apache.spark.sql.AnalysisException: missing EOF at 'OPTIONS' near
'one'; line 1 pos 93 (state=,code=0)

I do not understand why different errors are shown if other characters are
not supported. Or is this a defect in CarbonData? 




--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Unable-to-load-CSV-using-differenent-delimiter-like-Pipe-and-Semicolon-etc-tp2140.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.

Unable to load CSV using different delimiter like "|" (Pipe) and ";" (Semicolon) etc

2016-10-21 Thread Harmeet
Hey Team, 

I am trying to load data from a csv file using different delimiters like
"|" (pipe) and ";" (semicolon), but the system gives me a different error
depending on the delimiter. Examples below: 

*create table one (name string, description string, salary double, age int,
dob timestamp) stored by 'carbondata';*

*1. "|" (Pipe Delimeter)*

deliandquote.csv File >> 

name, description, salary, age, dob
tammy| 'my name'| 90| 22| 19/10/2019

0: jdbc:hive2://127.0.0.1:1> load data local inpath
'hdfs://localhost:54310/home/harmeet/deliandquote.csv' into table one
OPTIONS("DELIMITER"="|", 'QUOTECHAR'="'");

Error: java.lang.Exception: DataLoad failure: CSV File provided is not
proper. Column names in schema and csv header are not same. CSVFile Name :
deliandquote.csv (state=,code=0)

*2. ";" (Semicolon)*

semiandquote.csv File >>

0: jdbc:hive2://127.0.0.1:1> load data local inpath
'hdfs://localhost:54310/home/harmeet/semiandquote.csv' into table one
OPTIONS("DELIMITER"=";", 'QUOTECHAR'="'");

Error: org.apache.spark.sql.AnalysisException: missing EOF at 'OPTIONS' near
'one'; line 1 pos 93 (state=,code=0)

I do not understand why different errors are shown if other characters are
not supported. Or is this a defect in CarbonData? 




--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Unable-to-load-CSV-using-differenent-delimiter-like-Pipe-and-Semicolon-etc-tp2140.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


Re: describe formatted table command does not show block size

2016-10-21 Thread 杰
hi, Harmeet


  It is not clear. A committer has reviewed it with no further comments;
  I will @ them to check.


thanks 
Jay 




------ Original message ------
From: "Harmeet Singh";;
Date: Fri, Oct 21, 2016 12:11
To: "dev"; 

Subject: Re: describe formatted table command does not show block size



Hey Jay,

Yes, I am checking the PR, so I suppose there is no need to raise the bug on
JIRA. But the PR is still not merged; is there a specific reason? 



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/describe-formatted-table-command-does-not-show-block-size-tp2098p2138.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.