[GitHub] incubator-carbondata pull request #213: [CARBONDATA-286] Support Append mode...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/213


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting

2016-10-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-318:
---

 Summary: Implement an ExternalSorter that makes maximum usage of 
memory while sorting
 Key: CARBONDATA-318
 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li


External Sorter should sort in memory until it reaches the configured size, then 
spill to disk. It should provide the following interface:
1. insertRow/insertRowBatch: insert rows into the sorter
2. getIterator: return an iterator that iterates over the sorted rows

External Sorter depends on a FileWriterFactory to get a FileWriter for spilling data 
into files. The FileWriterFactory should be provided by the user. Multiple 
implementations are possible, such as writing into one folder or into multiple folders.
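
For illustration, a minimal sketch of such a contract (hypothetical names, not
existing CarbonData classes; it assumes rows are passed as Object[] and that the
spill writer is supplied by the caller):

import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: the sorter buffers rows in memory and spills sorted
// runs to disk through the user-supplied FileWriterFactory once the configured
// size is reached.
interface ExternalSorter {
  void insertRow(Object[] row);             // buffer a single row
  void insertRowBatch(List<Object[]> rows); // buffer a batch, spilling if needed
  Iterator<Object[]> getIterator();         // merge in-memory data and spilled runs,
                                            // returning rows in sorted order
}

// Supplied by the user; decides where spill files go (one folder, many folders, ...).
interface FileWriterFactory {
  FileWriter createWriter(String spillFileName);
}

interface FileWriter {
  void write(Object[] row);
  void close();
}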



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jacky Li
Hi,

I can offer one more approach for this discussion. Since new dictionary values 
are rare in the case of incremental loads (ensure the first load covers as many 
dictionary values as possible), synchronization should also be rare. So how about 
using ZooKeeper + an HDFS file to provide this service? This is what Carbon is 
doing today; we can wrap ZooKeeper + HDFS behind the global dictionary 
interface.
It has the following benefits:
1. Automated: it does not bother the user.
2. No new dependency: we already use ZooKeeper and HDFS.
3. Performance: new dictionary values, and therefore synchronization, are rare.
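
For illustration, a rough sketch of what that wrapper could look like (a
hypothetical interface, not an existing Carbon class; it assumes a ZooKeeper
lock is taken only when a value is missing from the locally cached copy of the
HDFS dictionary file):

// Illustrative only: global dictionary service backed by ZooKeeper (locking)
// and HDFS (storage).
interface GlobalDictionaryService {
  // Fast path: look up the value in the local cache of the HDFS dictionary
  // file; no synchronization needed. Returns null if the value is unknown.
  Integer lookup(String columnName, String value);

  // Slow path, taken only for genuinely new values: acquire a ZooKeeper lock
  // for the column, re-read the HDFS dictionary file (another loader may have
  // added the value meanwhile), append the value if still missing, release the
  // lock, and refresh the local cache.
  int getOrAssign(String columnName, String value);
}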

What do you think?

Regards,
Jacky

> On 15 Oct 2016, at 2:38 AM, Jihong Ma wrote:
> 
> Hi Ravi,
> 
> The major concern I have for generating global dictionary from scratch with a 
> single scan is performance, the way to handle an occasional update to the 
> dictionary is way simpler and cost effective in terms of synchronization cost 
> and refresh the global/local cache copy.  
> 
> There are a lot to worry about for distributed map, and leveraging KV store 
> is overkill if simply just for dictionary generation. 
> 
> Regards.
> 
> Jihong
> 
> -Original Message-
> From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] 
> Sent: Friday, October 14, 2016 11:03 AM
> To: dev
> Subject: Re: Discussion(New feature) regarding single pass data loading 
> solution.
> 
> Hi Jihong,
> 
> I agree, we can use external tool for first load, but for incremental load
> we should have solution to add global dictionary. So this solution should
> be enough to generate global dictionary even if user does not use external
> tool for first time. That solution could be distributed map or KV store.
> 
> Regards,
> Ravi.
> 
> On 14 October 2016 at 23:12, Jihong Ma  wrote:
> 
>> Hi Liang,
>> 
>> This tool is more or less like the first load, the first time after table
>> is created, any subsequent loads/incremental loads will proceed and is
>> capable of updating the global dictionary when it encounters new value,
>> this is easiest way of achieving 1 pass data loading process without too
>> much overhead.
>> 
>> Since this tool is only triggered once per table, not considered too much
>> burden on the end users. Making global dictionary generation out of the way
>> of regular data loading is the key here.
>> 
>> Jihong
>> 
>> -Original Message-
>> From: Liang Chen [mailto:chenliang6...@gmail.com]
>> Sent: Thursday, October 13, 2016 5:39 PM
>> To: dev@carbondata.incubator.apache.org
>> Subject: RE: Discussion(New feature) regarding single pass data loading
>> solution.
>> 
>> Hi jihong
>> 
>> I am not sure that users can accept to use extra tool to do this work,
>> because provide tool or do scan at first time per table for most of global
>> dict are same cost from users perspective, and maintain the dict file also
>> be same cost, they always expecting that system can automatically and
>> internally generate dict file during loading data.
>> 
>> Can we consider this:
>> first load: make scan to generate most of global dict file, then copy this
>> file to each load node for subsequent loading
>> 
>> Regards
>> Liang
>> 
>> 
>> Jihong Ma wrote
>>> the question is what would be the default implementation? Load data
>> without dictionary?
>>> 
>>> My thought is we can provide a tool to generate global dictionary using
>>> sample data set, so the initial global dictionaries is available before
>>> normal data loading. We shall be able to perform encoding based on that,
>>> we only need to handle occasionally adding entries while loading. For
>>> columns specified with global dictionary encoding, but dictionary is not
>>> placed before data loading, we error out and direct user to use the tool
>>> first.
>>> 
>>> Make sense?
>>> 
>>> Jihong
>>> 
>>> -Original Message-
>>> From: Ravindra Pesala [mailto:
>> 
>>> ravi.pesala@
>> 
>>> ]
>>> Sent: Thursday, October 13, 2016 1:12 AM
>>> To: dev
>>> Subject: Re: Discussion(New feature) regarding single pass data loading
>>> solution.
>>> 
>>> Hi Jihong/Aniket,
>>> 
>>> In the current implementation of carbondata we are already handling
>>> external dictionary while loading the data.
>>> But here the question is what would be the default implementation? Load
>>> data with out dictionary?
>>> 
>>> 
>>> Regards,
>>> Ravi
>>> 
>>> On 13 October 2016 at 03:50, Aniket Adnaik 
>> 
>>> aniket.adnaik@
>> 
>>>  wrote:
>>> 
 Hi Ravi,
 
 1. I agree with Jihong that creation of global dictionary should be
 optional, so that it can be disabled to improve the load performance.
 User
 should be made aware that using global dictionary may boost the query
 performance.
 2. We should have a generic interface to manage global dictionary when
 its
 from external sources. In general, it is not a good idea to depend on
>> too
 many external tools.
 3. May be we should allow user to generate global 

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r83506371
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+      AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if 
(header[j].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void intialize() throws CarbonDataLoadingException {
--- End diff --

typo, initialize




Re: [Discussion] Code generation in carbon result preparation

2016-10-14 Thread Vimal Das Kammath
Hi Vishal,

I think we need both solutions 1 & 2.

Solution 1 may need re-designing several parts of Carbon's query process,
from the scanner and aggregator to result preparation. This can help avoid
the frequent cache invalidation.

In Solution 2, code generation will not solve the frequent cache-invalidation
problem. However, it will surely improve performance by running specialised
code instead of generalised code, especially since we support several data
types and our code is generalised to handle all of them.
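
To make the contrast concrete, here is an illustrative Java example (not
CarbonData code; the column types and names are made up) of generalised versus
generated code for stitching one result row:

// Illustrative only; not CarbonData code.
class RowFillExample {
  enum DataType { INT, LONG, STRING }

  // Generalised path: dispatches on the data type for every value of every row.
  static Object[] fillRowGeneric(Object[] columnValues, DataType[] types) {
    Object[] row = new Object[columnValues.length];
    for (int i = 0; i < columnValues.length; i++) {
      switch (types[i]) {          // a branch per value in the hot loop
        case INT:    row[i] = (Integer) columnValues[i]; break;
        case LONG:   row[i] = (Long) columnValues[i]; break;
        default:     row[i] = (String) columnValues[i];
      }
    }
    return row;
  }

  // What a code generator could emit once the projection (int, long, string) is
  // known: straight-line copies, no per-value type dispatch.
  static Object[] fillRowGenerated(Object[] columnValues) {
    return new Object[] {
        (Integer) columnValues[0], (Long) columnValues[1], (String) columnValues[2]
    };
  }
}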

Regards
Vimal

On Thu, Oct 13, 2016 at 3:02 AM, Aniket Adnaik 
wrote:

> Hi Vishal,
>
> In general, it is good idea to have a cache efficient algorithm.
>
> For solution-1 :   how do you want to handle variable length columns and
> nulls? may be you will have to maintain variable length columns separately
> and use offsets ?
>
> For solution 2:  code generation may be more efficient solution. We should
> find out all other places in executor that can benefit from code generation
> apart from row formation. BTW, any specific code generation library you
> have mind?
>
> Best Regards,
> Aniket
>
> On Wed, Oct 12, 2016 at 10:02 AM, Kumar Vishal 
> wrote:
>
> > Hi Jacky,
> > Yes, the result preparation on the executor side.
> >
> > -Regards
> > Kumar Vishal
> >
> > On Wed, Oct 12, 2016 at 9:33 PM, Jacky Li  wrote:
> >
> > > Hi Vishal,
> > >
> > > Which part of the preparation are you considering? The column stitching
> > in
> > > the executor side?
> > >
> > > Regards,
> > > Jacky
> > >
> > > > On 12 Oct 2016, at 9:24 PM, Kumar Vishal wrote:
> > > >
> > > > Hi All,
> > > > Currently we are preparing the final result row-wise. As the number of
> > > > columns in the projection list is high (80 columns), mostly measure
> > > > columns or no-dictionary columns, a lot of CPU cache invalidation is
> > > > happening, and this is slowing down the query performance.
> > > >
> > > > *I can think of two solutions for this problem.*
> > > > *Solution 1*. Fill column data vertically; currently it is filled
> > > > horizontally (this may not solve all the problems).
> > > > *Solution 2*. Use code generation for result preparation.
> > > >
> > > > This is an initially idea.
> > > >
> > > > -Regards
> > > > Kumar Vishal
> > >
> > >
> > >
> > >
> >
>


Re: Discussion: shall CI support running carbondata on multiple Spark versions?

2016-10-14 Thread Vimal Das Kammath
Yes, I agree. The CI should be configured to build Carbon against different Spark
versions.

On Fri, Oct 14, 2016 at 7:56 AM, Liang Chen  wrote:

>
> Yes, we need to solve it; the CI should support different Spark versions.
>
> Regards
> Liang
>
>
> zhujin wrote
> > One issue:
> > I modified spark.version in pom.xml to use Spark 1.6.2, and the
> > compilation failed.
> >
> >
> > Root cause:
> > There was an "unused import statement" warning in the CarbonOptimizer class
> > before; we imported AggregateExpression like the following:
> > import org.apache.spark.sql.catalyst.expressions.aggregate._
> > import org.apache.spark.sql.catalyst.expressions._
> > In Spark 1.6.2, AggregateExpression is in the subpackage "aggregate",
> > whereas in Spark 1.5.2 it lives in "expressions".
> > So if we did not know about this change and removed the import "import
> > org.apache.spark.sql.catalyst.expressions.aggregate._", it would cause a
> > compilation failure when building against Spark 1.6.2.
> >
> >
> > Question:
> > So maybe the CI should verify carbondata against different Spark
> > versions; then it will be more helpful for checking the correctness of
> > the commits. Shall we?


RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jihong Ma
Hi Ravi,

The major concern I have with generating the global dictionary from scratch in a 
single scan is performance; handling an occasional update to the dictionary is 
far simpler and more cost-effective in terms of the synchronization cost and of 
refreshing the global/local cache copy.

There is a lot to worry about with a distributed map, and leveraging a KV store 
is overkill if it is used simply for dictionary generation.

Regards.

Jihong

-Original Message-
From: Ravindra Pesala [mailto:ravi.pes...@gmail.com] 
Sent: Friday, October 14, 2016 11:03 AM
To: dev
Subject: Re: Discussion(New feature) regarding single pass data loading 
solution.

Hi Jihong,

I agree, we can use external tool for first load, but for incremental load
we should have solution to add global dictionary. So this solution should
be enough to generate global dictionary even if user does not use external
tool for first time. That solution could be distributed map or KV store.

Regards,
Ravi.

On 14 October 2016 at 23:12, Jihong Ma  wrote:

> Hi Liang,
>
> This tool is more or less like the first load, the first time after table
> is created, any subsequent loads/incremental loads will proceed and is
> capable of updating the global dictionary when it encounters new value,
> this is easiest way of achieving 1 pass data loading process without too
> much overhead.
>
> Since this tool is only triggered once per table, not considered too much
> burden on the end users. Making global dictionary generation out of the way
> of regular data loading is the key here.
>
> Jihong
>
> -Original Message-
> From: Liang Chen [mailto:chenliang6...@gmail.com]
> Sent: Thursday, October 13, 2016 5:39 PM
> To: dev@carbondata.incubator.apache.org
> Subject: RE: Discussion(New feature) regarding single pass data loading
> solution.
>
> Hi jihong
>
> I am not sure that users can accept to use extra tool to do this work,
> because provide tool or do scan at first time per table for most of global
> dict are same cost from users perspective, and maintain the dict file also
> be same cost, they always expecting that system can automatically and
> internally generate dict file during loading data.
>
> Can we consider this:
> first load: make scan to generate most of global dict file, then copy this
> file to each load node for subsequent loading
>
> Regards
> Liang
>
>
> Jihong Ma wrote
> >the question is what would be the default implementation? Load data
> without dictionary?
> >
> > My thought is we can provide a tool to generate global dictionary using
> > sample data set, so the initial global dictionaries is available before
> > normal data loading. We shall be able to perform encoding based on that,
> > we only need to handle occasionally adding entries while loading. For
> > columns specified with global dictionary encoding, but dictionary is not
> > placed before data loading, we error out and direct user to use the tool
> > first.
> >
> > Make sense?
> >
> > Jihong
> >
> > -Original Message-
> > From: Ravindra Pesala [mailto:
>
> > ravi.pesala@
>
> > ]
> > Sent: Thursday, October 13, 2016 1:12 AM
> > To: dev
> > Subject: Re: Discussion(New feature) regarding single pass data loading
> > solution.
> >
> > Hi Jihong/Aniket,
> >
> > In the current implementation of carbondata we are already handling
> > external dictionary while loading the data.
> > But here the question is what would be the default implementation? Load
> > data with out dictionary?
> >
> >
> > Regards,
> > Ravi
> >
> > On 13 October 2016 at 03:50, Aniket Adnaik 
>
> > aniket.adnaik@
>
> >  wrote:
> >
> >> Hi Ravi,
> >>
> >> 1. I agree with Jihong that creation of global dictionary should be
> >> optional, so that it can be disabled to improve the load performance.
> >> User
> >> should be made aware that using global dictionary may boost the query
> >> performance.
> >> 2. We should have a generic interface to manage global dictionary when
> >> its
> >> from external sources. In general, it is not a good idea to depend on
> too
> >> many external tools.
> >> 3. May be we should allow user to generate global dictionary separately
> >> through SQL command or similar. Something like materialized view. This
> >> means carbon should avoid using local dictionary and do late
> >> materialization when global dictionary is present.
> >> 4. May be we should think of some ways to create global dictionary
> lazily
> >> as we serve SELECT queries. Implementation may not be that straight
> >> forward. Not sure if its worth the effort.
> >>
> >> Best Regards,
> >> Aniket
> >>
> >>
> >> On Tue, Oct 11, 2016 at 7:59 PM, Jihong Ma 
>
> > Jihong.Ma@
>
> >  wrote:
> >>
> >> >
> >> > A rather straight option is allow user to supply global dictionary
> >> > generated somewhere else or we build a separate tool just for
> >> generating
> >> as
> >> > well updating dictionary. Then the general normal data loading process
> >> will
> >> > encode columns with local 

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
Hi Jihong,

I agree that we can use an external tool for the first load, but for incremental
loads we should have a solution for adding to the global dictionary. So this
solution should be able to generate the global dictionary even if the user does
not use the external tool for the first load. That solution could be a
distributed map or a KV store.

Regards,
Ravi.

On 14 October 2016 at 23:12, Jihong Ma  wrote:

> Hi Liang,
>
> This tool is more or less like the first load, the first time after table
> is created, any subsequent loads/incremental loads will proceed and is
> capable of updating the global dictionary when it encounters new value,
> this is easiest way of achieving 1 pass data loading process without too
> much overhead.
>
> Since this tool is only triggered once per table, not considered too much
> burden on the end users. Making global dictionary generation out of the way
> of regular data loading is the key here.
>
> Jihong
>
> -Original Message-
> From: Liang Chen [mailto:chenliang6...@gmail.com]
> Sent: Thursday, October 13, 2016 5:39 PM
> To: dev@carbondata.incubator.apache.org
> Subject: RE: Discussion(New feature) regarding single pass data loading
> solution.
>
> Hi jihong
>
> I am not sure that users can accept to use extra tool to do this work,
> because provide tool or do scan at first time per table for most of global
> dict are same cost from users perspective, and maintain the dict file also
> be same cost, they always expecting that system can automatically and
> internally generate dict file during loading data.
>
> Can we consider this:
> first load: make scan to generate most of global dict file, then copy this
> file to each load node for subsequent loading
>
> Regards
> Liang
>
>
> Jihong Ma wrote
> >the question is what would be the default implementation? Load data
> without dictionary?
> >
> > My thought is we can provide a tool to generate global dictionary using
> > sample data set, so the initial global dictionaries is available before
> > normal data loading. We shall be able to perform encoding based on that,
> > we only need to handle occasionally adding entries while loading. For
> > columns specified with global dictionary encoding, but dictionary is not
> > placed before data loading, we error out and direct user to use the tool
> > first.
> >
> > Make sense?
> >
> > Jihong
> >
> > -Original Message-
> > From: Ravindra Pesala [mailto:
>
> > ravi.pesala@
>
> > ]
> > Sent: Thursday, October 13, 2016 1:12 AM
> > To: dev
> > Subject: Re: Discussion(New feature) regarding single pass data loading
> > solution.
> >
> > Hi Jihong/Aniket,
> >
> > In the current implementation of carbondata we are already handling
> > external dictionary while loading the data.
> > But here the question is what would be the default implementation? Load
> > data with out dictionary?
> >
> >
> > Regards,
> > Ravi
> >
> > On 13 October 2016 at 03:50, Aniket Adnaik 
>
> > aniket.adnaik@
>
> >  wrote:
> >
> >> Hi Ravi,
> >>
> >> 1. I agree with Jihong that creation of global dictionary should be
> >> optional, so that it can be disabled to improve the load performance.
> >> User
> >> should be made aware that using global dictionary may boost the query
> >> performance.
> >> 2. We should have a generic interface to manage global dictionary when
> >> its
> >> from external sources. In general, it is not a good idea to depend on
> too
> >> many external tools.
> >> 3. May be we should allow user to generate global dictionary separately
> >> through SQL command or similar. Something like materialized view. This
> >> means carbon should avoid using local dictionary and do late
> >> materialization when global dictionary is present.
> >> 4. May be we should think of some ways to create global dictionary
> lazily
> >> as we serve SELECT queries. Implementation may not be that straight
> >> forward. Not sure if its worth the effort.
> >>
> >> Best Regards,
> >> Aniket
> >>
> >>
> >> On Tue, Oct 11, 2016 at 7:59 PM, Jihong Ma 
>
> > Jihong.Ma@
>
> >  wrote:
> >>
> >> >
> >> > A rather straight option is allow user to supply global dictionary
> >> > generated somewhere else or we build a separate tool just for
> >> generating
> >> as
> >> > well updating dictionary. Then the general normal data loading process
> >> will
> >> > encode columns with local dictionary if not supplied.  This should
> >> cover
> >> > majority of cases for low-medium cardinality column. For the cases we
> >> have
> >> > to incorporate online dictionary update, use a lock mechanism to sync
> >> up
> >> > should serve the purpose.
> >> >
> >> > In another words, generating global dictionary is an optional step,
> >> only
> >> > triggered when needed, not a default step as we do currently.
> >> >
> >> > Jihong
> >> >
> >> > -Original Message-
> >> > From: Ravindra Pesala [mailto:
>
> > ravi.pesala@
>
> > ]
> >> > Sent: Tuesday, October 11, 2016 2:33 AM
> >> > To: dev
> >> > Subject: Discussion(New feature) regarding 

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
Hi,

1. Using an external tool to generate the dictionary: I think it cannot be the
default solution; it is just one option for users who are willing to generate
the dictionary separately and provide it to Carbon while loading the data, to
boost performance.

2. Using the 2-pass solution (current solution): currently we have the 2-pass
solution, and it becomes a bottleneck for CarbonOutputFormat; issues also arise
when we use dataframe.write().

3. Using a local dictionary as the default implementation: we can choose this
solution, but it hurts query performance because late dictionary decoding
cannot work.

4. Using a distributed map as the default implementation: generate the global
dictionary using a distributed map, but we need to evaluate the loading
performance. A rough sketch of the required semantics follows below.
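
As a rough illustration of what option 4 has to provide, the sketch below shows
the getOrAssign contract on a single JVM only (java.util.concurrent); the real
implementation would be a distributed map or KV store shared by the executors:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Single-process stand-in for the proposed distributed map: every distinct
// value gets exactly one surrogate key, even under concurrent loads.
class DictionaryMapSketch {
  private final ConcurrentHashMap<String, Integer> valueToKey = new ConcurrentHashMap<>();
  private final AtomicInteger nextKey = new AtomicInteger(1);

  int getOrAssign(String value) {
    // computeIfAbsent is atomic per key, so two concurrent loaders can never
    // assign two different surrogate keys to the same value.
    return valueToKey.computeIfAbsent(value, v -> nextKey.getAndIncrement());
  }
}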

Regards,
Ravi.

On 14 October 2016 at 06:32, Aniket Adnaik  wrote:

> After rethinking at point 4 in my previous email;
> It will be very expensive to rebuild and re-encode the values , so may not
> be a viable option. only future loads can benefit from it. But then will
> end up having some segments using global dictionary and some using local
> dictionary. May be we should not consider this option.
>
> Best Regards,
> Aniket
>
>
> On Thu, Oct 13, 2016 at 5:54 PM, Aniket Adnaik 
> wrote:
>
> > I have following comments;
> >
> > 1. If external dictionary is provided, we accept it. This interface
> should
> > be generic enough, so that we can perform lookup, add, delete, create and
> > drop  functionality. I believe we already have this functionality to some
> > extent. As long as we are able to maintain the dictionary it should be
> fine.
> > 2. If external dictionary is not provided, then by default we should
> build
> > it internally, which is our current behavior.This will continue to impact
> > the load performance though.
> > 3. If load performance is not acceptable, we should allow user to disable
> > building of global dictionary. Carbon should build local dictionary
> > instead. Will this setting apply to all subsequent loads ? may be yes for
> > now.
> > 4. If User decides to build dictionary at later point, either via
> external
> > tool
> > or using carbon sql command ("CREATE  DICTIONARY TABLE...") we should
> > provide that facility. This will help user to improve query performance
> > through late materialization. The local dictionary will not be used in
> this
> > case. Sebsequent loads
> >  will continue to add new entries to this new dictionary (external or
> > carbon specific).
> >
> > This doesn't really solve our double pass problem, but kind of works
> > around it by isolating dictionary building operation out of critical
> path.
> >
> >
> > Best Regards,
> > Aniket
> >
> >
> > On Thu, Oct 13, 2016 at 5:39 PM, Liang Chen 
> > wrote:
> >
> >> Hi jihong
> >>
> >> I am not sure that users can accept to use extra tool to do this work,
> >> because provide tool or do scan at first time per table for most of
> global
> >> dict are same cost from users perspective, and maintain the dict file
> also
> >> be same cost, they always expecting that system can automatically and
> >> internally generate dict file during loading data.
> >>
> >> Can we consider this:
> >> first load: make scan to generate most of global dict file, then copy
> this
> >> file to each load node for subsequent loading
> >>
> >> Regards
> >> Liang
> >>
> >>
> >> Jihong Ma wrote
> >> >the question is what would be the default implementation? Load data
> >> without dictionary?
> >> >
> >> > My thought is we can provide a tool to generate global dictionary
> using
> >> > sample data set, so the initial global dictionaries is available
> before
> >> > normal data loading. We shall be able to perform encoding based on
> that,
> >> > we only need to handle occasionally adding entries while loading. For
> >> > columns specified with global dictionary encoding, but dictionary is
> not
> >> > placed before data loading, we error out and direct user to use the
> tool
> >> > first.
> >> >
> >> > Make sense?
> >> >
> >> > Jihong
> >> >
> >> > -Original Message-
> >> > From: Ravindra Pesala [mailto:
> >>
> >> > ravi.pesala@
> >>
> >> > ]
> >> > Sent: Thursday, October 13, 2016 1:12 AM
> >> > To: dev
> >> > Subject: Re: Discussion(New feature) regarding single pass data
> loading
> >> > solution.
> >> >
> >> > Hi Jihong/Aniket,
> >> >
> >> > In the current implementation of carbondata we are already handling
> >> > external dictionary while loading the data.
> >> > But here the question is what would be the default implementation?
> Load
> >> > data with out dictionary?
> >> >
> >> >
> >> > Regards,
> >> > Ravi
> >> >
> >> > On 13 October 2016 at 03:50, Aniket Adnaik 
> >>
> >> > aniket.adnaik@
> >>
> >> >  wrote:
> >> >
> >> >> Hi Ravi,
> >> >>
> >> >> 1. I agree with Jihong that creation of global dictionary should be
> >> >> optional, so that it can be disabled to improve the load performance.
> >> >> User
> >> >> should 

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-14 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/240

[CARBONDATA-298]Added InputProcessorStep to read data from csv reader 
iterator.

Add an InputProcessorStep which iterates the RecordReader of the CSV input and 
parses the data according to each column's data type.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
input-processor-step

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/240.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #240


commit 96c46d2d31c2f80b89ff755c3683c08b24eca042
Author: ravipesala 
Date:   2016-10-14T17:09:58Z

Added InputProcessorStep to read data from csv reader iterator.






[GitHub] incubator-carbondata pull request #236: [CARBONDATA-299] Add dictionary inte...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/236




[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/204




[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/218




[GitHub] incubator-carbondata pull request #234: [CARBONDATA-315] Data loading fails ...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/234




[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83422823
  
--- Diff: 
hadoop/src/test/java/org/apache/carbondata/hadoop/csv/CSVInputFormatTest.java 
---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.csv;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+
+import junit.framework.TestCase;
+import org.junit.Assert;
+import org.junit.Test;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.compress.BZip2Codec;
+import org.apache.hadoop.io.compress.CompressionOutputStream;
+import org.apache.hadoop.io.compress.GzipCodec;
+import org.apache.hadoop.io.compress.Lz4Codec;
+import org.apache.hadoop.io.compress.SnappyCodec;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+
+public class CSVInputFormatTest extends TestCase {
+
+  /**
+   * generate compressed files, no need to call this method.
+   * @throws Exception
+   */
+  public void testGenerateCompressFiles() throws Exception {
+String pwd = new File("src/test/resources").getCanonicalPath();
+String inputFile = pwd + "/data.csv";
+FileInputStream input = new FileInputStream(inputFile);
+Configuration conf = new Configuration();
+
+// .gz
+String outputFile = pwd + "/data.csv.gz";
+FileOutputStream output = new FileOutputStream(outputFile);
+GzipCodec gzip = new GzipCodec();
+gzip.setConf(conf);
+CompressionOutputStream outputStream = gzip.createOutputStream(output);
+int i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+// .bz2
+input = new FileInputStream(inputFile);
+outputFile = pwd + "/data.csv.bz2";
+output = new FileOutputStream(outputFile);
+BZip2Codec bzip2 = new BZip2Codec();
+bzip2.setConf(conf);
+outputStream = bzip2.createOutputStream(output);
+i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+// .snappy
+input = new FileInputStream(inputFile);
+outputFile = pwd + "/data.csv.snappy";
+output = new FileOutputStream(outputFile);
+SnappyCodec snappy = new SnappyCodec();
+snappy.setConf(conf);
+outputStream = snappy.createOutputStream(output);
+i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+//.lz4
+input = new FileInputStream(inputFile);
+outputFile = pwd + "/data.csv.lz4";
+output = new FileOutputStream(outputFile);
+Lz4Codec lz4 = new Lz4Codec();
+lz4.setConf(conf);
+outputStream = lz4.createOutputStream(output);
+i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+  }
+
+  /**
+   * CSVCheckMapper check the content of csv files.
+   */
+  public static class CSVCheckMapper extends Mapper {
+@Override
+protected void map(NullWritable key, StringArrayWritable value, 
Context context)
+throws IOException, InterruptedException {
+  String[] columns = value.get();
+  int id = 

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-14 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83421288
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/mdkeygen/MDKeyGenStep.java
 ---
@@ -314,7 +314,7 @@ private boolean setStepConfiguration() {
 wrapperColumnSchema = CarbonUtil
 
.getColumnSchemaList(carbonTable.getDimensionByTableName(tableName),
 carbonTable.getMeasureByTableName(tableName));
-blocksize = carbonTable.getBlocksize();
+blocksize = carbonTable.getBlocksizeInMB();
--- End diff --

should be `getBlockSizeInMB`




Re: Subscribe mailing list

2016-10-14 Thread Ravindra Pesala
Hi,

Please send a mail to dev-subscr...@carbondata.incubator.apache.org to
subscribe to the mailing list.

Thanks,
Ravi.

On 14 October 2016 at 11:45, Anurag Srivastava  wrote:

> Hello ,
>
> I want to add my mail to your mailing list.
>
> --
> *Thanks*
>
>
> *Anurag Srivastava**Software Consultant*
> *Knoldus Software LLP*
>
> *India - US - Canada*
> * Twitter  | FB
>  | LinkedIn
> *
>



-- 
Thanks & Regards,
Ravi


[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391717
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -458,9 +462,11 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   break;
 case REDIRECT:
   badRecordsLogRedirect = true;
+  badRecordConvertNullDisable= true;
--- End diff --

Fixed




[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391617
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java
 ---
@@ -69,9 +68,13 @@
   private BufferedWriter bufferedCSVWriter;
   private DataOutputStream outCSVStream;
   /**
-   *
+   * bad record log file path
+   */
+  private String logFilePath;
+  /**
+   * csv file path
*/
-  private CarbonFile logFile;
+  private String csvFilePath;
--- End diff --

The log file will contain the bad record row along with the detailed reason for 
the failure; the CSV will have only the bad record row.




[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391382
  
--- Diff: 
integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoadModel.java
 ---
@@ -117,9 +117,9 @@
   private String badRecordsLoggerEnable;
 
   /**
-   * defines the option to specify the bad record log redirect to raw csv
+   * defines the option to specify the bad record logger action
*/
-  private String badRecordsLoggerRedirect;
+  private String badRecordsLoggerAction;
--- End diff --

yes corrected




[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/229




[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387366
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java 
---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.mapreduce;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.Seekable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.SplitCompressionInputStream;
+import org.apache.hadoop.io.compress.SplittableCompressionCodec;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.util.LineReader;
+
+/**
+ * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files.  
Files are broken into lines.
+ * Values are the line of csv files.
+ */
+public class CSVInputFormat extends FileInputFormat<NullWritable, StringArrayWritable> {
+
+  @Override
+  public RecordReader<NullWritable, StringArrayWritable> createRecordReader(InputSplit inputSplit,
+      TaskAttemptContext context) throws IOException, InterruptedException {
+    return new NewCSVRecordReader();
+  }
+
+  /**
+   * Treats value as line in file. Key is null.
+   */
+  public static class NewCSVRecordReader extends RecordReader<NullWritable, StringArrayWritable> {
+
+private long start;
+private long end;
+private BoundedInputStream boundedInputStream;
+private Reader reader;
+private CsvParser csvParser;
+private StringArrayWritable value;
+private String[] columns;
+private Seekable filePosition;
+private boolean isCompressedInput;
+private Decompressor decompressor;
+
+@Override
+public void initialize(InputSplit inputSplit, TaskAttemptContext 
context)
+throws IOException, InterruptedException {
+  FileSplit split = (FileSplit) inputSplit;
+  this.start = split.getStart();
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386474
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java 
---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.mapreduce;
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386400
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/io/StringArrayWritable.java 
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.io;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.nio.charset.Charset;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Writable;
+
+/**
+ * A String sequence that is usable as a key or value.
+ */
+public class StringArrayWritable implements Writable {
+  private String[] values;
+
+  public String[] toStrings() {
+return values;
+  }
+
+  public void set(String[] values) {
+this.values = values;
+  }
+
+  public String[] get() {
+return values;
+  }
+
+  @Override public void readFields(DataInput in) throws IOException {
--- End diff --

fixed




[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-14 Thread Jay357089
Github user Jay357089 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83376650
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1422,6 +1422,7 @@ private[sql] case class DescribeCommandFormatted(
 results ++= Seq(("Table Name : ", 
relation.tableMeta.carbonTableIdentifier.getTableName, ""))
 results ++= Seq(("CARBON Store Path : ", relation.tableMeta.storePath, 
""))
 val carbonTable = relation.tableMeta.carbonTable
+results ++= Seq(("Table Block Size : ", carbonTable.getBlocksize + " 
MB", ""))
--- End diff --

done. CI passed. 
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/429/




[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-14 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83373827
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java
 ---
@@ -73,15 +72,15 @@ public 
AbstractDataLoadProcessorStep(CarbonDataLoadConfiguration configuration,
* Create the iterator using child iterator.
*
* @param childIter
-   * @return
+   * @return new iterator with step specific processing.
*/
-  protected Iterator getIterator(final Iterator 
childIter) {
-return new CarbonIterator() {
+  protected Iterator getIterator(final Iterator 
childIter) {
--- End diff --

ok. Added




Subscribe mailing list

2016-10-14 Thread Anurag Srivastava
Hello ,

I want to add my mail to your mailing list.

-- 
*Thanks*


*Anurag Srivastava**Software Consultant*
*Knoldus Software LLP*

*India - US - Canada*
* Twitter  | FB
 | LinkedIn
*


Subscribe to carbondata

2016-10-14 Thread Abhishek Giri
Hi,
 Please add me to the CarbonData mailing list.
Regards
-
Abhishek Giri
9717415895