[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...

2016-08-29 Thread foryou2030
Github user foryou2030 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/82#discussion_r76726188
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -1443,5 +1446,32 @@ public static int getDictionaryChunkSize() {
     }
     return dictionaryOneChunkSize;
   }
+
+  /**
+   * @param csvFilePath
+   * @return
+   */
+  public static String readHeader(String csvFilePath) {
+
+    DataInputStream fileReader = null;
+    BufferedReader bufferedReader = null;
+    String readLine = null;
+
+    try {
+      fileReader =
+          FileFactory.getDataInputStream(csvFilePath, FileFactory.getFileType(csvFilePath));
+      bufferedReader =
+          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
--- End diff --

ok, handled
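
For context, a minimal sketch of what the handled version might look like, assuming the explicit-charset suggestion from elsewhere in this thread (Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)); it is written in Scala for consistency with the other sketches in this digest and is not the committed code. FileFactory and CarbonCommonConstants are the project classes already referenced above.

// Hedged sketch only: read the first line of a CSV file with an explicit
// charset and close the reader deterministically.
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.Charset

def readHeader(csvFilePath: String): String = {
  val stream = FileFactory.getDataInputStream(
    csvFilePath, FileFactory.getFileType(csvFilePath))
  val reader = new BufferedReader(new InputStreamReader(
    stream, Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)))
  try reader.readLine() finally reader.close()
}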




[jira] [Created] (CARBONDATA-191) load data is null when the quote char is unpaired and the file does not end with '\n'

2016-08-29 Thread Jay (JIRA)
Jay created CARBONDATA-191:
--

 Summary: load data is null when the quote char is unpaired and the file does not end with '\n'
 Key: CARBONDATA-191
 URL: https://issues.apache.org/jira/browse/CARBONDATA-191
 Project: CarbonData
  Issue Type: Bug
Reporter: Jay
Priority: Minor


When loading data as below:
CREATE TABLE Priyal11 (id int,name string) STORED BY 
'org.apache.carbondata.format';

LOAD DATA inpath 'hdfs://hacluster/Priyal1/test34.csv' INTO table Priyal11  
options ('DELIMITER'=',', 'QUOTECHAR'='\"', 'FILEHEADER'='id,name');

and test34.csv is as below (note: there is no '\n' at the end of the file):
1,"priyal\"
2,"hello\"

then the query on the name column returns null. Actually, because the quote
char is escaped, the expected result should be
|priyal"|
|hello"|
and if we add a newline at the end of the file, the query result is right.







[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...

2016-08-29 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/104#discussion_r76723805
  
--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala ---
@@ -364,6 +364,7 @@ object GlobalDictionaryUtil extends Logging {
       .option("escape", carbonLoadModel.getEscapeChar)
       .option("ignoreLeadingWhiteSpace", "false")
       .option("ignoreTrailingWhiteSpace", "false")
+      .option("codec", "gzip")
--- End diff --

Please check first whether it is a compressed file.
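
For illustration, a hedged sketch of such a guard; the `.gz` extension check and the csvReader name are assumptions, not the PR's actual change:

// Illustrative only: apply the gzip codec option just for files that look
// compressed; csvReader stands in for the DataFrameReader chain built above.
val maybeCompressedReader =
  if (carbonLoadModel.getFactFilePath.endsWith(".gz")) {
    csvReader.option("codec", "gzip")
  } else {
    csvReader
  }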




[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...

2016-08-29 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/104#discussion_r76723773
  
--- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -657,6 +657,8 @@ object CarbonDataRDDFactory extends Logging {
       val filePaths = carbonLoadModel.getFactFilePath
       hadoopConfiguration.set("mapreduce.input.fileinputformat.inputdir", filePaths)
       hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
+      hadoopConfiguration.set("io.compression.codecs",
+        "org.apache.hadoop.io.compress.GzipCodec")
--- End diff --

This configuration is only needed for compressed files.
Please check first whether the input is a compressed file.
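
A hedged sketch of what such a check might look like; the comma split and the `.gz` test are assumptions for illustration:

// Illustrative only: register the gzip codec just when at least one of the
// input files is actually compressed.
if (filePaths.split(",").exists(_.endsWith(".gz"))) {
  hadoopConfiguration.set("io.compression.codecs",
    "org.apache.hadoop.io.compress.GzipCodec")
}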





Re: A warning when loading data

2016-08-29 Thread Zen Wellon
Thank you Ravi, but I've already set hadoop.tmp.dir in Hadoop's core-site.xml.
I'll build master and try it again.




-- 


Best regards,
William Zen


[GitHub] incubator-carbondata pull request #105: [CARBONDATA-189] Drop database casca...

2016-08-29 Thread sujith71955
Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/105#discussion_r76657713
  
--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala ---
@@ -1342,4 +1345,9 @@ class CarbonSqlParser()
     }
   }
 
+  protected lazy val dropDatabaseCascade: Parser[LogicalPlan] =
+    DROP ~> (DATABASE|SCHEMA) ~> opt(IF ~> EXISTS) ~> ident ~> CASCADE <~ opt(";") ^^ {
+      case cascade => throw new MalformedCarbonCommandException(
+        "Unsupported cascade operation in drop database command")
+    }
--- End diff --

Since the system supports both DATABASE and SCHEMA, it is better to mention
both in the message, e.g. "Unsupported cascade operation in drop
database/schema command"; otherwise provide "Unsupported cascade operation in
drop command".
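
A sketch of the rule with the suggested wording; only the message text differs from the diff above:

protected lazy val dropDatabaseCascade: Parser[LogicalPlan] =
  DROP ~> (DATABASE | SCHEMA) ~> opt(IF ~> EXISTS) ~> ident ~> CASCADE <~ opt(";") ^^ {
    // wording per the review suggestion; the rule body is unchanged
    case cascade => throw new MalformedCarbonCommandException(
      "Unsupported cascade operation in drop database/schema command")
  }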




Re: A warning when loading data

2016-08-29 Thread Ravindra Pesala
Hi Zen,

It seems this issue is related to PR
https://github.com/apache/incubator-carbondata/pull/89, which is already
merged to master. Alternatively, please try adding hadoop.tmp.dir to the
carbon.properties file to solve this issue.
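
For example, a hedged carbon.properties entry (the path is only an
illustration, not a recommended value):

hadoop.tmp.dir=/tmp/carbon-hadoop-tmp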

Thanks,
Ravi




-- 
Thanks & Regards,
Ravi


[jira] [Created] (CARBONDATA-190) Data mismatch issue

2016-08-29 Thread kumar vishal (JIRA)
kumar vishal created CARBONDATA-190:
---

 Summary: Data mismatch issue
 Key: CARBONDATA-190
 URL: https://issues.apache.org/jira/browse/CARBONDATA-190
 Project: CarbonData
  Issue Type: Bug
Reporter: kumar vishal
Assignee: kumar vishal


Issue steps: 1. Create a table, then restart the server, then do a data load;
in that case the filter query record count does not match.
Problem: When a user creates a table and has not disabled the inverted index
for a key column, we set the inverted index to true in the ColumnSchema object.
As we are not persisting this information in the schema file, after restarting
the server the useInvertedIndex property is false in the ColumnSchema object,
so during data loading the column data is not sorted. Filter execution does a
binary search; as the data is not sorted, the binary search fails and skips
some of the records.
Solution: In this PR the default value is set to true. One more PR will be
raised to handle the inverted-index-disabled scenario. By default the inverted
index will be enabled for all columns for better query performance.





[GitHub] incubator-carbondata pull request #82: [CARBONDATA-165] Support loading fact...

2016-08-29 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/82#discussion_r76625697
  
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -1443,5 +1446,32 @@ public static int getDictionaryChunkSize() {
     }
     return dictionaryOneChunkSize;
   }
+
+  /**
+   * @param csvFilePath
+   * @return
+   */
+  public static String readHeader(String csvFilePath) {
+
+    DataInputStream fileReader = null;
+    BufferedReader bufferedReader = null;
+    String readLine = null;
+
+    try {
+      fileReader =
+          FileFactory.getDataInputStream(csvFilePath, FileFactory.getFileType(csvFilePath));
+      bufferedReader =
+          new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset()));
--- End diff --

@foryou2030 instead of using Charset.defaultCharset(), use the line of code below:
Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)




Re: A warning when loading data

2016-08-29 Thread Zen Wellon
I don't think it's caused by a lock file, because I've tried recreating the
table with a totally different name. However, I'll check it tomorrow.




-- 


Best regards,
William Zen


Re: A warning when loading data

2016-08-29 Thread Ravindra Pesala
Hi,

Did you check whether any locks were created under the system temp folder at
//lockfile? If one exists, please delete it and try again.

Thanks,
Ravi.



-- 
Thanks & Regards,
Ravi


Re: A warning when loading data

2016-08-29 Thread Zen Wellon
Hi Ravi,

After upgrading Carbon to 0.1.0, this problem occurs every time I try to load
data, and I'm sure no other Carbon instance is running because I use my
personal dev Spark cluster. I've also tried recreating the table under a new
name, but the problem is still there.




-- 


Best regards,
William Zen


[GitHub] incubator-carbondata pull request #107: [WIP]quotechar is single without new...

2016-08-29 Thread Jay357089
GitHub user Jay357089 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/107

[WIP]quotechar is single without newLine



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Jay357089/incubator-carbondata quoteNewline

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #107


commit bf97e735a0e619e78e47611c1095939d6c6b92eb
Author: Jay357089 
Date:   2016-08-29T14:55:22Z

quotechar and newLine






[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...

2016-08-29 Thread ravikiran23
Github user ravikiran23 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/92#discussion_r76619356
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -449,6 +457,12 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation,
     for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) {
       Integer result = compareDateValues(loadMetadata.getLoadStartTimeAsLong(), loadStartTime);
       if (result < 0) {
+        if (CarbonCommonConstants.SEGMENT_COMPACTED
--- End diff --

handled




[GitHub] incubator-carbondata pull request #81: [CARBONDATA-132] Fix the bug that the...

2016-08-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/81




[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...

2016-08-29 Thread ravikiran23
Github user ravikiran23 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/92#discussion_r76606724
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -410,18 +410,26 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation,
       for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) {
 
         if (loadId.equalsIgnoreCase(loadMetadata.getLoadName())) {
+          // if the segment is compacted then no need to delete that.
+          if (CarbonCommonConstants.SEGMENT_COMPACTED
+              .equalsIgnoreCase(loadMetadata.getLoadStatus())) {
+            LOG.error("Cannot delete the Segment which is compacted. Segment is " + loadId);
+            loadFound = true;
+            invalidLoadIds.add(loadId);
--- End diff --

fixed.




[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...

2016-08-29 Thread ManoharVanam
Github user ManoharVanam commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/92#discussion_r76606495
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -410,18 +410,26 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation,
       for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) {
 
         if (loadId.equalsIgnoreCase(loadMetadata.getLoadName())) {
+          // if the segment is compacted then no need to delete that.
+          if (CarbonCommonConstants.SEGMENT_COMPACTED
+              .equalsIgnoreCase(loadMetadata.getLoadStatus())) {
+            LOG.error("Cannot delete the Segment which is compacted. Segment is " + loadId);
+            loadFound = true;
+            invalidLoadIds.add(loadId);
--- End diff --

The above two lines are not required, as we delete either all segments or none.




Re: A warning when loading data

2016-08-29 Thread Ravindra Pesala
Hi,

Are you getting this exception continuously for every load? It usually occurs
when you try to load data concurrently into the same table, so please make
sure that no other instance of Carbon is running and no other data load on the
same table is in progress.
Check whether any locks were created under the system temp folder at
//lockfile; if one exists, please delete it.

Thanks & Regards,
Ravi



[GitHub] incubator-carbondata pull request #104: [CARBONDATA-188] Compress CSV file b...

2016-08-29 Thread Zhangshunyu
Github user Zhangshunyu commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/104#discussion_r76580960
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java ---
@@ -112,25 +116,29 @@ private void initializeReader() throws IOException {
     // if already one input stream is open first we need to close and then
     // open new stream
     close();
-    // get the block offset
-    long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset();
-    FileType fileType = FileFactory
-        .getFileType(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath());
-    // calculate the end offset the block
-    long endOffset =
-        this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength() + startOffset;
-
-    // create a input stream for the block
-    DataInputStream dataInputStream = FileFactory
-        .getDataInputStream(this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath(),
-            fileType, bufferSize, startOffset);
-    // if start offset is not 0 then reading then reading and ignoring the extra line
-    if (startOffset != 0) {
-      LineReader lineReader = new LineReader(dataInputStream, 1);
-      startOffset += lineReader.readLine(new Text(), 0);
+
+    String path = this.csvParserVo.getBlockDetailsList().get(blockCounter).getFilePath();
+    FileType fileType = FileFactory.getFileType(path);
+
+    if (path.endsWith(".gz")) {
+      DataInputStream dataInputStream =
+          FileFactory.getCompressedDataInputStream(path, fileType, bufferSize);
+      inputStreamReader = new BufferedReader(new InputStreamReader(dataInputStream));
+    } else {
+      long startOffset = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockOffset();
+      long blockLength = this.csvParserVo.getBlockDetailsList().get(blockCounter).getBlockLength();
+      long endOffset = blockLength + startOffset;
+
+      DataInputStream dataInputStream = FileFactory.getDataInputStream(path, fileType, bufferSize);
+
+      // if start offset is not 0 then reading then reading and ignoring the extra line
+      if (startOffset != 0) {
+        LineReader lineReader = new LineReader(dataInputStream, 1);
+        startOffset += lineReader.readLine(new Text(), 0);
+      }
+      inputStreamReader = new BufferedReader(new InputStreamReader(
+          new BoundedDataStream(dataInputStream, endOffset - startOffset)));
--- End diff --

Cannot find the class BoundedDataStream.
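
For reference, a hypothetical sketch of what such a bounded wrapper could look like; this is not the class missing from the PR, just an illustration of the idea of capping a stream at the block boundary:

// Hypothetical only: an InputStream wrapper that stops returning data once
// `limit` bytes have been consumed, so the CSV reader does not run past the
// end of its assigned block.
import java.io.{DataInputStream, InputStream}

class BoundedDataStream(in: DataInputStream, limit: Long) extends InputStream {
  private var consumed = 0L

  override def read(): Int = {
    if (consumed >= limit) return -1
    val b = in.read()
    if (b >= 0) consumed += 1
    b
  }

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    if (consumed >= limit) return -1
    val toRead = math.min(len.toLong, limit - consumed).toInt
    val n = in.read(buf, off, toRead)
    if (n > 0) consumed += n
    n
  }

  override def close(): Unit = in.close()
}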




[GitHub] incubator-carbondata pull request #92: [CARBONDATA-176] Deletion of compacte...

2016-08-29 Thread ravikiran23
Github user ravikiran23 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/92#discussion_r76581081
  
--- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
@@ -410,6 +410,14 @@ public void writeLoadDetailsIntoFile(String dataLoadLocation,
       for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) {
 
         if (loadId.equalsIgnoreCase(loadMetadata.getLoadName())) {
+          // if the segment is compacted then no need to delete that.
+          if (CarbonCommonConstants.SEGMENT_COMPACTED
+              .equalsIgnoreCase(loadMetadata.getLoadStatus())) {
+            LOG.error("Cannot delete the load which is compacted.");
--- End diff --

This log appears only when the user intentionally tries to delete compacted
loads using the delete segment DDL, so it is fine to add the segment ID to the
log message. Fixing that.




[GitHub] incubator-carbondata pull request #103: Fix the bug that when using Decimal ...

2016-08-29 Thread Zhangshunyu
GitHub user Zhangshunyu opened a pull request:

https://github.com/apache/incubator-carbondata/pull/103

Fix the bug that when using Decimal type as dictionary gen surrogate key 
will mismatch for the same values during increment load.

## Why raise this PR?
**Fix bug: when a Decimal type column is used as a dictionary column,
surrogate key generation mismatches for the same values during an incremental
load.**
For example, when we specify a Decimal type column as a dictionary column,
`DataTypeUtil.normalizeColumnValueForItsDataType` is applied. For the decimal
value 45, if we specify the scale of this column as 3, parsedValue would be
45.000, and this 45.000 would be written into the dictionary file by
writer.write(parsedValue). As a result, the second time we load the same value
45, dictionary.getSurrogateKey(value) compares the value with the dictionary
value; but here the value is 45, while our dictionary value is 45.000 stored
as a string, so the dictionary thinks it does not contain 45. This leads to
repeated values in the dictionary, which is a mistake.
How to solve this? Before checking the surrogate key, if the data type is
decimal, we first use the parsedValue as the lookup value, so 45 itself is not
treated as a different value.

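A hedged illustration of the normalization idea; the helper name and rounding
mode are illustrative, not the PR's code:

// Illustrative only: normalize a decimal string to the column's scale
// before the dictionary lookup, so "45" and "45.000" map to the same
// surrogate key.
import java.math.{BigDecimal, RoundingMode}

def normalizeDecimal(value: String, scale: Int): String =
  new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toPlainString

// normalizeDecimal("45", 3) == "45.000", matching the stored dictionary value
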
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Zhangshunyu/incubator-carbondata decimalDic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #103


commit 0403b9fe4ed32b9cbc4727b5a541cfccb089422e
Author: Zhangshunyu 
Date:   2016-08-29T08:29:54Z

Fix the bug that when Decimal type as dictionary gen surrogate key will 
mismatch for the same values






A warning when loading data

2016-08-29 Thread Zen Wellon
Hi guys,
When I tried to load some data into a CarbonData table with Carbon 0.1.0, I
ran into the problem below.

WARN  29-08 15:40:17,535 - Lost task 10.0 in stage 2.1 (TID 365,
amlera-30-6.gtj): java.lang.RuntimeException: Dictionary file ***(sensitive
column) is locked for updation. Please try after some time
at scala.sys.package$.error(package.scala:27)
at
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:354)
at
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:294)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

-- 


Best regards,
William Zen


Re: [Exception] a thrift related problem occured when trying 0.0.1 release version

2016-08-29 Thread Zen Wellon
Yes, I resolved this problem by deleting the old Carbon metastore.

2016-08-27 0:18 GMT+08:00 Ravindra Pesala :

> Hi William,
>
> It may be because you are using old carbon store. Please try using new
> store path. There were changes in thrift so old store won't work on this
> release.
>
> Thanks & Regards,
> Ravi
>
> On 26 August 2016 at 21:05, Zen Wellon  wrote:
>
> > Hi, guys
> >
> > Congratulations for the first stable version !
> > Today I heard that 0.0.1 was released and build a fresh jar for my spark
> > cluster. But when I try to create a new table, an Exception occured,
> anyone
> > could help?
> >
> > below is the full stack:
> >
> > INFO  26-08 23:23:46,062 - Parsing command: create table if not exists
> > carbondata_001_release_test(..)
> > INFO  26-08 23:23:46,086 - Parse Completed
> > java.io.IOException: org.apache.thrift.protocol.TProtocolException:
> > Required field 'fact_table' was not present! Struct:
> > TableInfo(fact_table:null, aggregate_table_list:null)
> >         at org.apache.carbondata.core.reader.ThriftReader.read(ThriftReader.java:110)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1$$anonfun$apply$1.apply(CarbonMetastoreCatalog.scala:216)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1$$anonfun$apply$1.apply(CarbonMetastoreCatalog.scala:196)
> >         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> >         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1.apply(CarbonMetastoreCatalog.scala:196)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog$$anonfun$fillMetaData$1.apply(CarbonMetastoreCatalog.scala:191)
> >         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> >         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog.fillMetaData(CarbonMetastoreCatalog.scala:191)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog.loadMetadata(CarbonMetastoreCatalog.scala:177)
> >         at org.apache.spark.sql.hive.CarbonMetastoreCatalog.<init>(CarbonMetastoreCatalog.scala:112)
> >         at org.apache.spark.sql.CarbonContext$$anon$1.<init>(CarbonContext.scala:70)
> >         at org.apache.spark.sql.CarbonContext.catalog$lzycompute(CarbonContext.scala:70)
> >         at org.apache.spark.sql.CarbonContext.catalog(CarbonContext.scala:67)
> >         at org.apache.spark.sql.CarbonContext$$anon$2.<init>(CarbonContext.scala:75)
> >         at org.apache.spark.sql.CarbonContext.analyzer$lzycompute(CarbonContext.scala:75)
> >         at org.apache.spark.sql.CarbonContext.analyzer(CarbonContext.scala:74)
> >         at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
> >         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
> >         at org.apache.carbondata.spark.rdd.CarbonDataFrameRDD.<init>(CarbonDataFrameRDD.scala:23)
> >         at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:130)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
> >         at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
> >         at $iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
> >         at $iwC$$iwC$$iwC.<init>(<console>:48)
> >         at $iwC$$iwC.<init>(<console>:50)
> >         at $iwC.<init>(<console>:52)
> >         at <init>(<console>:54)
> >         at .<init>(<console>:58)
> >         at .<clinit>(<console>)
> >         at .<init>(<console>:7)
> >         at .<clinit>(<console>)
> >         at $print(<console>)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> >         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
> >         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> >         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> >         at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> >         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> >         at
> > 

[GitHub] incubator-carbondata pull request #102: [CARBONDATA-186] Except compaction a...

2016-08-29 Thread nareshpr
GitHub user nareshpr opened a pull request:

https://github.com/apache/incubator-carbondata/pull/102

[CARBONDATA-186] Except compaction, all other alter operations on a carbon
table will be unsupported.

Reason: As the carbon table does not support any alter operation except
compaction, all other alter operations on a carbon table should be skipped and
the error message "Unsupported alter operation on carbon table" should be
displayed.
Whereas if the alter operation is on a hive table, it should be handed over to
hive to perform the operation.

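For illustration, a hedged sketch of the routing idea; the function and
parameter names are assumptions, not the PR's code:

// Illustrative only: reject alter operations on carbon tables and let
// hive tables fall through to the hive execution path.
def routeAlter(tableName: String, isCarbonTable: String => Boolean): Unit = {
  if (isCarbonTable(tableName)) {
    throw new MalformedCarbonCommandException(
      "Unsupported alter operation on carbon table")
  }
  // otherwise: hand the alter statement over to hive
}
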
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nareshpr/incubator-carbondata 
altertableunsupportedoperations

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #102


commit 8ef206511fd1b8d83de63308fc338f85688f6451
Author: nareshpr 
Date:   2016-08-29T06:57:23Z

Alter operations on carbon table will be unsupported.






[jira] [Created] (CARBONDATA-186) Except compaction, all other alter operations on carbon table should not be performed.

2016-08-29 Thread Naresh P R (JIRA)
Naresh P R created CARBONDATA-186:
-

 Summary: Except compaction, all other alter operations on a carbon
table should not be performed.
 Key: CARBONDATA-186
 URL: https://issues.apache.org/jira/browse/CARBONDATA-186
 Project: CarbonData
  Issue Type: Bug
Reporter: Naresh P R
Priority: Minor


As the carbon table does not support any alter operation except compaction,
all other alter operations on a carbon table should be skipped and the error
message "Unsupported alter operation on carbon table" should be displayed.

Whereas if the alter operation is on a hive table, it should be handed over to
hive to perform the operation.


