[GitHub] [carbondata] Indhumathi27 commented on pull request #3809: [CARBONDATA-3881] Fix concurrent main table compaction and SI load issue

2020-07-02 Thread GitBox


Indhumathi27 commented on pull request #3809:
URL: https://github.com/apache/carbondata/pull/3809#issuecomment-652824236


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#discussion_r448797371



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/CarbonOption.scala
##
@@ -17,6 +17,8 @@
 
 package org.apache.carbondata.spark
 
+import scala.util.Try
+

Review comment:
   Please remove extra line









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#discussion_r448797700



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
##
@@ -766,13 +766,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
   throw new MalformedCarbonCommandException("Invalid table properties")
 }
 if (options.isBucketingEnabled) {
-  if (options.bucketNumber.toString.contains("-") ||
-  options.bucketNumber.toString.contains("+") ||  options.bucketNumber 
== 0) {
+  if (options.bucketNumber == None || 
options.bucketNumber.get.toString.contains("-") ||

Review comment:
   ```suggestion
 if (options.bucketNumber.isEmpty || 
options.bucketNumber.get.toString.contains("-") ||
   ```
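For context, the suggested check guards against an absent, signed, or zero bucket count. A minimal, hypothetical Java sketch of the same validation (the helper name is illustrative, not CarbonData API):

```java
import java.util.Optional;

public class BucketNumberCheck {
  // Returns true only for a present, unsigned, strictly positive integer string.
  static boolean isValidBucketNumber(Optional<String> bucketNumber) {
    if (bucketNumber.isEmpty()) {
      return false;            // mirrors options.bucketNumber.isEmpty
    }
    String value = bucketNumber.get();
    if (value.contains("-") || value.contains("+")) {
      return false;            // an explicit sign is rejected, as in the parser
    }
    try {
      return Integer.parseInt(value) > 0;  // zero buckets is also invalid
    } catch (NumberFormatException e) {
      return false;
    }
  }
}
```

Validation stays in one place, so the DDL parser only needs a single boolean check.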









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r448811521



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java
##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.avro.AvroReadSupport;
+import org.apache.parquet.hadoop.ParquetReader;
+
+/**
+ * Implementation to write parquet rows in avro format to carbondata file.
+ */
+public class ParquetCarbonWriter extends AvroCarbonWriter {
+  private AvroCarbonWriter avroCarbonWriter = null;
+  private String filePath = "";
+  private boolean isDirectory = false;
+  private List<String> fileList;
+
+  ParquetCarbonWriter(AvroCarbonWriter avroCarbonWriter) {
+this.avroCarbonWriter = avroCarbonWriter;
+  }
+
+  @Override
+  public void setFilePath(String filePath) {
+this.filePath = filePath;
+  }
+
+  @Override
+  public void setIsDirectory(boolean isDirectory) {
+this.isDirectory = isDirectory;
+  }
+
+  @Override
+  public void setFileList(List<String> fileList) {
+this.fileList = fileList;
+  }
+
+  /**
+   * Load data of all parquet files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+if (this.filePath.length() == 0) {
+  throw new RuntimeException("'withParquetPath()' " +
+  "must be called to support loading parquet files");
+}
+if (this.avroCarbonWriter == null) {
+  throw new RuntimeException("avro carbon writer can not be null");
+}
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+  throw new RuntimeException("No Parquet file found at given location. 
Please provide " +
+  "the correct folder location.");
+}
+Arrays.sort(dataFiles);
+for (File dataFile : dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  } else {
+for (String file : this.fileList) {
+  this.loadSingleFile(new File(this.filePath + "/" + file));
+}
+  }
+} else {
+  this.loadSingleFile(new File(this.filePath));
+}
+  }
+
+  private void loadSingleFile(File file) throws IOException {
+AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+ParquetReader<GenericRecord> parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(String.valueOf(file))).withConf(new Configuration()).build();
+GenericRecord genericRecord = null;
+while ((genericRecord = parquetReader.read()) != null) {
+  System.out.println(genericRecord);

Review comment:
   remove this line

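The write() flow under review (list the directory, fail on an empty listing, sort, then load each file) can be sketched without any Parquet dependency; here the per-file load is reduced to collecting names so the control flow itself is testable:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DirectoryLoader {
  // Mirrors the reviewed flow: list the directory, fail on empty, sort, load each file.
  static List<String> loadAll(File dir) {
    File[] dataFiles = dir.listFiles();
    if (dataFiles == null || dataFiles.length == 0) {
      throw new RuntimeException("No file found at given location: " + dir);
    }
    Arrays.sort(dataFiles); // deterministic load order, as in the PR
    List<String> loaded = new ArrayList<>();
    for (File dataFile : dataFiles) {
      loaded.add(dataFile.getName()); // a real writer would build a Parquet reader here
    }
    return loaded;
  }
}
```

Sorting before loading makes the ingest order reproducible across runs and filesystems.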
##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java
##
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.a

[GitHub] [carbondata] Indhumathi27 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-02 Thread GitBox


Indhumathi27 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-652849540


   @nihal0107 Please remove unused binary files from this PR







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r448835565



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema 
carbonSchema) {
 return this;
   }
 
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.withCsvInput();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+this.fileList = fileList;
+this.withCsvPath(filePath);
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws 
IOException {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.writerType = WRITER_TYPE.PARQUET;
+this.buildParquetReader();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+  throws IOException {
+this.fileList = fileList;
+this.withParquetPath(filePath);
+return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+ParquetReader<GenericRecord> parquetReader;
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+  throw new RuntimeException("No Parquet file found at given location. 
Please provide " +
+  "the correct folder location.");
+}
+parquetReader = ParquetReader.builder(avroReadSupport,

Review comment:
   Please check the below points:
   1. If filePath contains other files such as ORC, CSV, or Avro, building a 
Parquet reader with those files throws an error. Better to check the FileFormat 
as well.
   2. When writer.write is called, you do listFiles and directly try to create 
the respective readers. This may fail if the user adds a non-Parquet file to 
filePath after building the CarbonWriterBuilder.
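A hedged sketch of the format check suggested in point 1: filter the directory listing by extension before building any reader, so a stray ORC/CSV/Avro file never reaches the Parquet reader (hypothetical helper, not CarbonData API):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FormatFilter {
  // Keeps only files whose name ends with the expected extension (case-insensitive).
  static List<File> filterByExtension(File[] candidates, String extension) {
    List<File> matched = new ArrayList<>();
    for (File f : candidates) {
      if (f.getName().toLowerCase().endsWith("." + extension.toLowerCase())) {
        matched.add(f);
      }
    }
    if (matched.isEmpty()) {
      throw new RuntimeException("No ." + extension + " file found at given location.");
    }
    return matched;
  }
}
```

Running the same filter again inside write() also addresses point 2, since files added after the builder was created are re-checked at load time.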









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r448837383



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema 
carbonSchema) {
 return this;
   }
 
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.withCsvInput();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+this.fileList = fileList;
+this.withCsvPath(filePath);
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws 
IOException {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.writerType = WRITER_TYPE.PARQUET;
+this.buildParquetReader();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+  throws IOException {
+this.fileList = fileList;
+this.withParquetPath(filePath);
+return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+ParquetReader<GenericRecord> parquetReader;
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+  throw new RuntimeException("No Parquet file found at given location. 
Please provide " +
+  "the correct folder location.");
+}
+parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(String.valueOf(dataFiles[0]))).build();
+  } else {
+parquetReader = ParquetReader.builder(avroReadSupport,

Review comment:
   What if the files in a directory have different schemas? How is that handled?









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r448840587



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema 
carbonSchema) {
 return this;
   }
 
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.withCsvInput();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+this.fileList = fileList;
+this.withCsvPath(filePath);
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws 
IOException {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.writerType = WRITER_TYPE.PARQUET;
+this.buildParquetReader();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+  throws IOException {
+this.fileList = fileList;
+this.withParquetPath(filePath);
+return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+ParquetReader<GenericRecord> parquetReader;
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+  throw new RuntimeException("No Parquet file found at given location. 
Please provide " +
+  "the correct folder location.");
+}
+parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(String.valueOf(dataFiles[0]))).build();
+  } else {
+parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(this.filePath + "/" + this.fileList.get(0))).build();

Review comment:
   What if a file in fileList does not exist under filePath? Better to catch 
this and throw a "File does not exist" exception.
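A minimal sketch of the existence check requested here: verify each listed file under filePath before any reader is built, failing fast with a clear message (illustrative only, not CarbonData API):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.List;

public class FileListCheck {
  // Fails fast if any file from fileList is missing under the base directory.
  static void validate(String basePath, List<String> fileList) throws FileNotFoundException {
    for (String name : fileList) {
      File f = new File(basePath, name);
      if (!f.exists()) {
        throw new FileNotFoundException("File does not exist: " + f.getPath());
      }
    }
  }
}
```

Checking up front turns an obscure reader-construction failure into an explicit, user-actionable error.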









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r448842092



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema 
carbonSchema) {
 return this;
   }
 
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.withCsvInput();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+this.fileList = fileList;
+this.withCsvPath(filePath);
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws 
IOException {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.writerType = WRITER_TYPE.PARQUET;
+this.buildParquetReader();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+  throws IOException {
+this.fileList = fileList;
+this.withParquetPath(filePath);
+return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+ParquetReader<GenericRecord> parquetReader;
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+  throw new RuntimeException("No Parquet file found at given location. 
Please provide " +
+  "the correct folder location.");
+}
+parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(String.valueOf(dataFiles[0]))).build();
+  } else {
+parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(this.filePath + "/" + this.fileList.get(0))).build();
+  }
+} else {
+  parquetReader = ParquetReader.builder(avroReadSupport,
+  new Path(this.filePath)).build();

Review comment:
   Same as the previous comment: also handle the case where filePath does not exist.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3809: [CARBONDATA-3881] Fix concurrent main table compaction and SI load issue

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3809:
URL: https://github.com/apache/carbondata/pull/3809#issuecomment-652896458


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3290/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3809: [CARBONDATA-3881] Fix concurrent main table compaction and SI load issue

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3809:
URL: https://github.com/apache/carbondata/pull/3809#issuecomment-652896859


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1553/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3809: [CARBONDATA-3881] Fix concurrent main table compaction and SI load issue

2020-07-02 Thread GitBox


Indhumathi27 commented on pull request #3809:
URL: https://github.com/apache/carbondata/pull/3809#issuecomment-652998410


   LGTM







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#discussion_r448997627



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
##
@@ -766,13 +766,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
   throw new MalformedCarbonCommandException("Invalid table properties")
 }
 if (options.isBucketingEnabled) {
-  if (options.bucketNumber.toString.contains("-") ||
-  options.bucketNumber.toString.contains("+") ||  options.bucketNumber == 0) {
+  if (options.bucketNumber.isEmpty || options.bucketNumber.get.toString.contains("-") ||

Review comment:
   Since you have wrapped the options with Try, I guess that if the bucket 
number is "+" or "-", it will be empty. You can check and remove those checks

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
##
@@ -766,13 +766,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
   throw new MalformedCarbonCommandException("Invalid table properties")
 }
 if (options.isBucketingEnabled) {
-  if (options.bucketNumber.toString.contains("-") ||
-  options.bucketNumber.toString.contains("+") ||  options.bucketNumber == 0) {
+  if (options.bucketNumber.isEmpty || options.bucketNumber.get.toString.contains("-") ||

Review comment:
   Since you have wrapped the bucket options with Try, I guess that if the 
bucket number is "+" or "-", it will be empty. You can check and remove those 
checks









[GitHub] [carbondata] QiangCai opened a new pull request #3820: [WIP] Improve filter performance on decimal column

2020-07-02 Thread GitBox


QiangCai opened a new pull request #3820:
URL: https://github.com/apache/carbondata/pull/3820


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


ShreelekhyaG commented on a change in pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#discussion_r449060268



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
##
@@ -766,13 +766,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
   throw new MalformedCarbonCommandException("Invalid table properties")
 }
 if (options.isBucketingEnabled) {
-  if (options.bucketNumber.toString.contains("-") ||
-  options.bucketNumber.toString.contains("+") ||  options.bucketNumber == 0) {
+  if (options.bucketNumber == None || options.bucketNumber.get.toString.contains("-") ||
+  options.bucketNumber.get.toString.contains("+") ||  options.bucketNumber.get == 0) {

Review comment:
   Done









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#issuecomment-653058106


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3291/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#issuecomment-653059304


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1554/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-653065406


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1555/
   







[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


ShreelekhyaG commented on a change in pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#discussion_r449076519



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
##
@@ -766,13 +766,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
   throw new MalformedCarbonCommandException("Invalid table properties")
 }
 if (options.isBucketingEnabled) {
-  if (options.bucketNumber.toString.contains("-") ||
-  options.bucketNumber.toString.contains("+") ||  options.bucketNumber == 0) {
+  if (options.bucketNumber.isEmpty || options.bucketNumber.get.toString.contains("-") ||

Review comment:
The check for "-" is needed to avoid negative values as input. The case 
for "+" is not required, so I removed that check. 
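For illustration, the empty/negative/zero handling being discussed can be sketched outside the parser. This is a hypothetical helper, not CarbonData code; the class and method names are invented, and it mirrors the Scala `Try` wrapper by treating any absent, non-numeric, or non-positive value as invalid:

```java
import java.util.Optional;

public class BucketOptionCheck {
    // Hypothetical sketch of the validation discussed above: like wrapping
    // the Scala option with Try, parse leniently and treat anything that is
    // absent, non-numeric (e.g. a lone "-"), or not a positive integer as
    // invalid, so the caller can raise MalformedCarbonCommandException.
    static Optional<Integer> parseBucketNumber(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();  // empty bucket number -> invalid
        }
        try {
            int n = Integer.parseInt(raw.trim());
            // zero and negative bucket counts are rejected, which also
            // covers inputs written with a leading "-"
            return n > 0 ? Optional.of(n) : Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();  // "abc", a lone "+" or "-", etc.
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBucketNumber("8"));   // Optional[8]
        System.out.println(parseBucketNumber("-2"));  // Optional.empty
        System.out.println(parseBucketNumber(""));    // Optional.empty
    }
}
```

Note that Java's `Integer.parseInt` accepts a leading "+", so "+8" parses as 8 here; that matches the direction of this review thread, where the explicit "+" check was considered removable.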









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3820: [WIP] Improve filter performance on decimal column

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3820:
URL: https://github.com/apache/carbondata/pull/3820#issuecomment-653084576


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3293/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-653085499


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3292/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3820: [WIP] Improve filter performance on decimal column

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3820:
URL: https://github.com/apache/carbondata/pull/3820#issuecomment-653088148


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1556/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#issuecomment-653131655


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3294/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#issuecomment-653131882


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1557/
   







[GitHub] [carbondata] akkio-97 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-07-02 Thread GitBox


akkio-97 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-653212349


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-653254442


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3295/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3773: [CARBONDATA-3830]Presto complex columns read support

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-653254762


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1558/
   







[GitHub] [carbondata] ajantha-bhat opened a new pull request #3821: [WIP] Use qualified table name for global sort compaction

2020-07-02 Thread GitBox


ajantha-bhat opened a new pull request #3821:
URL: https://github.com/apache/carbondata/pull/3821


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-02 Thread GitBox


Indhumathi27 commented on a change in pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#discussion_r449371217



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
##
@@ -766,13 +766,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
   throw new MalformedCarbonCommandException("Invalid table properties")
 }
 if (options.isBucketingEnabled) {
-  if (options.bucketNumber.toString.contains("-") ||
-  options.bucketNumber.toString.contains("+") ||  options.bucketNumber == 0) {
+  if (options.bucketNumber.isEmpty || options.bucketNumber.get.toString.contains("-")
+||  options.bucketNumber.get == 0) {

Review comment:
   Please format these two lines









[jira] [Created] (CARBONDATA-3886) Global sort compaction not using qualified name

2020-07-02 Thread Ajantha Bhat (Jira)
Ajantha Bhat created CARBONDATA-3886:


 Summary: Global sort compaction not using qualified name
 Key: CARBONDATA-3886
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3886
 Project: CarbonData
  Issue Type: Bug
Reporter: Ajantha Bhat
Assignee: Ajantha Bhat


problem:

Global sort compaction is not using the database name while creating the 
dataframe. Sometimes it uses the default database when Spark cannot determine 
which database this table belongs to.

solution:

Use the qualified table name (dbname + table name) while creating the dataframe.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-02 Thread GitBox


kumarvishal09 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r449384678



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +318,61 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
     commitJobFinal(context, loadModel, operationContext, carbonTable, uniqueId);
   }
 
+  /**
+   * Method to create and write the segment file, removes the temporary directories from all the
+   * respective partition directories. This method is invoked only when {@link
+   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
+   * @param context Job context
+   * @param loadModel Load model
+   * @param segmentFileName Segment file name to write
+   * @param partitionPath Serialized list of partition location
+   * @throws IOException
+   */
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, CarbonLoadModel loadModel,
+      String segmentFileName, String partitionPath) throws IOException {
+    Map<String, String> indexFileNameMap = (Map<String, String>) ObjectSerializationUtil
+        .convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+    List<String> partitionList =
+        (List<String>) ObjectSerializationUtil.convertStringToObject(partitionPath);
+    SegmentFileStore.SegmentFile finalSegmentFile = null;
+    boolean isRelativePath;
+    String partitionLoc;
+    for (String partition : partitionList) {
+      isRelativePath = false;
+      partitionLoc = partition;
+      if (partitionLoc.startsWith(loadModel.getTablePath())) {
+        partitionLoc = partitionLoc.substring(loadModel.getTablePath().length());
+        isRelativePath = true;
+      }
+      SegmentFileStore.SegmentFile segmentFile = new SegmentFileStore.SegmentFile();
+      SegmentFileStore.FolderDetails folderDetails = new SegmentFileStore.FolderDetails();
+      folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
+      folderDetails.setPartitions(
+          Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 1)));
+      folderDetails.setRelative(isRelativePath);
+      folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
+      segmentFile.getLocationMap().put(partitionLoc, folderDetails);
+      if (finalSegmentFile != null) {
+        finalSegmentFile = finalSegmentFile.merge(segmentFile);
+      } else {
+        finalSegmentFile = segmentFile;
+      }
+    }
+    Objects.requireNonNull(finalSegmentFile);
+    String segmentFilesLocation =

Review comment:
   It's better to move this code inside SegmentFileStore itself: pass the 
table path and segment file name, and let it handle the folder creation 
internally. Please check, maybe it is already present:
   String segmentFilesLocation =
       CarbonTablePath.getSegmentFilesLocation(loadModel.getTablePath());
   CarbonFile locationFile = FileFactory.getCarbonFile(segmentFilesLocation);
   if (!locationFile.exists()) {
     locationFile.mkdirs();
   }
   SegmentFileStore.writeSegmentFile(finalSegmentFile,
       segmentFilesLocation + "/" + segmentFileName + CarbonTablePath.SEGMENT_EXT);
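The path handling at the top of the loop in the hunk above (the `startsWith`/`substring` pair against the table path) can be sketched in isolation. The class and method names here are invented for illustration; only the prefix-stripping logic is taken from the diff:

```java
public class PartitionPathSketch {
    // Mirrors the logic in writeSegmentWithoutMergeIndex: when the partition
    // location lies under the table path, strip the table path prefix so the
    // segment file stores a relative location; otherwise keep it absolute
    // (e.g. a partition added at an external location).
    static String relativize(String tablePath, String partitionLoc) {
        if (partitionLoc.startsWith(tablePath)) {
            return partitionLoc.substring(tablePath.length()); // relative form
        }
        return partitionLoc; // outside the table path, keep absolute
    }

    public static void main(String[] args) {
        System.out.println(relativize("/store/db/t1", "/store/db/t1/c=a")); // /c=a
        System.out.println(relativize("/store/db/t1", "/ext/loc/c=b"));     // /ext/loc/c=b
    }
}
```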









[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-02 Thread GitBox


kumarvishal09 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r449386951



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +318,61 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
     commitJobFinal(context, loadModel, operationContext, carbonTable, uniqueId);
   }
 
+  /**
+   * Method to create and write the segment file, removes the temporary directories from all the
+   * respective partition directories. This method is invoked only when {@link
+   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
+   * @param context Job context
+   * @param loadModel Load model
+   * @param segmentFileName Segment file name to write
+   * @param partitionPath Serialized list of partition location
+   * @throws IOException
+   */
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, CarbonLoadModel loadModel,
+      String segmentFileName, String partitionPath) throws IOException {
+    Map<String, String> indexFileNameMap = (Map<String, String>) ObjectSerializationUtil
+        .convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+    List<String> partitionList =
+        (List<String>) ObjectSerializationUtil.convertStringToObject(partitionPath);
+    SegmentFileStore.SegmentFile finalSegmentFile = null;
+    boolean isRelativePath;
+    String partitionLoc;
+    for (String partition : partitionList) {
+      isRelativePath = false;
+      partitionLoc = partition;
+      if (partitionLoc.startsWith(loadModel.getTablePath())) {
+        partitionLoc = partitionLoc.substring(loadModel.getTablePath().length());
+        isRelativePath = true;
+      }
+      SegmentFileStore.SegmentFile segmentFile = new SegmentFileStore.SegmentFile();
+      SegmentFileStore.FolderDetails folderDetails = new SegmentFileStore.FolderDetails();
+      folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
+      folderDetails.setPartitions(
+          Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 1)));
+      folderDetails.setRelative(isRelativePath);
+      folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
+      segmentFile.getLocationMap().put(partitionLoc, folderDetails);
+      if (finalSegmentFile != null) {

Review comment:
   @ajantha-bhat code looks fine, it's in a loop
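The loop pattern being referred to, folding one per-partition segment file at a time into an accumulator, can be sketched with a simplified stand-in for `SegmentFileStore.SegmentFile`. The class below is hypothetical and keeps only a location map; `merge()` is assumed to union the two maps, which is all the accumulator shape depends on:

```java
import java.util.*;

public class MergeLoopSketch {
    // Simplified, hypothetical stand-in for SegmentFileStore.SegmentFile:
    // merge() unions the per-partition location maps into a new instance.
    static class SegmentFile {
        final Map<String, String> locationMap = new HashMap<>();
        SegmentFile merge(SegmentFile other) {
            SegmentFile merged = new SegmentFile();
            merged.locationMap.putAll(this.locationMap);
            merged.locationMap.putAll(other.locationMap);
            return merged;
        }
    }

    // Same shape as the loop in the PR: the first iteration seeds the
    // accumulator, later iterations merge into it.
    static SegmentFile mergeAll(List<SegmentFile> perPartition) {
        SegmentFile finalSegmentFile = null;
        for (SegmentFile sf : perPartition) {
            finalSegmentFile =
                (finalSegmentFile == null) ? sf : finalSegmentFile.merge(sf);
        }
        Objects.requireNonNull(finalSegmentFile); // at least one partition expected
        return finalSegmentFile;
    }

    public static void main(String[] args) {
        SegmentFile a = new SegmentFile();
        a.locationMap.put("/c=a", "part0.carbonindex");
        SegmentFile b = new SegmentFile();
        b.locationMap.put("/c=b", "part1.carbonindex");
        System.out.println(mergeAll(Arrays.asList(a, b)).locationMap.size()); // 2
    }
}
```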









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.m

2020-07-02 Thread GitBox


ajantha-bhat commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r449387619



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +318,61 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
     commitJobFinal(context, loadModel, operationContext, carbonTable, uniqueId);
   }
 
+  /**
+   * Method to create and write the segment file, removes the temporary directories from all the
+   * respective partition directories. This method is invoked only when {@link
+   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
+   * @param context Job context
+   * @param loadModel Load model
+   * @param segmentFileName Segment file name to write
+   * @param partitionPath Serialized list of partition location
+   * @throws IOException
+   */
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, CarbonLoadModel loadModel,
+      String segmentFileName, String partitionPath) throws IOException {
+    Map<String, String> indexFileNameMap = (Map<String, String>) ObjectSerializationUtil
+        .convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+    List<String> partitionList =
+        (List<String>) ObjectSerializationUtil.convertStringToObject(partitionPath);
+    SegmentFileStore.SegmentFile finalSegmentFile = null;
+    boolean isRelativePath;
+    String partitionLoc;
+    for (String partition : partitionList) {
+      isRelativePath = false;
+      partitionLoc = partition;
+      if (partitionLoc.startsWith(loadModel.getTablePath())) {
+        partitionLoc = partitionLoc.substring(loadModel.getTablePath().length());
+        isRelativePath = true;
+      }
+      SegmentFileStore.SegmentFile segmentFile = new SegmentFileStore.SegmentFile();
+      SegmentFileStore.FolderDetails folderDetails = new SegmentFileStore.FolderDetails();
+      folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
+      folderDetails.setPartitions(
+          Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 1)));
+      folderDetails.setRelative(isRelativePath);
+      folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
+      segmentFile.getLocationMap().put(partitionLoc, folderDetails);
+      if (finalSegmentFile != null) {

Review comment:
   Yes, it is in a loop. Ignore this comment. 









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3821: [CARBONDATA-3886] Use qualified table name for global sort compaction

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3821:
URL: https://github.com/apache/carbondata/pull/3821#issuecomment-653383759


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3296/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3821: [CARBONDATA-3886] Use qualified table name for global sort compaction

2020-07-02 Thread GitBox


CarbonDataQA1 commented on pull request #3821:
URL: https://github.com/apache/carbondata/pull/3821#issuecomment-653384017


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1559/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3821: [CARBONDATA-3886] Use qualified table name for global sort compaction

2020-07-02 Thread GitBox


ajantha-bhat commented on pull request #3821:
URL: https://github.com/apache/carbondata/pull/3821#issuecomment-653384296


   @jackylk : please check and merge







[GitHub] [carbondata] jackylk commented on a change in pull request #3821: [CARBONDATA-3886] Use qualified table name for global sort compaction

2020-07-02 Thread GitBox


jackylk commented on a change in pull request #3821:
URL: https://github.com/apache/carbondata/pull/3821#discussion_r449408180



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/util/SparkSQLUtil.scala
##
@@ -165,9 +165,9 @@ object SparkSQLUtil {
  * datatype of column data and corresponding datatype in schema provided to create dataframe.
  * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
  * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
- * input. Hence, using below API which creates dataframe from tablename.
+ * input. Hence, using below API which creates dataframe from qualified tablename.
  */
-sparkSession.sqlContext.table(carbonTable.getTableName)
+sparkSession.sqlContext.table(carbonTable.getDatabaseName + "." + carbonTable.getTableName)

Review comment:
   Is there a utility for this? I guess there is no need to construct it ourselves









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3821: [CARBONDATA-3886] Use qualified table name for global sort compaction

2020-07-02 Thread GitBox


ajantha-bhat commented on a change in pull request #3821:
URL: https://github.com/apache/carbondata/pull/3821#discussion_r449409262



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/util/SparkSQLUtil.scala
##
@@ -165,9 +165,9 @@ object SparkSQLUtil {
  * datatype of column data and corresponding datatype in schema provided to create dataframe.
  * Since carbonScanRDD gives Long data for timestamp column and corresponding column datatype in
  * schema is Timestamp, this validation fails if we use createDataFrame API which takes rdd as
- * input. Hence, using below API which creates dataframe from tablename.
+ * input. Hence, using below API which creates dataframe from qualified tablename.
  */
-sparkSession.sqlContext.table(carbonTable.getTableName)
+sparkSession.sqlContext.table(carbonTable.getDatabaseName + "." + carbonTable.getTableName)

Review comment:
   The qualified name is a Spark usage, so Carbon doesn't have a utility class 
for it. And Carbon's uniqueTableName is dbname_tableName, so we cannot use 
that either. 
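A minimal sketch of the fix itself: since no utility exists, the qualified name is built inline before being handed to `sqlContext.table(...)`. The class and the database/table names below are illustrative only:

```java
public class QualifiedNameSketch {
    // Builds the "db.table" form that sqlContext.table(...) needs, so Spark
    // does not resolve a bare table name against the current (often
    // "default") database.
    static String qualifiedName(String databaseName, String tableName) {
        return databaseName + "." + tableName;
    }

    public static void main(String[] args) {
        // e.g. sparkSession.sqlContext.table(qualifiedName(db, tbl))
        System.out.println(qualifiedName("sales_db", "orders")); // sales_db.orders
    }
}
```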




