Thanks for your inputs, Gabriel. I'm thinking of it like this: keep the data loader's consistency check as is, and add a new parameter to extend the logic to populate null values for missing columns. Let's see how it goes :)
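In the meantime, the preprocessing workaround discussed below (padding short records with empty fields before handing the file to the Phoenix loader) can be sketched roughly like this. This is not part of Phoenix; it is a minimal standalone sketch using Python's csv module, and the `pad_csv` name, the sample rows, and the column count are illustrative assumptions:

```python
import csv
import io

def pad_csv(src, dst, num_fields):
    """Pad records that have fewer than num_fields columns with empty
    fields, so every row in the output has a consistent width.
    Empty fields should then load as NULL in Phoenix (subject to
    PHOENIX-1277 for some datatypes)."""
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if len(row) < num_fields:
            # append empty fields; an empty field encodes an explicit null
            row = row + [""] * (num_fields - len(row))
        writer.writerow(row)

# Illustrative input matching the shape of the records in this thread
src = io.StringIO("r1,c1,c2,c3\nr2,c1,c2\nr3,c1,c2,c3,c4\n")
dst = io.StringIO()
pad_csv(src, dst, 5)
print(dst.getvalue())
```

Rows that already have the full width pass through unchanged; short rows gain trailing commas, which is exactly the encoding suggested below.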
On Thu, Feb 12, 2015 at 12:17 PM, Gabriel Reid <gabriel.r...@gmail.com> wrote:
> Hi Siva,
>
> If I'm not mistaken, the bulk loader will currently just drop rows
> that don't have the correct number of fields.
>
> I'm actually in favor of this behavior (i.e. enforcing the consistency
> of records in an input file). A CSV file is generally considered a
> table with a given set of columns, so I think it's pretty reasonable
> to consider a CSV record corrupt if it doesn't include the correct
> number of columns.
>
> I think the workaround here (I think you already realized this) would
> be to encode your input data as
>
> r1,c1,c2,c3,
> r2,c1,c2,,
> r2,c1,c2,c3,c4
>
> I realize that this makes your use case more complex, because it
> requires processing on the file before loading it into Phoenix, but I
> think that keeping the Phoenix loader relatively strict (and simple)
> in this case is the better choice.
>
> - Gabriel
>
> On Thu, Feb 12, 2015 at 7:32 PM, Siva <sbhavan...@gmail.com> wrote:
> > Hi Gabriel,
> >
> > Thanks for your response.
> >
> > Your understanding is correct. The use case we have is: we get the data
> > from different sources (having different table structures, in terms of
> > columns, based on client type) in CSV format. If a column is not
> > available in the source, we don't have a choice to even append a blank
> > comma (,) in its place. But HBase ignores a column if it doesn't find
> > the data within a record.
> >
> > If I have a set of records like below, and if I specify 4 columns
> > (excluding the row key), then for the first record it inserts data for
> > 3 columns, for the 2nd record 2 columns, and for the 3rd record 4
> > columns; it just ignores a column if it doesn't find the data.
> >
> > r1,c1,c2,c3
> > r2,c1,c2
> > r2,c1,c2,c3,c4
> >
> > Since Phoenix doesn't have this capability, we have to create tables in
> > HBase and load them through it, using Phoenix just for SQL queries.
> >
> > I think we should enhance the Phoenix data loader in the same way as
> > the HBase loader. What do you say, any thoughts on this?
> >
> > Thanks,
> > Siva.
> >
> > On Wed, Feb 11, 2015 at 11:34 PM, Gabriel Reid <gabriel.r...@gmail.com>
> > wrote:
> >>
> >> Hi Siva,
> >>
> >> If I understand correctly, you want to explicitly supply null values
> >> in a CSV file for some fields. In general, this should work by just
> >> leaving the field empty in your CSV file. For example, if you have
> >> three fields (id, first_name, last_name) in your CSV file, then a
> >> record like "1,,Reid" should create a record with first_name left as
> >> null.
> >>
> >> Note that there is still an open bug, PHOENIX-1277 [1], that will
> >> prevent inserting null values via the bulk loader or psql, so for some
> >> datatypes there currently isn't a way to explicitly supply null
> >> values.
> >>
> >> - Gabriel
> >>
> >> 1. https://issues.apache.org/jira/browse/PHOENIX-1277
> >>
> >> On Thu, Feb 12, 2015 at 1:28 AM, Siva <sbhavan...@gmail.com> wrote:
> >> > Hello all,
> >> >
> >> > Is there a way to specify keeping NULL values for columns that are
> >> > not present in the CSV file as part of bulk loading?
> >> >
> >> > The requirement I have is: a few rows in the CSV file contain all
> >> > the columns, but other rows contain only a few of them.
> >> >
> >> > In HBase, if a given record doesn't have the desired columns,
> >> > ImportTsv just ignores those columns and goes on to the next record
> >> > while loading the data.
> >> >
> >> > HADOOP_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/conf
> >> > hadoop jar /usr/hdp/2.2.0.0-2041/phoenix/phoenix-4.2.0.2.2.0.0-2041-client.jar
> >> > org.apache.phoenix.mapreduce.CsvBulkLoadTool --table P_TEST_2_COLS
> >> > --input /user/sbhavanari/p_h_test_2_cols_less.csv
> >> > --import-columns NAME,LEADID,D
> >> > --zookeeper 172.31.45.176:2181:/hbase
> >> >
> >> > Thanks,
> >> > Siva.